xsl-list

RE: Problems with mixed content and inline elements when transforming XHTML into another XML format

2006-02-22 16:40:55
You're using XSLT 2.0 so this can be solved using grouping constructs.

Forget the templates that create <textnode> elements.

You want something like this, which causes adjacent "inline" nodes to be
grouped under a new element, with a function to decide whether a node is an
"inline" node:

<xsl:template match="div">
  <xsl:copy>
    <!-- "." is the node being grouped; the key is true for inline nodes -->
    <xsl:for-each-group select="node()" group-adjacent="f:is-inline(.)">
      <xsl:choose>
        <xsl:when test="current-grouping-key()">
          <textnode><xsl:copy-of select="current-group()"/></textnode>
        </xsl:when>
        <xsl:otherwise>
          <xsl:copy-of select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

<xsl:function name="f:is-inline" as="xs:boolean">
  <xsl:param name="node" as="node()"/>
  <!-- true for text nodes and for u, b and i elements -->
  <xsl:sequence select="$node instance of text() or
                        exists($node[self::u|self::b|self::i])"/>
</xsl:function>
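
For reference, here is a minimal sketch of how the two fragments above might sit in a complete stylesheet. Several things in it are assumptions rather than part of the original suggestion: the namespace URI bound to the f prefix is a made-up placeholder, the inline set is widened to the five elements listed in the question (b, i, em, strong, u), whitespace-only runs are left unwrapped, and non-inline children are passed to an identity template instead of being copied, so that nested div elements are handled too. It also assumes the input elements are in no namespace, as in the sample; for real XHTML you would probably need xpath-default-namespace on xsl:stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/local-functions"
    exclude-result-prefixes="xs f">

  <!-- Identity template: copies anything not handled elsewhere. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Group adjacent inline children of div under <textnode>. -->
  <xsl:template match="div">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:for-each-group select="node()" group-adjacent="f:is-inline(.)">
        <xsl:choose>
          <!-- Inline run that contains visible text: wrap it. -->
          <xsl:when test="current-grouping-key() and
                          normalize-space(string-join(current-group(), ''))">
            <textnode><xsl:copy-of select="current-group()"/></textnode>
          </xsl:when>
          <!-- Inline run that is only whitespace: copy it unwrapped. -->
          <xsl:when test="current-grouping-key()">
            <xsl:copy-of select="current-group()"/>
          </xsl:when>
          <!-- Block-level nodes: process them, so nested divs get grouped. -->
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- Assumed inline set, taken from the list in the question. -->
  <xsl:function name="f:is-inline" as="xs:boolean">
    <xsl:param name="node" as="node()"/>
    <xsl:sequence select="$node instance of text() or
        exists($node[self::b or self::i or self::em or self::strong or self::u])"/>
  </xsl:function>

</xsl:stylesheet>
```

Run against the sample div, this should put the text and the b element into a single textnode and leave the ul untouched, apart from whitespace differences. Note that a comment or processing instruction between two inline nodes counts as non-inline here and so would split the run into two groups.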

Michael Kay
http://www.saxonica.com/
   

-----Original Message-----
From: Tony Kinnis [mailto:kinnist(_at_)yahoo(_dot_)com] 
Sent: 22 February 2006 22:29
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Problems with mixed content and inline 
elements when transforming XHTML into another XML format

Hello all,

I have been trying to solve this problem for a few days now and I have
had no luck. I am hoping someone here can help me out with this.

I need to parse XHTML and transform it into another XML format. I am
sure that the XHTML is valid and well formed (I am running it through
HTMLTidy). The first problem I encountered was mixed content,
something like...

<div>
     My name is <b>bob</b>. What is yours?
    <ul>
         <li>foo</li>
         <li>bar</li>
    </ul>
</div>

I found a utility script on the web that can turn mixed content into
element content. I am guessing some of you have seen this script
before.

<xsl:template match="text()[normalize-space(.)][../*]">
    <xsl:element name="textnode">
        <xsl:value-of select="."/>
    </xsl:element>
</xsl:template>

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

This makes the above input look like...

<div>
     <textnode>My name is </textnode><b>bob</b><textnode>. What is yours?</textnode>
    <ul>
         <li>foo</li>
         <li>bar</li>
    </ul>
</div>

However, what I would really like to do is have the bold tags included
inside of the textnode tag so that it looks like...

<div>
     <textnode>My name is <b>bob</b>. What is yours?</textnode>
    <ul>
         <li>foo</li>
         <li>bar</li>
    </ul>
</div>

In other words, I would like to treat the <b> element as text and not
as an element. There is a finite set of tags I would like to be treated
as simple text. These are considered inline elements in HTML:
<b>, <i>, <em>, <strong>, <u>

An alternative, and better, solution would be to wrap all text
throughout the document in the textnode element, including the inline
elements mentioned above. The XML I will finally output from the
transformation of the XHTML requires all text to be wrapped in a special
displaytext tag, including the inline elements mentioned above. By
placing every piece of text, including the inline tags above, in a
textnode, I could easily pass the document through another template
that says...

<xsl:template match="textnode[normalize-space(.)]">
    <xsl:element name="displaytext">
        <xsl:apply-templates/>
    </xsl:element>
</xsl:template>

This would make things much easier.

Below are the XSL processor and XSL version. I am not tied to Saxon if
another processor could do the job, provided it can be used within Java
and ports across platforms (Windows, Unix, etc.).

Processor: Saxon8B
XSL Version: 2.0

Thanks in advance for your help.

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--




