Re: Passing element nodes through string functions (WAS RE: Preserving inline elements when using string functions)

This can be considered a grouping problem (as David Carlisle wrote),
where each row corresponds to a group (and all text nodes and <a> nodes
within a row belong to the same group), but it's tricky because a single
text node containing newlines is split across two (or more!) groups.

So my approach has two phases:
1) split text nodes with newlines into multiple text nodes
2) do the grouping (splitting groups where the line breaks are)

For example, consider the input file:

This is <a href="foo">hello</a> the first line
and this <a href="foo">hello</a> is the second line.

The nodes are:
1. This is
2. <a>
3. the first line \n and this
4. <a>
5. is the second line.

#3 is the tricky one, because it belongs to both rows. The first
transform should convert the above node-set into the following node-set:

1. This is
2. <a>
3. the first line
4. <line-break/>
5. and this
6. <a>
7. is the second line.
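To make phase 1 concrete, here is a small Python model of it (not XSLT, and not the code used later in this mail). It assumes a hypothetical flat representation of bodytext's children: a text node is a Python str, an element is a (name, string-value) tuple, and LINE_BREAK stands in for <line-break/>. The function name split_text_nodes is likewise made up for illustration.

```python
from typing import List, Tuple, Union

# Assumed flat model of bodytext's children: a text node is a str and an
# element is a (name, string-value) tuple. Both choices are illustrative.
Node = Union[str, Tuple[str, str]]
LINE_BREAK: Node = ("line-break", "")


def split_text_nodes(nodes: List[Node]) -> List[Node]:
    """Phase 1: split each text node at its newlines, putting a
    <line-break/> marker where each newline was."""
    out: List[Node] = []
    for node in nodes:
        if isinstance(node, str):
            for i, piece in enumerate(node.split("\n")):
                if i > 0:
                    out.append(LINE_BREAK)  # one marker per removed newline
                out.append(piece)
        else:
            out.append(node)  # elements such as <a> pass through unchanged
    return out


nodes = ["This is ", ("a", "hello"), " the first line\nand this ",
         ("a", "hello"), " is the second line."]
for n in split_text_nodes(nodes):
    print(repr(n))
```

Run on the example input, it yields the seven nodes listed above, with the newline inside node #3 replaced by the marker.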

Then you just group on the <line-break/> nodes. In XSLT 2.0 you can write:

<xsl:for-each-group select="bodytext/node()"
                    group-ending-with="line-break">
  <div>
    <xsl:apply-templates select="current-group()[not(self::line-break)]"/>
  </div>
</xsl:for-each-group>
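The effect of group-ending-with can be modeled in a few lines of Python. This is a sketch under the same assumed representation (str for text, tuples for elements, a LINE_BREAK sentinel); group_ending_with is a hypothetical helper, not an XSLT or library API. Each group closes with the node that matches the predicate, and the line-breaks are then filtered out, mirroring current-group()[not(self::line-break)].

```python
from typing import Callable, List, Tuple, Union

# Assumed node model: str = text node, tuple = element.
Node = Union[str, Tuple[str, str]]
LINE_BREAK: Node = ("line-break", "")


def group_ending_with(nodes: List[Node],
                      is_end: Callable[[Node], bool]) -> List[List[Node]]:
    """Model of group-ending-with: each group closes with a node that
    satisfies is_end (except possibly the last group)."""
    groups: List[List[Node]] = []
    current: List[Node] = []
    for node in nodes:
        current.append(node)
        if is_end(node):
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


# Output of phase 1 for the example input.
phase1 = ["This is ", ("a", "hello"), " the first line", LINE_BREAK,
          "and this ", ("a", "hello"), " is the second line."]
rows = [[n for n in g if n != LINE_BREAK]  # current-group()[not(self::line-break)]
        for g in group_ending_with(phase1, lambda n: n == LINE_BREAK)]
for row in rows:
    print(row)
```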

For those of us forced to use MSXML (which, as far as I know, supports
only XSLT 1.0), you have to replace the for-each-group with Muenchian
grouping, as described on Jeni Tennison's site
(http://www.jenitennison.com/xslt/grouping/muenchian.html) and in
David's mail. This is pretty confusing stuff, but the basic idea is:

1) Each node that "starts" a group (i.e. a row) is identified by an id
generated by generate-id(). <bodytext> starts the first row, and each
<line-break/> starts a subsequent row.

2) Every node that belongs to a group gets the same "key". Specifically,
the key of every node in a group equals the id of the node that starts
that group.

So the keys would look like this:

<bodytext> (**id=1)
   This is (key=1)
   <a> (key=1)
   the first line (key=1)
   <line-break/> (**id=2)
   and this (key=2)
   <a> (key=2)
   is the second line. (key=2)
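Muenchian grouping can be modeled the same way: a dictionary plays the role of xsl:key, and each group is fetched by looking up its starter's id. In this Python sketch the ids "parent" and "n<i>" are hypothetical stand-ins for generate-id() values, and the node representation is the same assumed one as before. Note that a line-break itself is indexed under the previous starter, since preceding-sibling:: does not include the context node.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple, Union

# Assumed node model: str = text node, tuple = element.
Node = Union[str, Tuple[str, str]]
LINE_BREAK: Node = ("line-break", "")


def muenchian_rows(children: List[Node],
                   is_break: Callable[[Node], bool]) -> List[List[Node]]:
    # Hypothetical ids standing in for generate-id(): "parent" for
    # <bodytext>, "n<i>" for the i-th child.
    index: Dict[str, List[Node]] = defaultdict(list)  # plays the role of xsl:key
    starter = "parent"
    for i, node in enumerate(children):
        index[starter].append(node)  # key = id of the node starting this group
        if is_break(node):
            starter = f"n{i}"        # this line-break starts the next group
    # Group starters: <bodytext> itself, then every <line-break/>.
    starters = ["parent"] + [f"n{i}" for i, n in enumerate(children) if is_break(n)]
    # key('x', generate-id(.)) for each starter, dropping the breaks.
    return [[n for n in index[s] if not is_break(n)] for s in starters]


phase1 = ["This is ", ("a", "hello"), " the first line", LINE_BREAK,
          "and this ", ("a", "hello"), " is the second line."]
for row in muenchian_rows(phase1, lambda n: n == LINE_BREAK):
    print(row)
```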

The tricky XSLT that generates these keys is something like this:

<xsl:key name="x" match="bodytext/node()"
use="generate-id((..|preceding-sibling::line-break)[last()])"/>

In other words, build the node-set (my parent node, the line-break nodes
before me) and take its last member. Because an XPath 1.0 union is always
in document order, the parent comes first, so this gives the nearest
preceding line-break node, or the bodytext node if there is no preceding
line-break node.
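The key expression itself can be checked with a literal Python model of it: build the union in document order, parent first, then take the last member. The ids "parent" and "n<j>" and the function name key_value are hypothetical, the node representation is the same assumed one as before, and this quadratic version favors clarity over speed.

```python
from typing import List, Tuple, Union

# Assumed node model: str = text node, tuple = element.
Node = Union[str, Tuple[str, str]]
LINE_BREAK: Node = ("line-break", "")


def key_value(children: List[Node], i: int) -> str:
    """Model of generate-id((..|preceding-sibling::line-break)[last()])
    for the i-th child, using hypothetical ids "parent" and "n<j>"."""
    # An XPath 1.0 union is ordered by document order, so the parent
    # comes first, followed by the line-breaks that precede child i.
    union = ["parent"] + [f"n{j}" for j in range(i) if children[j] == LINE_BREAK]
    return union[-1]  # [last()]: nearest preceding line-break, else the parent


children = ["This is ", ("a", "hello"), " the first line", LINE_BREAK,
            "and this ", ("a", "hello"), " is the second line."]
print([key_value(children, i) for i in range(len(children))])
```

The line-break at index 3 still gets the parent's id, because preceding-sibling:: excludes the node itself; the three nodes after it share the id "n3".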

-----------------------------------------------------------------------

Here is the code to split text containing newlines into multiple nodes.
It's recursive in case a single text node has two (or more) embedded
newlines. I ended up wrapping each text segment in <myText> tags,
because if you save the results to an intermediate file, this helps
keep track of the divisions between text nodes.

<xsl:template match="text()">
<xsl:call-template name="split-text">
<xsl:with-param name="arg1">
<xsl:value-of select="."/>
</xsl:with-param>
</xsl:call-template>
</xsl:template>

<xsl:template name="split-text">
<xsl:param name="arg1"/>
<xsl:choose>
<xsl:when test="contains($arg1,'&#10;')">
<myText><xsl:value-of select="substring-before($arg1,'&#10;')"/></myText>
<xsl:call-template name="split-text">
<xsl:with-param name="arg1">
<xsl:value-of select="substring-after($arg1,'&#10;')"/>
</xsl:with-param>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<myText><xsl:value-of select="$arg1"/></myText>
</xsl:otherwise>
</xsl:choose>
</xsl:template>

<xsl:template match="*">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
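For readers more at home outside XSLT, the recursion in split-text maps onto a short Python function using the standard library's xml.etree.ElementTree. This is a sketch, not a drop-in replacement: split_text is a hypothetical name, and it also emits a <line-break/> between segments, the marker the grouping phase relies on.

```python
import xml.etree.ElementTree as ET


def split_text(arg1: str, parent: ET.Element) -> None:
    """Python mirror of the recursive split-text template."""
    if "\n" in arg1:                                   # contains($arg1,'&#10;')
        before, _, after = arg1.partition("\n")
        ET.SubElement(parent, "myText").text = before  # substring-before(...)
        ET.SubElement(parent, "line-break")            # marks the row boundary
        split_text(after, parent)                      # recurse on substring-after(...)
    else:
        ET.SubElement(parent, "myText").text = arg1


body = ET.Element("bodytext")
split_text("one\ntwo\nthree", body)
print(ET.tostring(body, encoding="unicode"))
```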

------------------------------------------------

Bill


David Carlisle wrote:

I wrote:

 it's a grouping problem (positional grouping, in Mike's terminology):
 you want to group all child nodes before or after text nodes containing
 newlines, i.e. text()[contains(.,'&#10;')]. You need to work at the level
 of nodes, not of the entire content of your bodytext element.

 See Jeni's site on grouping techniques.

 David



I suppose this is probably more helpful...

div.xml
========

<page>
   <bodytext>This is the <link url="zzz">link</link>
   This is another line</bodytext>
</page>



div.xsl
========

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<xsl:key name="x" match="bodytext/node()"
use="generate-id((..|preceding-sibling::text()[contains(.,'&#10;')][1])[last()])"/>

<xsl:template match="page">
<html>
<head>
<title>testing...</title>
</head>
<xsl:apply-templates/>
</html>
</xsl:template>

<xsl:template match="link">
<a href="{(_at_)url}">
<xsl:apply-templates/>
</a>
</xsl:template>


<xsl:template match="bodytext">
<body>
<xsl:for-each select=".|text()[contains(.,'&#10;')]">
<div>
<xsl:value-of 
 select="substring-after(self::text(),'&#10;')"/>
<xsl:apply-templates
select="key('x',generate-id(.))[position()&lt;last()]"/>
</div>
<xsl:value-of
select="substring-before(key('x',generate-id(.))[last()],'&#10;')"/>
</xsl:for-each>
</body>
</xsl:template>

</xsl:stylesheet>






$ saxon div.xml div.xsl


<html>
  <head>
     <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

     <title>testing...</title>
  </head>

  <body>
     <div>This is the <a href="zzz">link</a></div>
     <div>    This is another line</div>
  </body>

</html>
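David's stylesheet skips the separate splitting pass: a text node containing a newline ends one row with its substring-before() and opens the next with its substring-after(). A rough Python model of that idea, under the same assumed node representation (str for text, tuples for elements; the helper name is hypothetical), looks like this. Like the stylesheet, it only splits at the first newline in each text node.

```python
from typing import List, Tuple, Union

# Assumed node model: str = text node, tuple = element.
Node = Union[str, Tuple[str, str]]


def rows_by_newline_text(children: List[Node]) -> List[List[Node]]:
    """Group children into rows without a separate splitting pass:
    a text node containing a newline ends one row and starts the next."""
    rows: List[List[Node]] = [[]]
    for node in children:
        if isinstance(node, str) and "\n" in node:
            # Only the first newline is handled; later ones stay in `after`.
            before, _, after = node.partition("\n")
            rows[-1].append(before)  # substring-before(., '&#10;')
            rows.append([after])     # substring-after(., '&#10;')
        else:
            rows[-1].append(node)
    return rows


children = ["This is the ", ("link", "link"), "\n   This is another line"]
for row in rows_by_newline_text(children):
    print(row)
```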


XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list



