This can be considered a grouping problem (as David Carlisle wrote),
where each row corresponds to a group (and all text nodes and <a> nodes
within a row belong to the same group), but it's tricky because a single
text node with newlines splits across two (or more!) groups.
So I take a two-phase approach:
1) split text nodes that contain newlines into multiple text nodes
2) do the grouping (splitting groups where the line breaks are)
For example, consider the input file:
This is <a href="foo">hello</a> the first line
and this <a href="foo">hello</a> is the second line.
The nodes are:
1. This is
2. <a>
3. the first line \n and this
4. <a>
5. is the second line.
#3 is the tricky one. The first transform should convert the above
node-set into the following node-set:
1. This is
2. <a>
3. the first line
4. <line-break/>
5. and this
6. <a>
7. is the second line.
Then you just group by the <line-break/> nodes. In XSLT 2.0 you can write:
<xsl:for-each-group select="bodytext/node()"
                    group-ending-with="line-break">
  <div>
    <xsl:apply-templates select="current-group()[not(self::line-break)]"/>
  </div>
</xsl:for-each-group>
For those of us forced to use MSXML (which I assume doesn't support XSLT
2.0), in place of the for-each-group you would have to do the
Muenchian grouping as described on Jeni's site
(http://www.jenitennison.com/xslt/grouping/muenchian.html) and in
David's mail. This is pretty confusing stuff, but the basic idea is:
1) the nodes which "start" each group (a.k.a. row) have an id number
(generated by generate-id). <bodytext> starts the first row, and
<line-break/> starts each subsequent row.
2) Every node that belongs to a group gets the same "key". Specifically,
the key of every node within a certain group is equal to the id of the
node that starts that group.
So the keys would look like this:
<bodytext> (**id=1)
This is (key=1)
<a> (key=1)
the first line (key=1)
<line-break/> (**id=2)
and this (key=2)
<a> (key=2)
is the second line. (key=2)
The tricky XSL to generate these keys is something like this:
<xsl:key name="x" match="bodytext/node()"
use="generate-id((..|preceding-sibling::line-break)[last()])"/>
In other words, make a list like this: (my-parent-node,
line-break-nodes-before-me), and then take the last element in that list.
The union is in document order, so the parent comes first and the nearest
preceding line-break comes last. This gives the previous line-break node,
or the bodytext node if there is no previous line-break node.
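A minimal sketch of how that key could drive the second phase in XSLT 1.0. It assumes the first phase has already produced <line-break/> elements (and the <myText> wrappers described further down) as children of <bodytext>. The <body> and <div> output elements and the templates for myText and a are illustrative assumptions, not something taken from the mails in this thread.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Same key as above: index every child of bodytext by the id of the
       node that starts its row (bodytext or the nearest line-break). -->
  <xsl:key name="x" match="bodytext/node()"
      use="generate-id((..|preceding-sibling::line-break)[last()])"/>

  <xsl:template match="bodytext">
    <body>
      <!-- Visit each row starter: bodytext itself, then every line-break. -->
      <xsl:for-each select=".|line-break">
        <div>
          <!-- Emit the row's members, skipping the line-break that ends it. -->
          <xsl:apply-templates
              select="key('x', generate-id())[not(self::line-break)]"/>
        </div>
      </xsl:for-each>
    </body>
  </xsl:template>

  <!-- Assumption: phase one wrapped text in myText; unwrap it here. -->
  <xsl:template match="myText">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Assumption: links are passed through with their attributes. -->
  <xsl:template match="a">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Each pass of the for-each looks up the nodes whose key equals the id of the current row starter, so every row ends up in its own <div>.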
-----------------------------------------------------------------------
Here is the code to split text with newlines into multiple nodes. It's
recursive in case a single text node has two (or more) embedded
newlines. I ended up encoding each text segment within <myText> tags,
because if you use an intermediate file to save the results this helps
keep track of the divisions between text nodes.
<xsl:template match="text()">
  <xsl:call-template name="split-text">
    <xsl:with-param name="arg1">
      <xsl:value-of select="."/>
    </xsl:with-param>
  </xsl:call-template>
</xsl:template>
<xsl:template name="split-text">
  <xsl:param name="arg1"/>
  <xsl:choose>
    <!-- &#10; is the newline character -->
    <xsl:when test="contains($arg1,'&#10;')">
      <myText><xsl:value-of select="substring-before($arg1,'&#10;')"/></myText>
      <!-- mark the row boundary for the grouping phase -->
      <line-break/>
      <xsl:call-template name="split-text">
        <xsl:with-param name="arg1">
          <xsl:value-of select="substring-after($arg1,'&#10;')"/>
        </xsl:with-param>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <myText><xsl:value-of select="$arg1"/></myText>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>
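For illustration, here is roughly what the first phase would produce for the example input at the top of this mail. This assumes the input is wrapped in a <bodytext> element and that the split template emits a <line-break/> between segments, as the prose above describes. Line wrapping and indentation are added here for readability only. Note that the text inside <a> gets wrapped as well, because match="text()" applies to every text node.

```xml
<bodytext>
  <myText>This is </myText>
  <a href="foo"><myText>hello</myText></a>
  <myText> the first line</myText>
  <line-break/>
  <myText>and this </myText>
  <a href="foo"><myText>hello</myText></a>
  <myText> is the second line.</myText>
</bodytext>
```

The second phase then only has to group the children of <bodytext> around the <line-break/> elements and unwrap the <myText> elements.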
------------------------------------------------
Bill
David Carlisle wrote:
I wrote:
it's a grouping problem (positional grouping in Mike's terminology)
you want to group all child nodes before or after text nodes containing
cr, i.e. text()[contains(.,'&#10;')]; you need to work at the level of nodes
not of the entire content of your bodytext element.
See Jeni's site on grouping techniques.
David
I suppose this is probably more helpful...
div.xml
========
<page>
<bodytext>This is the <link url="zzz">link</link>
This is another line</bodytext>
</page>
div.xsl
========
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:key name="x" match="bodytext/node()"
    use="generate-id((..|preceding-sibling::text()[contains(.,'&#10;')][1])[last()])"/>
  <xsl:template match="page">
    <html>
      <head>
        <title>testing...</title>
      </head>
      <xsl:apply-templates/>
    </html>
  </xsl:template>
  <xsl:template match="link">
    <a href="{@url}">
      <xsl:apply-templates/>
    </a>
  </xsl:template>
  <xsl:template match="bodytext">
    <body>
      <xsl:for-each select=".|text()[contains(.,'&#10;')]">
        <div>
          <xsl:value-of
            select="substring-after(self::text(),'&#10;')"/>
          <xsl:apply-templates
            select="key('x',generate-id(.))[position()&lt;last()]"/>
        </div>
        <xsl:value-of
          select="substring-before(key('x',generate-id(.))[last()],'&#10;')"/>
      </xsl:for-each>
    </body>
  </xsl:template>
</xsl:stylesheet>
$ saxon div.xml div.xsl
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>testing...</title>
</head>
<body>
<div>This is the <a href="zzz">link</a></div>
<div> This is another line</div>
</body>
</html>
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list