xsl-list

Re: [xsl] Converting milestone tags

2010-10-14 04:16:06
Good case for grouping:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl"
    exclude-result-prefixes="xs xd"
    version="2.0">
    <xd:doc scope="stylesheet">
        <xd:desc>
            <xd:p><xd:b>Created on:</xd:b> Oct 14, 2010</xd:p>
            <xd:p><xd:b>Author:</xd:b> vsedov</xd:p>
            <xd:p></xd:p>
        </xd:desc>
    </xd:doc>
    <xsl:template match="*">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
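    <xsl:template match="*[span]">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <!-- Key = 1 for a start milestone itself, plus the number of start/end
                 milestones before the node: a start milestone, the content after it
                 and its end milestone share one odd key; text between pairs gets
                 an even key -->
            <xsl:for-each-group select="node()"
                group-by="count(self::span[@order eq 'start'])
                          + count(preceding-sibling::span[@order = ('start', 'end')])">
                <xsl:choose>
                    <!-- The context item is the first node of the group, so only
                         a group opened by a start milestone becomes a span
                         (ordinary span elements in the text are left alone) -->
                    <xsl:when test="self::span[@order eq 'start']">
                        <span><xsl:apply-templates select="current-group()"/></span>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:apply-templates select="current-group()"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>
    <!-- The milestones themselves produce no output -->
    <xsl:template match="span[@order = ('start', 'end')]"/>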
</xsl:stylesheet>

Vyacheslav Sedov
Schematronic
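To see why the group-by key in the stylesheet above works, it can help to print the key each child of a paragraph receives. The stylesheet below is a hypothetical debugging aid, not part of the posted solution; the `key` element it emits is made up for illustration. It evaluates the same key expression for every child of each p, assuming the root/p structure of the sample input.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <xsl:template match="root">
    <root><xsl:apply-templates select="p"/></root>
  </xsl:template>

  <!-- Wrap every child of a paragraph in a (hypothetical) key element that
       shows the group-by value the grouping stylesheet would compute -->
  <xsl:template match="p">
    <p>
      <xsl:for-each select="node()">
        <xsl:variable name="k"
            select="count(self::span[@order eq 'start'])
                    + count(preceding-sibling::span[@order = ('start', 'end')])"/>
        <key value="{$k}"><xsl:copy-of select="."/></key>
      </xsl:for-each>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

For the first paragraph of the sample input the keys come out as 0, 1, 1, 1, 2: the leading text, then the start milestone, the enclosed text and the end milestone, then the trailing text. The scheme relies on milestones strictly alternating start, end, start, end within one parent; nested or unbalanced milestones would give the wrong keys.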

2010/10/14 Michael Kay <mike(_at_)saxonica(_dot_)com>

This class of problems is quite tricky. The most general approach is to
flatten the first hierarchy, so everything is reduced to milestones, and then
use positional grouping to construct the new hierarchy from the flat
structure.

If you have access to a good library, try looking for Michael Jackson's 1970s
books on Jackson Structured Programming, where he tackles this class of
problem under the heading of "boundary clash". The vocabulary is different
(it's all about sequential processing of hierarchic files on magnetic tape),
but the logic is the same, and it's the most systematic treatment I've seen.
Essentially he shows that if the hierarchic structures of the input and the
output are in some sense congruent, a single tree walk over the input can
handle the problem. If they aren't, you can devise a new intermediate
hierarchy, perhaps a very flat one, that is congruent with both the input and
the output: one tree walk gets you from the input to the intermediate tree,
and a second gets you from the intermediate tree to the output. (This assumes,
of course, that you don't also have an ordering clash, and in your case you
don't.)

Your example doesn't need the full generality of this approach, because the 
start/end milestones are always siblings and are always matched in the same 
paragraph, but your discussion indicates that you might want to tackle things 
that go beyond this example.

Michael Kay
Saxonica
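Kay's positional-grouping idea can be applied directly when, as in the example, the milestones are siblings inside a single p. A minimal sketch, assuming start and end milestones always alternate and never nest or cross a paragraph boundary: instead of the computed group-by key used above, it splits each paragraph's children with group-starting-with at every start milestone, then splits each such group once more with group-ending-with at its end milestone. The first sub-group becomes the new span; whatever follows the end milestone is copied unchanged. An identity template keeps any markup nested in the text.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies elements, attributes and text unchanged,
       so markup nested inside the "text" is preserved -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- A new group begins at each start milestone; any text before the
           first one forms a group of its own -->
      <xsl:for-each-group select="node()"
          group-starting-with="span[@order = 'start']">
        <xsl:choose>
          <!-- The context item is the first node of the current group -->
          <xsl:when test="self::span[@order = 'start']">
            <!-- Split again, closing the first sub-group at the end milestone:
                 sub-group 1 is the span content, the rest follows the span -->
            <xsl:for-each-group select="current-group()"
                group-ending-with="span[@order = 'end']">
              <xsl:choose>
                <xsl:when test="position() = 1">
                  <span><xsl:apply-templates select="current-group()"/></span>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:apply-templates select="current-group()"/>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:for-each-group>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- The milestones themselves produce no output -->
  <xsl:template match="span[@order = ('start', 'end')]"/>
</xsl:stylesheet>
```

For the sample input in the question this should yield the expected output shown there. Inside xsl:for-each-group, position() is the position of the current group, which is why `position() = 1` picks out the content up to and including the end milestone.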

On 14/10/2010 8:05 AM, Josef Schneeberger wrote:

Hi everybody,

I am new to this list and apologize if my question is an FAQ. I scanned
the archives but did not find a solution. The question arises in a TEI
project where we have to switch from a chapter hierarchy to a
page-oriented form. The XSLT is done in multiple steps (a Cocoon pipeline),
and I use Saxon 9.

Here is a simplified example of an infile:

<root>
 <p>text<span order="start"/>text<span order="end"/>  text</p>
 <p>text<span order="start"/>text<span order="end"/>  text
    text<span order="start"/>text<span order="end"/>  text</p>
 <p>text text text<span order="start"/>text<span order="end"/></p>
 <p><span order="start"/>text<span order="end"/>  text text text</p>
</root>

which should result in the following output:

<root>
 <p>text<span>text</span>  text</p>
 <p>text<span>text</span>  text
    text<span>text</span>  text</p>
 <p>text text text<span>text</span></p>
 <p><span>text</span>  text text text</p>
</root>

There may be an arbitrary number of <span order="start"/> milestone tags (and
corresponding end milestone tags) in a p element. Furthermore, any
"text" node may itself contain markup, which should be preserved in the
output. I tried various approaches, but all of them failed. Here is one of my
attempts, using sibling recursion ...

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:template match="/">
  <xsl:apply-templates/>
 </xsl:template>

 <xsl:template match="root">
  <root><xsl:apply-templates/></root>
 </xsl:template>

 <xsl:template match="p">
  <p>
   <xsl:apply-templates select="child::node()" mode="procp"/>
  </p>
 </xsl:template>

 <xsl:template match="span[@order='start']" mode="procp">
  <span>
   <xsl:apply-templates
     select="following-sibling::node()[1][not(self::span)]"
     mode="procp"/>
  </span>
  <xsl:apply-templates select="following-sibling::node()[1]"/>
 </xsl:template>

 <xsl:template match="node()" mode="procp">
  <xsl:copy-of select="."/>
   <xsl:apply-templates
      select="following-sibling::node()[1][not(self::span)]"
      mode="procp"/>
 </xsl:template>
</xsl:stylesheet>

Any help would be greatly appreciated. Josef
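For comparison, the sibling-recursion approach in the attempt above can also be made to work. As posted, it applies the procp templates to every child of p (so later siblings are output more than once), stops the outer walk at the next span of either kind, and after closing a span continues in the default mode with the node right after the start milestone instead of resuming the walk after the end milestone. A minimal corrected sketch, assuming the same alternating, non-nested milestones; the mode names walk and inside are hypothetical names chosen for this sketch.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies nodes unchanged, keeping nested markup -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Start the walk at the first child only; each step moves on by itself -->
  <xsl:template match="p">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="node()[1]" mode="walk"/>
    </xsl:copy>
  </xsl:template>

  <!-- Outside a milestone pair: copy this node, then visit the next sibling -->
  <xsl:template match="node()" mode="walk">
    <xsl:apply-templates select="."/>
    <xsl:apply-templates select="following-sibling::node()[1]" mode="walk"/>
  </xsl:template>

  <!-- Start milestone: collect the siblings up to the end milestone inside a
       new span, then resume the outer walk after the end milestone -->
  <xsl:template match="span[@order = 'start']" mode="walk">
    <span>
      <xsl:apply-templates select="following-sibling::node()[1]" mode="inside"/>
    </span>
    <xsl:apply-templates
        select="following-sibling::span[@order = 'end'][1]/following-sibling::node()[1]"
        mode="walk"/>
  </xsl:template>

  <!-- Inside a pair: copy this node and keep going -->
  <xsl:template match="node()" mode="inside">
    <xsl:apply-templates select="."/>
    <xsl:apply-templates select="following-sibling::node()[1]" mode="inside"/>
  </xsl:template>

  <!-- The end milestone stops the inside walk and is not copied -->
  <xsl:template match="span[@order = 'end']" mode="inside"/>
</xsl:stylesheet>
```

Because every step recurses to the next sibling, very long paragraphs may lead to deep recursion unless the processor optimizes the tail calls; the grouping solutions avoid that.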



--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--

