
RE: [xsl] Tricky XSLT 2.0 grouping problem

2008-10-10 10:37:09
In general the problem is insoluble, because given the sequence

g - first level
h - first level
1 - second level
2 - second level
A - third level
B - third level
i - first level? or fourth level?

the (i) could be either first level or fourth level.

Your input data structure is badly designed, because it gives no way of
distinguishing these two cases.

A reasonable way to proceed if you're stuck with this input would be to
assume that (i) is first level if and only if the previous first level
number is (h). But to do this you'll need to use sibling recursion rather
than pattern-based grouping.

Michael Kay
http://www.saxonica.com/
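
The sibling recursion suggested above might be sketched roughly as follows. It is only an illustration under stated assumptions: the f prefix and its namespace, the function f:level, the parameter last-alpha, the mode "walk" and the level attribute are all hypothetical names, the wrapper element is taken to be <body> as in the sample input, and multi-digit numbers are accepted at the second level. The sketch only annotates each section with a level. An ambiguous (i), (v) or (x) is treated as first level only when it directly follows the most recent first-level letter, and as fourth level otherwise.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:levels"
    exclude-result-prefixes="xs f" version="2.0">

    <!-- Hypothetical helper: level of a number such as "(i)", given the most
         recent first-level letter (empty string if there has been none) -->
    <xsl:function name="f:level" as="xs:integer">
        <xsl:param name="pnum" as="xs:string"/>
        <xsl:param name="last-alpha" as="xs:string"/>
        <xsl:variable name="n" select="replace($pnum, '[()]', '')"/>
        <xsl:sequence select="
            if (matches($n, '^[1-9][0-9]*$')) then 2
            else if (matches($n, '^[A-Z]$')) then 3
            else if (matches($n, '^[a-z]$') and
                     (not(matches($n, '^[ivx]$')) or
                      string-to-codepoints($n) = string-to-codepoints($last-alpha) + 1))
                 then 1
            else 4"/>
    </xsl:function>

    <!-- Start the walk at the first section only -->
    <xsl:template match="body">
        <xsl:copy>
            <xsl:apply-templates select="section[1]" mode="walk">
                <xsl:with-param name="last-alpha" select="''"/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>

    <!-- Sibling recursion: emit this section with its level, then move on to
         the next sibling, remembering the latest first-level letter -->
    <xsl:template match="section" mode="walk">
        <xsl:param name="last-alpha" as="xs:string"/>
        <xsl:variable name="level" select="f:level(string(pnum), $last-alpha)"/>
        <section level="{$level}">
            <xsl:copy-of select="*"/>
        </section>
        <xsl:apply-templates select="following-sibling::section[1]" mode="walk">
            <xsl:with-param name="last-alpha"
                select="if ($level eq 1) then replace(pnum, '[()]', '') else $last-alpha"/>
        </xsl:apply-templates>
    </xsl:template>

</xsl:stylesheet>
```

Once every section carries an explicit level, a second pass could nest them with group-starting-with tests on that attribute instead of on the ambiguous number. The heuristic still guesses wrong when a genuine fourth-level (i) occurs somewhere beneath (h), which is the case the input cannot distinguish.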

 

-----Original Message-----
From: James Sulak [mailto:jsulak(_at_)jonesmcclure(_dot_)com] 
Sent: 10 October 2008 15:04
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Tricky XSLT 2.0 grouping problem

I have a tricky grouping problem that I'm running into a wall 
with.  I thought it might be a fun challenge to throw out 
there.  I'm attempting to group a flat list of <section> 
elements into a hierarchy based on matching its number 
against different regular expressions.  The list is assumed 
to be in the correct order.  I have it working (the code is
below) with one exception:  roman numerals.  

For example, the XML:

<body>
<section><pnum>(a)</pnum><p>First-level section</p></section>
<section><pnum>(1)</pnum><p>Second-level section</p></section>
<section><pnum>(A)</pnum><p>Third-level section</p></section>
<section><pnum>(i)</pnum><p>Fourth-level section</p></section>
<section><pnum>(ii)</pnum><p>Fourth-level section</p></section>
<section><pnum>(B)</pnum><p>Third-level section</p></section>
<section><pnum>(2)</pnum><p>Second-level section</p></section>
<section><pnum>(A)</pnum><p>Third-level section</p></section>
</body>

Should give the result:

<body>
<section>
  <pnum>(a)</pnum><p>First-level section</p>
  <section>
    <pnum>(1)</pnum><p>Second-level section</p>
    <section>
      <pnum>(A)</pnum><p>Third-level section</p>
      <section><pnum>(i)</pnum><p>Fourth-level section</p></section>
      <section><pnum>(ii)</pnum><p>Fourth-level section</p></section>
    </section>
    <section>
      <pnum>(B)</pnum><p>Third-level section</p>
    </section>
  </section>
  <section>
    <pnum>(2)</pnum><p>Second-level section</p>
    <section><pnum>(A)</pnum><p>Third-level section</p></section>
  </section>
</section>
</body>

The problem is that the number "(i)", which is supposed to be 
a fourth-level section, is ambiguous with an "(i)" that would 
be a first-level section.  My transform ends up treating it 
as a first-level section, and so gives the following 
incorrect output:

<body>
<section>
  <pnum>(a)</pnum><p>First-level section</p>
  <section>
    <pnum>(1)</pnum><p>Second-level section</p>
    <section>
      <pnum>(A)</pnum><p>Third-level section</p>
    </section>
  </section>
</section>
<section>
  <pnum>(i)</pnum><p>Fourth-level section</p>
  <section>
    <pnum>(ii)</pnum><p>Fourth-level section</p>
    <section>
      <pnum>(B)</pnum><p>Third-level section</p>
    </section>
  </section>
  <section>
    <pnum>(2)</pnum><p>Second-level section</p>
    <section>
      <pnum>(A)</pnum><p>Third-level section</p>
    </section>
  </section>
</section>
</body>
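
The misgrouping can be traced to the first-level key: it matches any single lowercase letter in parentheses, so "(i)" satisfies both the first-level and the fourth-level pattern. A minimal check, assuming an XSLT 2.0 processor that can be started at a named template (the template name "main" is only for illustration):

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="text"/>

    <xsl:template name="main">
        <!-- "(i)" matches the first-level key and the fourth-level key -->
        <xsl:value-of select="matches('(i)', '\([a-z]\)'),
                              matches('(i)', '\([ivx]{1,4}\)')"/>
        <xsl:text>&#10;</xsl:text>
        <!-- "(ii)" matches only the fourth-level key, so it is not ambiguous -->
        <xsl:value-of select="matches('(ii)', '\([a-z]\)'),
                              matches('(ii)', '\([ivx]{1,4}\)')"/>
    </xsl:template>
</xsl:stylesheet>
```

Because the first-level grouping runs first, any "(i)", "(v)" or "(x)" starts a new first-level group before the fourth-level key is ever consulted.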

I've included my current transform below.  The grouping_keys 
variable is a sequence of regex strings, one for each 
successive level of section nesting.  Does anybody have an 
alternate way of tackling this?

Thanks,

-James 


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0">

    <xsl:variable name="grouping_keys" as="xs:string+"
                  select="('\([a-z]\)', '\([1-9]\)', '\([A-Z]\)', '\([ivx]{1,4}\)')" />
    
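    <!-- Start the grouping here. The sample input above uses body as the
         wrapper element; change codebody to body to run it on that sample. -->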
    <xsl:template match="codebody">
        <codebody>
            <xsl:copy-of select="@*"/>
            <xsl:for-each-group select="*"
                group-starting-with="section[matches(pnum, string($grouping_keys[1]))]">
                <xsl:apply-templates select="." mode="group">
                    <xsl:with-param name="level" select="1" as="xs:integer"/>
                </xsl:apply-templates>
            </xsl:for-each-group>
        </codebody>
    </xsl:template>

    <!-- This template copies the current section and groups any "nested" sections -->
    <xsl:template match="section" mode="group">
        <xsl:param name="level" as="xs:integer"/>
        <section>
            <xsl:copy-of select="@*, *"/>
            <xsl:if test="$level &lt; count($grouping_keys)">
                <xsl:for-each-group select="current-group() except ."
                    group-starting-with="section[matches(pnum, string($grouping_keys[$level + 1]))]">
                    <xsl:apply-templates select="." mode="group">
                        <xsl:with-param name="level" select="$level + 1" as="xs:integer"/>
                    </xsl:apply-templates>
                </xsl:for-each-group>
            </xsl:if>
        </section>
    </xsl:template>

    <xsl:template match="element()" mode="#all">
        <xsl:copy>
            <xsl:apply-templates select="@*,node()" mode="#current"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="attribute()|text()|comment()|processing-instruction()"
                  mode="#all">
        <xsl:copy/>
    </xsl:template>


</xsl:stylesheet>

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


