Dear all,
I have a grouping problem. I want to wrap each run of sibling
elements that starts with a certain element and ends before certain
other elements in a new container element. Apart from this, all
elements should appear in the original order. This is my input file:
<?xml version="1.0" encoding="iso-8859-1"?>
<dict>
<entry id="000152">
<HWD>ace</HWD>
<LV2>I</LV2>
<PSA>n</PSA>
<LV3>1</LV3>
<TSL>ess</TSL>
<IDM>~ of hearts</IDM>
<TSL>hjärteress</TSL>
<IDM>have an ~ up one's sleeve</IDM>
<SEA>bildl.</SEA>
<TSL>ha trumf på hand</TSL>
<LV3>2</LV3>
<IDM>within an ~ of</IDM>
<TSL>ytterst nära</TSL>
<EXA>within an ~ of victory</EXA>
<LV3>3</LV3>
<TSL>ess</TSL>
<LV3>4</LV3>
<SEA>i tennis</SEA>
<TSL>serveess</TSL>
<IDM>She has already hit 13 ~s</IDM>
<TSL>Hon hade redan tagit 13 serveess</TSL>
<LV2>II</LV2>
<PSA>attr adj</PSA>
<IDM>~ reporter</IDM>
<TSL>stjärnreporter</TSL>
</entry>
</dict>
This is the desired output:
<dict>
<entry id="000152">
<HWD>ace</HWD>
<LV2>I</LV2>
<PSA>n</PSA>
<LV3>1</LV3>
<TSL>ess</TSL>
<phrase>
<IDM>~ of hearts</IDM>
<TSL>hjärteress</TSL>
</phrase>
<phrase>
<IDM>have an ~ up one's sleeve</IDM>
<SEA>bildl.</SEA>
<TSL>ha trumf på hand</TSL>
</phrase>
<LV3>2</LV3>
<phrase>
<IDM>within an ~ of</IDM>
<TSL>ytterst nära</TSL>
<EXA>within an ~ of victory</EXA>
</phrase>
<LV3>3</LV3>
<TSL>ess</TSL>
<LV3>4</LV3>
<SEA>i tennis</SEA>
<TSL>serveess</TSL>
<phrase>
<IDM>She has already hit 13 ~s</IDM>
<TSL>Hon hade redan tagit 13 serveess</TSL>
</phrase>
<LV2>II</LV2>
<PSA>attr adj</PSA>
<phrase>
<IDM>~ reporter</IDM>
<TSL>stjärnreporter</TSL>
</phrase>
</entry>
</dict>
So I want to keep the order of all elements but group each <IDM>
element followed by a number of siblings up to another <IDM>, a <LV2>
or a <LV3> element and wrap this sequence in a <phrase> section. This
example is a simplified sample of my data; there are other "stop"
elements as well as other elements that can appear both inside
<phrase> sections and as children of <entry>.
My feeble attempt resulted in the following stylesheet:
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="iso-8859-1" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="dict">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="entry">
<xsl:copy>
<xsl:copy-of select="HWD"/>
<xsl:choose>
<xsl:when test="*[1][not(self::IDM) ]">
<xsl:for-each select="HWD">
<xsl:apply-templates select="following-sibling::*[1]" mode="copy"/>
</xsl:for-each>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates/>
</xsl:otherwise>
</xsl:choose>
</xsl:copy>
</xsl:template>
<xsl:template match="*" mode="copy">
<xsl:copy-of select="."/>
<xsl:apply-templates select="following-sibling::*[1][not(self::IDM)]" mode="copy"/>
<xsl:if test="following-sibling::*[1][self::IDM]">
<xsl:apply-templates select="following-sibling::*[1]"/>
</xsl:if>
</xsl:template>
<xsl:template match="IDM">
<phrase>
<xsl:copy-of select="."/>
<xsl:apply-templates select="following-sibling::*[1][not(self::IDM)]" mode="copy"/>
</phrase>
</xsl:template>
</xsl:stylesheet>
But applied to my input this produces the unwanted output:
<?xml version="1.0" encoding="iso-8859-1"?>
<dict>
<entry>
<HWD>ace</HWD>
<LV2>I</LV2>
<PSA>n</PSA>
<LV3>1</LV3>
<TSL>ess</TSL>
<phrase>
<IDM>~ of hearts</IDM>
<TSL>hjärteress</TSL>
<phrase>
<IDM>have an ~ up one's sleeve</IDM>
<SEA>bildl.</SEA>
<TSL>ha trumf på hand</TSL>
<LV3>2</LV3>
<phrase>
<IDM>within an ~ of</IDM>
<TSL>ytterst nära</TSL>
<EXA>within an ~ of victory</EXA>
<LV3>3</LV3>
<TSL>ess</TSL>
<LV3>4</LV3>
<SEA>i tennis</SEA>
<TSL>serveess</TSL>
<phrase>
<IDM>She has already hit 13 ~s</IDM>
<TSL>Hon hade redan tagit 13 serveess</TSL>
<LV2>II</LV2>
<PSA>attr adj</PSA>
<phrase>
<IDM>~ reporter</IDM>
<TSL>stjärnreporter</TSL>
</phrase>
</phrase>
</phrase>
</phrase>
</phrase>
</entry>
</dict>
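The nesting seems to arise because the IDM template keeps walking the following siblings in "copy" mode while its <phrase> is still open, and the copy-mode template in turn calls the IDM template for the next <IDM>, so no <phrase> is closed before the end of the entry. What follows is only a minimal XSLT 1.0 sketch of one way to close each phrase before the next <IDM>, <LV2> or <LV3>. It assumes these three are the only boundary elements (further stop elements would have to be added to each predicate), and the key name "followers" is made up for illustration.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="iso-8859-1" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Index every non-boundary child of entry by the id of the nearest
       preceding boundary element (assumed here: IDM, LV2 or LV3). -->
  <xsl:key name="followers"
      match="entry/*[not(self::IDM or self::LV2 or self::LV3)]"
      use="generate-id(preceding-sibling::*[self::IDM or self::LV2 or self::LV3][1])"/>

  <!-- Copy dict and entry with their attributes, then visit the
       element children in document order. -->
  <xsl:template match="dict | entry">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Default for children of entry: copy unchanged. -->
  <xsl:template match="entry/*" priority="1">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- An IDM opens a phrase that holds the IDM itself plus every
       sibling up to (not including) the next boundary element. -->
  <xsl:template match="entry/IDM" priority="2">
    <phrase>
      <xsl:copy-of select=". | key('followers', generate-id())"/>
    </phrase>
  </xsl:template>

  <!-- Children whose nearest preceding boundary is an IDM have already
       been copied into that phrase, so nothing is emitted for them here. -->
  <xsl:template priority="2"
      match="entry/*[not(self::IDM or self::LV2 or self::LV3)]
                    [preceding-sibling::*[self::IDM or self::LV2 or self::LV3][1][self::IDM]]"/>
</xsl:stylesheet>
```

The explicit priority attributes are there because all three entry/... patterns would otherwise get the same default priority and conflict. Unlike the attempt above, this sketch also copies the id attribute of <entry>.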
I can start a <phrase> section, but I don't know how to express where
it should end. This seems to be a trivial problem, but not to me. I
had a look at the XSLT 2.0 element <xsl:for-each-group>, and I
thought my problems were solved when I saw that it has both a
group-starting-with and a group-ending-with attribute. I thought they
would make it possible to specify the first element of a sequence, in
this case 'group-starting-with="IDM"', and the last one, in my input
the one immediately before certain elements, i.e.
'group-ending-with="following-sibling::*[1][self::IDM] or
following-sibling::*[1][self::LV2] or
following-sibling::*[1][self::LV3]"', in the same
<xsl:for-each-group>. But this is apparently not the way it works, as
the two attributes are mutually exclusive.
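One possible way around this in XSLT 2.0, as a sketch only: since every stop element can itself be treated as the start of a new group, group-starting-with alone may be enough, with <phrase> emitted only for groups whose first element is an <IDM>. The stylesheet below assumes that IDM, LV2 and LV3 are the only elements that start or stop a group; the real data has more stop elements, which would have to be added to the pattern.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="iso-8859-1" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="dict">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="entry">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- Each IDM, LV2 or LV3 (assumed boundary set) opens a new group,
           which therefore also ends the group before it. -->
      <xsl:for-each-group select="*"
          group-starting-with="IDM | LV2 | LV3">
        <xsl:choose>
          <!-- The context item is the first element of the group:
               wrap the group only if it was opened by an IDM. -->
          <xsl:when test="self::IDM">
            <phrase>
              <xsl:copy-of select="current-group()"/>
            </phrase>
          </xsl:when>
          <!-- All other groups are copied through unchanged. -->
          <xsl:otherwise>
            <xsl:copy-of select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

On the sample entry this splits the children into groups such as (HWD), (LV2, PSA), (LV3, TSL), (IDM, TSL), (IDM, SEA, TSL) and so on, and only the groups led by an <IDM> end up inside <phrase>. It needs an XSLT 2.0 processor.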
For any help I'd be most grateful.
Mathias
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
--~--