xsl-list

Re: [xsl] Finding an untagged ordered list

2009-01-14 06:52:07
I love to play with Kernow's XSLT Playground feature, and your case was well prepared for that.

Your case can be solved with xsl:for-each-group and regular expressions. I did not try to exclude a final P that may not belong to the list, and I did not check the alphabetical order; you would have to specify more clearly what should happen there. But look at this:

<xsl:template match="root">
  <xsl:copy>
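    <!-- Key 0: a P that starts with "X." or has such a P before it.
         Any other element gets its position as key, i.e. its own group. -->
    <xsl:for-each-group select="*"
      group-adjacent="if (self::P and
      (self::P[matches(., '^[A-Z][.]')] or
      preceding-sibling::P[matches(., '^[A-Z][.]')]))
      then 0 else position()">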
      <xsl:choose>
        <xsl:when test="current-grouping-key() = 0">
          <OL>
            <xsl:for-each-group select="current-group()"
              group-starting-with="P[matches(., '^[A-Z][.]')]">
              <xsl:choose>
                <xsl:when test="./self::P[matches(., '^[A-Z][.]')]">
                  <LI>
                    <xsl:apply-templates select="current-group()" mode="join"/>
                  </LI>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:apply-templates select="current-group()"/>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:for-each-group>
          </OL>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="."/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

<!-- mode "join": copy a P's attributes and children, but not the P itself, then add a space -->
<xsl:template match="node()" mode="join">
  <xsl:apply-templates select="@*|node()"/>
  <xsl:value-of select="' '"/>
</xsl:template>

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>


It creates

<root>
   <H1>Heading</H1>
   <P>List Heading</P>
   <OL>
      <LI>A. One big sentence incorrectly placed in P tags </LI>
      <LI>B. Another long sentence spanning P tags. </LI>
      <LI>C. This should really be one long list item that spans randomly. Hopefully I am some unrelated text. </LI>
   </OL>
</root>
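To see why the unrelated last paragraph ends up inside item C, here is a rough Python model of the grouping above, using the standard xml.etree.ElementTree module. It is an illustration only, not XSLT: the names `starts_item`, `group_key` and `transform` are made up, and the result is a list of (tag, text) pairs rather than real XML. `group_key` mirrors the group-adjacent expression, the inner loop mirrors group-starting-with, and each LI text is built the way mode "join" does it (each P's text plus one space).

```python
import re
import xml.etree.ElementTree as ET

# Graeme's sample input from the message quoted below.
SAMPLE = """<root>
  <H1>Heading</H1>
  <P>List Heading</P>
  <P>A. One big</P>
  <P>sentence incorrectly</P>
  <P>placed in P tags</P>
  <P>B. Another long</P>
  <P>sentence spanning P tags.</P>
  <P>C. This should really</P>
  <P>be one</P>
  <P>long</P>
  <P>list item</P>
  <P>that spans randomly.</P>
  <P>Hopefully I am some unrelated text.</P>
</root>"""

# Same test as matches(., '^[A-Z][.]') in the stylesheet.
ITEM_START = re.compile(r"^[A-Z][.]")


def starts_item(el):
    return el.tag == "P" and ITEM_START.match(el.text or "") is not None


def group_key(children, i):
    """group-adjacent key: 0 for a P that starts an item or has such a P
    among its preceding siblings, otherwise the 1-based position."""
    el = children[i]
    if el.tag == "P" and (starts_item(el) or any(starts_item(p) for p in children[:i])):
        return 0
    return i + 1


def transform(root):
    children = list(root)
    out = []  # (tag, text) pairs; the OL entry holds a list of such pairs
    i = 0
    while i < len(children):
        key = group_key(children, i)
        j = i + 1
        while j < len(children) and group_key(children, j) == key:
            j += 1
        group = children[i:j]
        if key == 0:
            # group-starting-with: a new sub-group at every "X." paragraph
            subgroups = []
            for el in group:
                if starts_item(el) or not subgroups:
                    subgroups.append([el])
                else:
                    subgroups[-1].append(el)
            ol = []
            for sub in subgroups:
                if starts_item(sub[0]):
                    # mode="join": each P's text followed by one space
                    ol.append(("LI", "".join((p.text or "") + " " for p in sub)))
                else:
                    ol.extend((p.tag, p.text) for p in sub)
            out.append(("OL", ol))
        else:
            out.extend((el.tag, el.text) for el in group)
        i = j
    return out


for entry in transform(ET.fromstring(SAMPLE)):
    print(entry)
```

Because `preceding-sibling::P[...]` looks at every earlier P, any P after the first "X." paragraph gets key 0, which is why the last P is not excluded from the OL.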

Good luck!

- Michael



On 13.01.2009 at 23:45, Graeme Kidd wrote:

Hi everyone,
I am using XSLT 2.0 and I have three questions about an ordered list in this format:
<root>
  <H1>Heading</H1>
  <P>List Heading</P>
  <P>A. One big</P>
  <P>sentence incorrectly</P>
  <P>placed in P tags</P>
  <P>B. Another long</P>
  <P>sentence spanning P tags.</P>
  <P>C. This should really</P>
  <P>be one</P>
  <P>long</P>
  <P>list item</P>
  <P>that spans randomly.</P>
  <P>Hopefully I am some unrelated text.</P>
</root>

I want it converted to this format:
<root>
  <H1>Heading</H1>
  <P>List Heading</P>
  <OL>
      <LI>One big sentence incorrectly placed in P tags.</LI>
      <LI>Another long sentence spanning P tags.</LI>
      <LI>This should really be one long list item that spans randomly.</LI>
  </OL>
  <P>Hopefully I am some unrelated text.</P>
</root>

Because the original XML file is rather poor, the list may not start at A. Previously I have been able to catch a numbered list just by checking whether a P tag starts with a number and its preceding sibling does not; then, once inside the list, I check whether the next P tag also starts with a number. This list is different, though.
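The earlier approach for numbered lists (a paragraph that starts with a digit while its preceding sibling does not) can be sketched in Python as follows. It works on a plain list of sibling texts, and treating a "number" as any leading ASCII digit is an assumption; `numbered_list_starts` is a made-up name. In XPath 2.0 the same test is roughly `P[matches(., '^[0-9]')][not(matches(preceding-sibling::*[1], '^[0-9]'))]`.

```python
import re

# Assumption: a "number" is any leading ASCII digit.
NUMBERED = re.compile(r"^[0-9]")


def numbered_list_starts(texts):
    """Indices of paragraphs that start with a digit while the paragraph
    just before them does not, i.e. where a numbered list begins."""
    starts = []
    for i, text in enumerate(texts):
        prev_numbered = i > 0 and NUMBERED.match(texts[i - 1]) is not None
        if NUMBERED.match(text) and not prev_numbered:
            starts.append(i)
    return starts


print(numbered_list_starts(["Intro", "1. First", "2. Second", "Text", "1. Again"]))  # [1, 4]
```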

1) I imagine you can check whether a P tag starts with a letter by doing something like this: P[starts-with(translate(., 'vUppercaseChars_CONST', 'vUppercaseAChar_CONST'), 'A')]
How would you then check that it begins with a letter followed by a dot?
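With XSLT 2.0 the translate() trick is not needed: `matches(., '^[A-Z]\.')` tests for a leading capital letter followed by a dot (Michael's stylesheet writes the dot as `[.]`); XSLT 1.0 has no regular expressions. The Python below only models the same pattern for experimenting outside a stylesheet. `item_letter` is a made-up helper that also returns the letter, and it assumes labels are single uppercase ASCII letters, so "a." is not matched.

```python
import re

# One uppercase ASCII letter at the start of the text, then a literal dot.
# XPath 2.0 equivalent: matches(., '^[A-Z]\.')
LETTER_DOT = re.compile(r"^([A-Z])\.")


def item_letter(text):
    """Return the leading letter if text starts with e.g. "A.", else None."""
    m = LETTER_DOT.match(text)
    return m.group(1) if m else None


for text in ["A. One big", "sentence incorrectly", "List Heading", "a. lower case"]:
    print(repr(text), "->", item_letter(text))
```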

2) Is it possible to find a letter followed by a dot, and then check whether the next P node starts with the next letter of the alphabet followed by a dot?
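Once the letter is known, the next one is its code point plus one. In XPath 2.0 that is `codepoints-to-string(string-to-codepoints($letter) + 1)`, and the test on the next paragraph is roughly `following-sibling::P[1][starts-with(., concat($next, '.'))]`, where `$letter` and `$next` are hypothetical variables. The sketch below models the same check in Python; `next_letter` and `next_p_continues` are made-up names. In the sample the next letter is usually not in the immediately following P, which is what question 3 is about.

```python
import re

# Leading "X." label; group 1 is the letter.
ITEM = re.compile(r"^([A-Z])\.")


def next_letter(letter):
    """Next letter by code point, like
    codepoints-to-string(string-to-codepoints($letter) + 1) in XPath 2.0.
    No special handling for 'Z'."""
    return chr(ord(letter) + 1)


def next_p_continues(texts, i):
    """True if texts[i] starts with "X." and texts[i + 1] starts with the
    following letter and a dot."""
    m = ITEM.match(texts[i])
    if m is None or i + 1 >= len(texts):
        return False
    return texts[i + 1].startswith(next_letter(m.group(1)) + ".")


paras = ["A. One big", "sentence incorrectly", "B. Another long", "C. This"]
print([next_p_continues(paras, i) for i in range(len(paras))])
```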

3) Is it possible to check whether the next 10 P tags contain the next letter of the alphabet plus a dot? Previously I have been able to pick up lists without problems when they had a predictable pattern, but this one doesn't. I can only assume that the list ends after about 10 P tags, or when it finds a letter earlier in the alphabet, or when it hits a tag that is not a P tag. I would end the list item at the first full stop found after the last P tag that started with a letter plus a dot. Is something like this possible in XSLT, and if so, how?
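One way to model the look-ahead heuristic described in question 3 is sketched below in Python, not XSLT, under several assumptions: `WINDOW = 10` is a hypothetical look-ahead limit; a non-P element or a paragraph labelled with the same or an earlier letter ends the search; the last item ends at the first paragraph whose text ends with a full stop (a full stop in the middle of a paragraph is not split out); and the "X. " labels are stripped, as in the desired output. `find_list` is a made-up name.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """<root>
  <H1>Heading</H1>
  <P>List Heading</P>
  <P>A. One big</P>
  <P>sentence incorrectly</P>
  <P>placed in P tags</P>
  <P>B. Another long</P>
  <P>sentence spanning P tags.</P>
  <P>C. This should really</P>
  <P>be one</P>
  <P>long</P>
  <P>list item</P>
  <P>that spans randomly.</P>
  <P>Hopefully I am some unrelated text.</P>
</root>"""

# "X." at the start of a paragraph, plus the spaces after it.
ITEM = re.compile(r"^([A-Z])[.]\s*")
# Hypothetical look-ahead limit taken from the question ("about 10 P tags").
WINDOW = 10


def find_list(nodes, start):
    """nodes: (tag, text) pairs for the sibling elements; start: index of
    the first "X." paragraph. Returns (items, end): the item texts with
    their "X. " labels stripped, and the index just after the list."""
    items = []
    i = start
    while True:
        letter = ITEM.match(nodes[i][1]).group(1)
        wanted = chr(ord(letter) + 1) + "."
        nxt = None
        for j in range(i + 1, min(i + 1 + WINDOW, len(nodes))):
            tag, text = nodes[j]
            if tag != "P":
                break  # a non-P element ends the look-ahead
            m = ITEM.match(text)
            if m and m.group(1) <= letter:
                break  # same or earlier letter: no continuation
            if text.startswith(wanted):
                nxt = j
                break
        if nxt is not None:
            # Join this item's paragraphs up to the next lettered one.
            parts = [ITEM.sub("", nodes[i][1], count=1)]
            parts += [text for _, text in nodes[i + 1:nxt]]
            items.append(" ".join(parts))
            i = nxt
            continue
        # Last item: take paragraphs until one ends with a full stop.
        parts = []
        j = i
        while j < len(nodes) and j < i + WINDOW and nodes[j][0] == "P":
            text = nodes[j][1]
            if j == i:
                text = ITEM.sub("", text, count=1)
            parts.append(text)
            j += 1
            if text.rstrip().endswith("."):
                break
        items.append(" ".join(parts))
        return items, j


nodes = [(el.tag, el.text or "") for el in ET.fromstring(SAMPLE)]
start = next(k for k, (tag, text) in enumerate(nodes)
             if tag == "P" and ITEM.match(text))
items, end = find_list(nodes, start)
for item in items:
    print(item)
print("first P after the list:", nodes[end][1])
```

On the sample this yields the three item texts of the desired output, except that "placed in P tags" gets no full stop because the input has none there, and it leaves "Hopefully I am some unrelated text." outside the list.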

Thanks for your time,
Graeme



--
_______________________________________________________________
Michael Müller-Hillebrand: Dokumentation Technology
Adobe Certified Expert, FrameMaker
Consulting and Training, FrameScript, XML/XSL, Unicode
Blog [de]: http://cap-studio.de/




--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
--~--
