xsl-list

RE: [XSL] extracting a verse (LONG)

2002-12-19 08:55:23
At the risk of beating a dead horse, here's an improvement of the key-based
method that handles markup that doesn't nest cleanly.  Because of the way it
assigns keys, it requires id attributes to already exist on the verse and
verseEnd elements (and a "to" attribute on each verse naming its verseEnd),
so I had to rewrite Wendell's example a bit, but those ids could obviously
be generated by a pre-processing step.  I couldn't use generate-id() because
I want to assign multiple keys to each node.
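
A pre-processing step of that kind might look like the sketch below.  It is
untested and rests on an assumption that is not stated in this message: that
verses never overlap one another, so the nth verse milestone in document
order pairs with the nth verseEnd.  The "v1"/"e1" naming scheme is made up
for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: the nth verse pairs with the nth verseEnd.
       Hypothetical ids: v1, v2, ... and e1, e2, ... -->
  <xsl:template match="verse[not(@id)]">
    <xsl:variable name="n" select="count(preceding::verse) + 1"/>
    <verse id="v{$n}" to="e{$n}">
      <!-- Attributes already on the source element are copied last,
           so they replace a generated attribute of the same name. -->
      <xsl:apply-templates select="@*"/>
    </verse>
  </xsl:template>

  <xsl:template match="verseEnd[not(@id)]">
    <xsl:variable name="n" select="count(preceding::verseEnd) + 1"/>
    <verseEnd id="e{$n}">
      <xsl:apply-templates select="@*"/>
    </verseEnd>
  </xsl:template>

</xsl:stylesheet>
```

Milestones that already carry an id are left alone by the identity template,
so running this over a file that is partly annotated would mix the two
naming schemes.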

Basically the approach is to have two keys, "verses" and "verseends", on all
text nodes and on all elements except the root element.  For any node, the
verses key contains the id attributes of all <verse/> milestones preceding
the node or contained in it.  For any node, the verseends key contains the
id attributes of all <verseEnd/> milestones following the node or contained
in it.

Then, for each verse, you do a key lookup on the id of the verse and on the
id of its verseEnd, giving you two node-sets, and take their intersection to
find the nodes that lie within the verse, or that contain the verse either
fully or in a non-well-formed (overlapping) way.  Then apply templates to
the nodes in the intersection that don't have a parent in the intersection
(to avoid repetition), and carry the intersection node-set along as a
parameter so that, throughout all the apply-templates you do beneath a given
parent, only child nodes within the intersection are processed.  Anyway,
that's what I think is going on, and though not properly tested, it seems to
work for the two examples I have:

verses5.xslt:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

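<!-- Index every text node and non-root element under the id of each verse
     milestone that precedes it or lies inside it. -->
<xsl:key name="verses" match="text() | *[parent::*]"
         use="preceding::verse/@id | .//verse/@id"/>
<!-- Likewise, under the id of each verseEnd milestone that follows it or
     lies inside it. -->
<xsl:key name="verseends" match="text() | *[parent::*]"
         use="following::verseEnd/@id | .//verseEnd/@id"/>

<xsl:template match="/">
  <quote>
    <xsl:for-each select="//verse">
      <verse>
        <xsl:variable name="starts" select="key('verses',@id)"/>
        <xsl:variable name="ends" select="key('verseends',@to)"/>
        <!-- Intersection: the nodes of $starts that are also in $ends. -->
        <xsl:variable name="text"
                      select="$starts[count(.|$ends) = count($ends)]"/>
        <!-- Start from the nodes whose parent is not itself in $text. -->
        <xsl:apply-templates
            select="$text[not(count(parent::*|$text) = count($text))]">
          <xsl:with-param name="text" select="$text"/>
        </xsl:apply-templates>
      </verse>
    </xsl:for-each>
  </quote>
</xsl:template>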

  
<!-- Copy an element, then descend only into children that are in $text. -->
<xsl:template match="*">
  <xsl:param name="text"/>
  <xsl:element name="{name(.)}">
    <xsl:copy-of select="@*"/>
    <!-- Tag each copy with the id of its source element, so pieces of a
         split element can be matched up across verses. -->
    <xsl:attribute name="origElementID">
      <xsl:value-of select="generate-id()"/>
    </xsl:attribute>
    <xsl:apply-templates
        select="*[count(.|$text) = count($text)] |
                text()[count(.|$text) = count($text)]">
      <xsl:with-param name="text" select="$text"/>
    </xsl:apply-templates>
  </xsl:element>
</xsl:template>
  
</xsl:stylesheet>
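
The intersection test used throughout that stylesheet,
$a[count(.|$b) = count($b)], can be seen in isolation in the stripped-down
sketch below.  It is a simplification, not the stylesheet above: it uses the
following and preceding axes directly instead of keys, and it outputs only
the text of each verse, discarding the element structure.  It assumes the
same id and "to" attributes as the example files and has not been tested
against them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="//verse">
      <!-- Text nodes that come after this verse milestone. -->
      <xsl:variable name="after" select="following::text()"/>
      <!-- Text nodes that come before the matching verseEnd milestone. -->
      <xsl:variable name="before"
          select="//verseEnd[@id = current()/@to]/preceding::text()"/>
      <!-- A node of $after is also in $before exactly when adding it
           to $before does not make $before any larger. -->
      <xsl:for-each select="$after[count(.|$before) = count($before)]">
        <xsl:value-of select="."/>
      </xsl:for-each>
      <!-- One line break after each verse. -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Whitespace-only text nodes between elements fall inside the intersection
too, so the text output keeps the source file's indentation and line breaks.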


verses.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="C:\Work\xsl\verses5.xslt"?>
<text>
        <div>
                <chapter id="BCV-GEN-1" to="BCV-GEN-1-END" value="1"/>
                <head>The Story of#Creation</head>
                <p>
                        <verse id="BCV-GEN-1.1" to="BCV-GEN-1.1-END"
value="1"/>In the
beginning, when God created the universe,
      <verseEnd id="BCV-GEN-1.1-END" from="BCV-GEN-1.1"/>
                        <verse id="BCV-GEN-1.2" to="BCV-GEN-1.2-END"
value="2"/>the
earth was formless and desolate. The raging ocean that covered everything
was engulfed in total darkness, and the ......         
    </p>
                <p>rest of verse 2 
      <verseEnd id="BCV-GEN-1.2-END" from="BCV-GEN-1.2"/>
        but this is just paragraph
    </p>
                <p>Paragraph Paragraph Paragraph 
      <verse id="BCV-GEN-1.3" to="BCV-GEN-1.3-END" value="3"/>This is the
third
    </p>
                <p>verse  </p>
                <verseEnd id="BCV-GEN-1.3-END" from="BCV-GEN-1.3"/>
                <p> paragraph </p>
        </div>
</text>

Output:

<?xml version="1.0" encoding="UTF-8"?>
<quote>
  <verse>
    <div origElementID="IDABELQB">
      <p origElementID="IDAHELQB">In the
beginning, when God created the universe,
      </p>
    </div>
  </verse>
  <verse>
    <div origElementID="IDABELQB">
      <p origElementID="IDAHELQB">the
earth was formless and desolate. The raging ocean that covered everything
was engulfed in total darkness, and the ......         
    </p>
      <p origElementID="IDAVELQB">rest of verse 2 
      </p>
    </div>
  </verse>
  <verse>
    <div origElementID="IDABELQB">
      <p origElementID="IDA0ELQB">This is the third
    </p>
      <p origElementID="IDAAFLQB">verse  </p>
    </div>
  </verse>
</quote>

That looks right: the origElementID attributes that I'm adding make it
obvious how the elements have been split between verses, since the same id
shows up in more than one verse.

This is a modified version of one of Wendell's files:
verses5.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="C:\Work\xsl\verses5.xslt"?>
<quote>
<verse id="1" to="e1"/>No! penury, inertness and grimace,<verseEnd id="e1"/>
<verse id="2" to="e2"/>In some strange sort, were the land's portion.
<q>See<verseEnd id="e2"/>
<verse id="3" to="e3"/>Or shut your eyes,</q> said Nature
peevishly,<verseEnd id="e3"/>
<verse id="4" to="e4"/><q>It nothing skills: I cannot help my case:<verseEnd
id="e4"/>
<verse id="5" to="e5"/>'Tis the Last Judgment's fire must cure this
place,<verseEnd id="e5"/>
<verse id="6" to="e6"/>Calcine its clods and set my prisoners
free.</q><verseEnd id="e6"/>
</quote>

Output:

<?xml version="1.0" encoding="UTF-8"?>
<quote>
  <verse>No! penury, inertness and grimace,</verse>
  <verse>In some strange sort, were the land's portion. <q
origElementID="IDANPKQB">See</q>
  </verse>
  <verse>
    <q origElementID="IDANPKQB">Or shut your eyes,</q> said Nature
peevishly,</verse>
  <verse>
    <q origElementID="IDAZPKQB">It nothing skills: I cannot help my
case:</q>
  </verse>
  <verse>
    <q origElementID="IDAZPKQB">'Tis the Last Judgment's fire must cure this
place,</q>
  </verse>
  <verse>
    <q origElementID="IDAZPKQB">Calcine its clods and set my prisoners
free.</q>
  </verse>
</quote>

I'm not sure how processor-intensive this is, but it seems to more or less
do what's needed.

Thanks,
David.
--
David McNally            Moody's Investors Service
Software Engineer        99 Church St, NY NY 10007 
David(_dot_)McNally(_at_)Moodys(_dot_)com            (212) 553-7475 




 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


