[xsl] XSLT2 node comparison, wordlists

2007-10-24 09:56:54
I'm sure this is easy to do in XSLT2 but I've just not got my head
wrapped around how to compare things properly in an efficient manner.

Say I have a wordlist, generated automatically from another file, that
records every instance of how each word was used.  In many cases these
instances are identical in spelling, and I want to merge them while
storing links between the original file and the wordlist as stand-off
markup.

Say the file has entries for each word which are like:

=====
<entry xml:id="let22-w27">
  <form>
    <orth type="hw">the</orth>
    <form type="orthVar">
      <orth xml:id="w72">The</orth>
      <orth xml:id="w3955">The</orth>
      <orth xml:id="w4513">The</orth>
      <orth xml:id="w4578">The</orth>
      <orth xml:id="w4650">The</orth>
      <orth xml:id="w4672">The</orth>
      <orth xml:id="w4703">The</orth>
      <orth xml:id="w4824">The</orth>
      <orth xml:id="w4830">The</orth>
      <orth xml:id="w2045">the</orth>
      <orth xml:id="w2079">the</orth>
      <orth xml:id="w2101">the</orth>
      <orth xml:id="w2112">the</orth>
      <orth xml:id="w2333">the</orth>
      <orth xml:id="w2400">the</orth>
      <orth xml:id="w2442">the</orth>
      <orth xml:id="w1402">T<ex>h</ex><hi rend="sup">e</hi></orth>
      <orth xml:id="w2422">T<ex>h</ex><hi rend="sup">e</hi></orth>
      <orth xml:id="w6458">T<ex>h</ex><hi rend="sup">e</hi></orth>
      <orth xml:id="w7822">T<ex>h</ex><hi rend="sup">e</hi></orth>
      <orth xml:id="w2097">t<ex>h</ex><hi rend="sup">e</hi></orth>
      <orth xml:id="w2155">t<ex>h</ex><hi rend="sup">e</hi></orth>
      <orth xml:id="w2482">t<ex>h</ex><hi rend="sup">e</hi></orth>
      <orth xml:id="w5887">t<ex>h</ex><hi rend="sup">e</hi></orth>
      <orth xml:id="w5642">T<ex>h</ex>e</orth>
      <orth xml:id="w5378">t<ex>h</ex>e</orth>
    </form>
  </form>
</entry>
=====
What I want to end up with, for each form[@type='orthVar'], is only the
distinct orth elements therein (compared on their full content, child
elements included), each with a new @xml:id value.  The old @xml:id
values should be preserved at the bottom of the file, linking the new
values with the current ones (which are copies from a different file).
So something like:

=====
<div>
  <entry xml:id="let22-w27">
    <form>
      <orth type="hw">the</orth>
      <form type="orthVar" n="6"> <!-- n= num of diff variants-->
        <orth xml:id="let22-w27-vA">The</orth>
        <orth xml:id="let22-w27-vB">the</orth>
        <orth xml:id="let22-w27-vC">T<ex>h</ex><hi rend="sup">e</hi></orth>
        <orth xml:id="let22-w27-vD">t<ex>h</ex><hi rend="sup">e</hi></orth>
        <orth xml:id="let22-w27-vE">T<ex>h</ex>e</orth>
        <orth xml:id="let22-w27-vF">t<ex>h</ex>e</orth>
      </form>
    </form>
  </entry>

  <!-- more entries -->

  <!-- at bottom of file -->
  <div type="links">
    <linkGrp xml:id="let22-w27-lg">
      <!-- links between the orth form above with its instance in file.xml -->
      <link targets="#let22-w27-vA file.xml#w72 file.xml#w3955
        file.xml#w4513 file.xml#w4578 file.xml#w4650 file.xml#w4672
        file.xml#w4703 file.xml#w4824 file.xml#w4830"/>
      <link targets="#let22-w27-vB file.xml#w2045 file.xml#w2079
        file.xml#w2101 file.xml#w2112 file.xml#w2333 file.xml#w2400
        file.xml#w2442"/>
      <link targets="#let22-w27-vC file.xml#w1402 file.xml#w2422
        file.xml#w6458 file.xml#w7822"/>
      <link targets="#let22-w27-vD file.xml#w2097 file.xml#w2155
        file.xml#w2482 file.xml#w5887"/>
      <link targets="#let22-w27-vE file.xml#w5642"/>
      <link targets="#let22-w27-vF file.xml#w5378"/>
    </linkGrp>
    <!-- more linkGrps -->
  </div>
</div>
=====
XSLT2 is certainly up to this, but all of my attempts have either been
hideously inefficient or failed to compare the nested child elements
accurately.
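
To make the goal concrete, here is an untested sketch of one possible
shape for the transform, not a known-good solution.  It groups the orth
elements once with xsl:for-each-group on a string key built from each
element's content, rather than comparing every pair with deep-equal().
The f:key function, its urn:x-example:functions namespace and the
"key" and "links" modes are names made up for illustration.  It also
assumes the elements are in no namespace (as in the samples above),
that the root element wraps all the entries, that the instances live in
file.xml, and that the text never contains the < and > markers used in
the key.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:x-example:functions"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical helper: a string key for an orth's content, ignoring
       the orth's own attributes (so its @xml:id does not matter). -->
  <xsl:function name="f:key" as="xs:string">
    <xsl:param name="orth" as="element()"/>
    <xsl:variable name="parts" as="item()*">
      <xsl:apply-templates select="$orth/node()" mode="key"/>
    </xsl:variable>
    <xsl:sequence select="string-join($parts, '')"/>
  </xsl:function>

  <!-- Key for a child element: name, sorted attributes, then content. -->
  <xsl:template match="*" mode="key">
    <xsl:text>&lt;</xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:for-each select="@*">
      <xsl:sort select="name()"/>
      <xsl:value-of select="concat(' ', name(), '=[', ., ']')"/>
    </xsl:for-each>
    <xsl:text>&gt;</xsl:text>
    <xsl:apply-templates mode="key"/>
    <xsl:text>&lt;/&gt;</xsl:text>
  </xsl:template>

  <xsl:template match="text()" mode="key">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Identity copy for everything not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: the root element wraps all entries; the links go last. -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
      <div type="links">
        <xsl:apply-templates select=".//entry" mode="links"/>
      </div>
    </xsl:copy>
  </xsl:template>

  <!-- One orth per distinct variant, lettered in order of first appearance. -->
  <xsl:template match="form[@type = 'orthVar']">
    <xsl:variable name="entry-id" select="ancestor::entry[1]/@xml:id"/>
    <form type="orthVar"
          n="{count(distinct-values(for $o in orth return f:key($o)))}">
      <xsl:for-each-group select="orth" group-by="f:key(.)">
        <orth>
          <xsl:attribute name="xml:id">
            <xsl:value-of select="$entry-id"/>
            <xsl:text>-v</xsl:text>
            <xsl:number value="position()" format="A"/>
          </xsl:attribute>
          <xsl:copy-of select="node()"/>
        </orth>
      </xsl:for-each-group>
    </form>
  </xsl:template>

  <!-- Same grouping again, so the letters line up with the orths above. -->
  <xsl:template match="entry" mode="links">
    <xsl:variable name="entry-id" select="@xml:id"/>
    <linkGrp xml:id="{$entry-id}-lg">
      <xsl:for-each-group select="form/form[@type = 'orthVar']/orth"
                          group-by="f:key(.)">
        <xsl:variable name="letter">
          <xsl:number value="position()" format="A"/>
        </xsl:variable>
        <link targets="{string-join(
            (concat('#', $entry-id, '-v', $letter),
             for $o in current-group()
               return concat('file.xml#', $o/@xml:id)), ' ')}"/>
      </xsl:for-each-group>
    </linkGrp>
  </xsl:template>

</xsl:stylesheet>
```

Both groupings select the same orth elements with the same key, and
group-by returns groups in order of first appearance, so the vA, vB, ...
suffixes in the entry and in its linkGrp should agree.  Whether a
string key like this is robust enough for real data is an open
question.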

Suggestions?

Thanks,
-James

-- 
James Cummings, Cummings dot James at GMail dot com
