RE: [xsl] XSLT2 node comparison, wordlists

2007-10-24 10:34:52

Nice to see you back, James.

It's not easy, actually! It's in effect value-based grouping where the
equality function is deep-equal() rather than the eq operator. I think I
would do this by something along the lines of

<xsl:for-each-group group-by="saxon:serialize(.)">

except that doesn't quite work because you want to ignore the xml:id.

If the markup is only one level deep you could do the same thing by hand,
along these lines:

<xsl:for-each-group group-by="my:serialize(.)">


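<!-- assumes the xs prefix is bound to http://www.w3.org/2001/XMLSchema -->
<xsl:function name="my:serialize" as="xs:string">
  <xsl:param name="in" as="element(orth)"/>
  <xsl:variable name="parts" as="xs:string*">
    <xsl:apply-templates select="$in/child::node()" mode="grouping-key"/>
  </xsl:variable>
  <!-- join the pieces so that each orth yields exactly one grouping key -->
  <xsl:sequence select="string-join($parts, '')"/>
</xsl:function>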

<xsl:template match="text()" mode="grouping-key">
  <xsl:value-of select="."/>
</xsl:template>

<xsl:template match="*" mode="grouping-key">
  <xsl:text>&lt;</xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:for-each select="@*">
      <xsl:text> </xsl:text>
...etc
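
One way the elided part might be completed, offered as a minimal sketch rather than a definitive version. The namespace URI bound to the my prefix is a placeholder, and sorting the attributes by name is an added assumption so that the key does not depend on attribute order. Text is not escaped, which is acceptable only because the key has to distinguish values, not round-trip as XML. The function joins the pieces into a single string, since a sequence of several items would put each orth into several groups.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="xs my">

  <!-- Builds one string per orth from its children only, so the
       xml:id on the orth element itself never enters the key. -->
  <xsl:function name="my:serialize" as="xs:string">
    <xsl:param name="in" as="element(orth)"/>
    <xsl:variable name="parts" as="xs:string*">
      <xsl:apply-templates select="$in/child::node()" mode="grouping-key"/>
    </xsl:variable>
    <xsl:sequence select="string-join($parts, '')"/>
  </xsl:function>

  <xsl:template match="text()" mode="grouping-key">
    <xsl:value-of select="."/>
  </xsl:template>

  <xsl:template match="*" mode="grouping-key">
    <xsl:text>&lt;</xsl:text>
    <xsl:value-of select="name()"/>
    <!-- Assumption: sort attributes so their order cannot affect the key. -->
    <xsl:for-each select="@*">
      <xsl:sort select="name()"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="name()"/>
      <xsl:text>="</xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>"</xsl:text>
    </xsl:for-each>
    <xsl:text>&gt;</xsl:text>
    <!-- Recurse, so nesting deeper than one level is also covered. -->
    <xsl:apply-templates mode="grouping-key"/>
    <xsl:text>&lt;/</xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>&gt;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Comments and processing instructions fall through to the built-in templates for the mode, which output nothing, so they do not affect the key.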

Of course the grouping key doesn't actually need to be an XML serialization,
it can have any syntax you fancy so long as it distinguishes distinct
values.
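
To illustrate that point, here is a hedged sketch of a complete stylesheet that uses a made-up brace syntax for the key and then does the grouping. Everything named here is an assumption for illustration: the my:key function, the placeholder namespace URI, the file.xml prefix taken from the example output, input elements in no namespace, and exactly one form[@type='orthVar'] per entry.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical key: text as-is, child elements as {name att=value|content}. -->
  <xsl:function name="my:key" as="xs:string">
    <xsl:param name="in" as="element()"/>
    <xsl:variable name="parts" as="xs:string*">
      <xsl:for-each select="$in/node()">
        <xsl:choose>
          <xsl:when test="self::*">
            <xsl:variable name="atts" as="xs:string*">
              <xsl:for-each select="@*">
                <xsl:sort select="name()"/>
                <xsl:sequence select="concat(' ', name(), '=', .)"/>
              </xsl:for-each>
            </xsl:variable>
            <xsl:sequence select="concat('{', name(), string-join($atts, ''),
                                         '|', my:key(.), '}')"/>
          </xsl:when>
          <xsl:when test="self::text()">
            <xsl:sequence select="string(.)"/>
          </xsl:when>
        </xsl:choose>
      </xsl:for-each>
    </xsl:variable>
    <xsl:sequence select="string-join($parts, '')"/>
  </xsl:function>

  <xsl:template match="/">
    <div>
      <xsl:apply-templates select="//entry"/>
      <div type="links">
        <xsl:apply-templates select="//entry" mode="links"/>
      </div>
    </div>
  </xsl:template>

  <!-- One orth per distinct key, with a new xml:id ending in A, B, C, ... -->
  <xsl:template match="entry">
    <xsl:variable name="id" select="string(@xml:id)"/>
    <xsl:variable name="variants" select="form/form[@type='orthVar']/orth"/>
    <entry xml:id="{$id}">
      <form>
        <xsl:copy-of select="form/orth"/>
        <form type="orthVar" n="{count(distinct-values($variants/my:key(.)))}">
          <xsl:for-each-group select="$variants" group-by="my:key(.)">
            <orth>
              <xsl:attribute name="xml:id">
                <xsl:value-of select="$id"/>
                <xsl:text>-v</xsl:text>
                <xsl:number value="position()" format="A"/>
              </xsl:attribute>
              <xsl:copy-of select="node()"/>
            </orth>
          </xsl:for-each-group>
        </form>
      </form>
    </entry>
  </xsl:template>

  <!-- Same grouping again, so the letters line up with the ids above. -->
  <xsl:template match="entry" mode="links">
    <xsl:variable name="id" select="string(@xml:id)"/>
    <linkGrp xml:id="{$id}-lg">
      <xsl:for-each-group select="form/form[@type='orthVar']/orth"
                          group-by="my:key(.)">
        <xsl:variable name="letter">
          <xsl:number value="position()" format="A"/>
        </xsl:variable>
        <link targets="#{$id}-v{$letter} {string-join(
            for $o in current-group() return concat('file.xml#', $o/@xml:id), ' ')}"/>
      </xsl:for-each-group>
    </linkGrp>
  </xsl:template>

</xsl:stylesheet>
```

Groups come out in order of first appearance, so both passes number them identically. Grouping twice keeps the sketch short; holding the groups in a variable would avoid computing the keys a second time.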

Michael Kay
http://www.saxonica.com/
 

-----Original Message-----
From: James Cummings [mailto:cummings(_dot_)james(_at_)gmail(_dot_)com] 
Sent: 24 October 2007 17:56
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] XSLT2 node comparison, wordlists

I'm sure this is easy to do in XSLT2 but I've just not got my 
head wrapped around how to compare things properly in an 
efficient manner.

Let's say I have a wordlist where automatically generated 
from another file I've got instances of how each word was 
used.  In many cases these are identical in spelling, and 
what I want to do is merge them and store links between the 
original file and the wordlist in a stand-off markup method.

Say the file has entries for each word which are like:

=====
<entry xml:id="let22-w27">
  <form>
    <orth type="hw">the</orth>
    <form type="orthVar">
      <orth xml:id="w72">The</orth>
      <orth xml:id="w3955">The</orth>
      <orth xml:id="w4513">The</orth>
      <orth xml:id="w4578">The</orth>
      <orth xml:id="w4650">The</orth>
      <orth xml:id="w4672">The</orth>
      <orth xml:id="w4703">The</orth>
      <orth xml:id="w4824">The</orth>
      <orth xml:id="w4830">The</orth>
      <orth xml:id="w2045">the</orth>
      <orth xml:id="w2079">the</orth>
      <orth xml:id="w2101">the</orth>
      <orth xml:id="w2112">the</orth>
      <orth xml:id="w2333">the</orth>
      <orth xml:id="w2400">the</orth>
      <orth xml:id="w2442">the</orth>
      <orth xml:id="w1402">T<ex>h</ex><hi rend="sup">e</hi></orth>
      <orth xml:id="w2422">T<ex>h</ex><hi rend="sup">e</hi></orth>
      <orth xml:id="w6458">T<ex>h</ex><hi rend="sup">e</hi></orth>
      <orth xml:id="w7822">T<ex>h</ex><hi rend="sup">e</hi></orth>
      <orth xml:id="w2097">t<ex>h</ex><hi rend="sup">e</hi></orth>
      <orth xml:id="w2155">t<ex>h</ex><hi rend="sup">e</hi></orth>
      <orth xml:id="w2482">t<ex>h</ex><hi rend="sup">e</hi></orth>
      <orth xml:id="w5887">t<ex>h</ex><hi rend="sup">e</hi></orth>
      <orth xml:id="w5642">T<ex>h</ex>e</orth>
      <orth xml:id="w5378">t<ex>h</ex>e</orth>
      </form>
  </form>
</entry>
=====
What I want to end up with is for each form[@type='orthVar'] 
only distinct-values for the orth elements therein with new 
@xml:id values, and the old ones preserved at the bottom of 
the file linking new values with the current ones (which are 
copies from a different file).
 So something like:

=====
<div>
  <entry xml:id="let22-w27">
    <form>
      <orth type="hw">the</orth>
      <form type="orthVar" n="6"> <!-- n= num of diff variants-->
        <orth xml:id="let22-w27-vA">The</orth>
        <orth xml:id="let22-w27-vB">the</orth>
        <orth xml:id="let22-w27-vC">T<ex>h</ex><hi rend="sup">e</hi></orth>
        <orth xml:id="let22-w27-vD">t<ex>h</ex><hi rend="sup">e</hi></orth>
        <orth xml:id="let22-w27-vE">T<ex>h</ex>e</orth>
        <orth xml:id="let22-w27-vF">t<ex>h</ex>e</orth>
      </form>
    </form>
  </entry>

  <!-- more entries -->

  <!-- at bottom of file -->
  <div type="links">
  <linkGrp xml:id="let22-w27-lg">
  <!-- links between the orth form above with its instance in file.xml -->
    <link targets="#let22-w27-vA  file.xml#w72 file.xml#w3955
      file.xml#w4513 file.xml#w4578 file.xml#w4650 file.xml#w4672
      file.xml#w4703  file.xml#w4824 file.xml#w4830"/>
    <link targets="#let22-w27-vB file.xml#w2045  file.xml#w2079
      file.xml#w2101 file.xml#w2112 file.xml#w2333 file.xml#w2400
      file.xml#w2442"/>
    <link targets="#let22-w27-vC file.xml#w1402 file.xml#w2422
      file.xml#w6458 file.xml#w7822 "/>
    <link targets="#let22-w27-vD file.xml#w2097 file.xml#w2155
      file.xml#w2482 file.xml#w5887"/>
    <link targets="#let22-w27-vE file.xml#w5642"/>
    <link targets="#let22-w27-vF  file.xml#w5378"/>
  </linkGrp>
    <!-- more linkGrps -->
    </div>
</div>
======
XSLT2 is certainly usable in this case, but all of my 
attempts have been hideously inefficient, or fail to 
accurately compare the nested children properly.

Suggestions?

Thanks,
-James

--
James Cummings, Cummings dot James at GMail dot com

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


