Muenchian grouping help - removing 'duplicates' from a nodeset


Hi all,

This is probably quite a basic question, but I've been scratching my head
over it all day and I could use some guidance.

I have an XML file which is going to be used as a "dictionary" for an
internationalised web application. The structure of the file is like so:

<dictionary>
        <text>foo</text>
        <text>bar</text>
        <text>foo</text>
        <text>baz</text>
        <text>foobar</text>
        (etc...)
</dictionary>

The file contains quite a few "duplicates" (in terms of the text() content
of the node), and I've been trying to figure out a way to strip out all the
duplicates, leaving me with an XML file with only unique <text> elements.

I wrote a stylesheet to identify all the duplicates and print them out,
basically using:

test="current() = following-sibling::text or current() = preceding-sibling::text"

But now I want to actually remove the duplicates and write a new XML file
to the output tree.
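
To make the goal concrete: for the sample above (leaving out the "(etc...)"
placeholder), I would expect the result to keep the first occurrence of each
value, in document order, so something like this (exact indentation aside):

```xml
<dictionary>
        <text>foo</text>
        <text>bar</text>
        <text>baz</text>
        <text>foobar</text>
</dictionary>
```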

I think the way to do this is via Muenchian grouping. I know what I need to
do: group all the <text> elements by their text() content; and select only
the first one in each group. But I've followed the guidelines on Jeni
Tennison's XSLT pages and I can't seem to get my head around how keys
actually work. 

So far I have tried (these are obviously just sample lines from my XSL):

<xsl:key name="text-by-content" match="text" use="normalize-space(text())"/>

And then:

<xsl:apply-templates select="text[generate-id(.) =
generate-id(key('text-by-content', text())[1])]"/>

But this produces no output at all. 
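
For what it's worth, here is how I currently understand the whole stylesheet
is supposed to fit together. This is only a sketch and I have not confirmed
it in Sablotron. It rests on two assumptions of mine: that the
apply-templates has to run with <dictionary> as the context node (so that
the relative path "text" actually finds something), and that the key lookup
should use the same normalize-space() expression as the key definition. I
have also used "." rather than text() in both places, which is my own
choice for the sketch.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Index every <text> element by its whitespace-normalised string value -->
  <xsl:key name="text-by-content" match="text" use="normalize-space(.)"/>

  <!-- The context node here is <dictionary>, so "text" selects its children -->
  <xsl:template match="dictionary">
    <dictionary>
      <!-- Keep a <text> only if it is the first node the key returns
           for its own (normalised) value -->
      <xsl:apply-templates
          select="text[generate-id(.) =
                       generate-id(key('text-by-content',
                                       normalize-space(.))[1])]"/>
    </dictionary>
  </xsl:template>

  <!-- Copy each surviving <text> element to the output unchanged -->
  <xsl:template match="text">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

If that is right, then my guess is that my own attempt fails because of
where the apply-templates sits rather than because of the key itself, but
I would appreciate confirmation.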

I'm sure what I'm missing is blatantly obvious... :-/

I'm using Sablotron 1.0, if that makes any difference. 

Thanks in advance,
Laura.



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list