Hi Ken and Michael.
Since I have already removed punctuation and substituted a space for the
hyphens, I set up my regular expression as '\s+'. I think that is correct for
tokenizing a string of words separated by blanks, as mine are.
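To show what I expect the tokenizer to do on its own, here is a minimal standalone sketch (not my real stylesheet). It assumes an XSLT 2.0 processor, and the initial template name "main" and the hard-coded sample string are only there for illustration:

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical entry point: run with "main" as the initial template,
       so no source document is needed. -->
  <xsl:template name="main">
    <!-- Split the sample string on runs of whitespace. -->
    <xsl:for-each select="tokenize('Jaroslav Hašek 1883 1923', '\s+')">
      <!-- Inside this loop the context item is one token (a string). -->
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

If I understand tokenize correctly, this should list Jaroslav, Hašek, 1883, and 1923 as four separate tokens, one per line.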
Using this input:
<Text lang="cz" data="Jaroslav Hašek 1883 1923"
      title="Czechoslovak Stamp 2575" ref="1983-2575.htm"/>
<Text lang="cz" data="UNESCO"
      title="Czechoslovak Stamp 2575" ref="1983-2575.htm"/>
I tried Michael's idea with the following code:
<xsl:for-each-group select="Text" group-by="tokenize(@data,'\s+')">
  <xsl:for-each select="current-group()">
    <xsl:sort select="current-grouping-key()" lang="cz"/>
    <Word title="{@title}" ref="{@ref}">
      <xsl:value-of select="."/>
    </Word>
  </xsl:for-each>
</xsl:for-each-group>
And received the warning: "Sort key will have no effect because its value
does not depend on the context item"
And the output:
<Word title="Czechoslovak Stamp 2575" ref="1983-2575.htm"/>
<Word title="Czechoslovak Stamp 2575" ref="1983-2575.htm"/>
I expected this to produce five <Word> elements ('Jaroslav', 'Hašek', '1883',
'1923', and 'UNESCO'), but only two were produced and the <xsl:value-of>
returns nothing. Is my tokenize returning nothing? I clearly did something
wrong, but cannot see what it is. I'll try Ken's coding next, but would like
to know what I did wrong.
As you surmised, no context is needed. I am collecting my <Text> elements
from a source XML file that, when my other stylesheets are applied, will
generate the documents described in the @title and @ref attributes. In other
words, I am indexing data that will eventually be located in the described
documents; the documents themselves do not yet exist.
Ken:
With respect to the code you gave me yesterday, my understanding is that

distinct-values((//@czech)/tokenize(translate(normalize-space(.),'-,$%.#',' ')))

would give me all the unique Czech words in my source document at once. But
since the documents I am indexing do not yet exist, getting the title and
href of the indexed words that way would be problematic. That is why I chose
to construct the <Text> elements from my source document instead. The key
idea here is that my index does not refer to the source document itself, but
to documents that will come into existence later.
Mark