Re: [xsl] Using XSLT to build an index

On 31/10/2011 12:05, Mark wrote:

Hi Ken and Michael.
Since I have already removed punctuation and substituted a space for the hyphens, I set up my regular expression as '\s+'. I think that is correct for tokenizing a string of words separated by blanks, as mine are.
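For Mark's data, which already has single spaces between words, '\s+' is a reasonable separator. There is one edge case with the two-argument tokenize() in XSLT 2.0: if the string starts or ends with whitespace, the result begins or ends with a zero-length token, and that empty string would become a grouping key of its own. The following minimal sketch compares the token counts with and without normalize-space(). It assumes an XSLT 2.0 processor such as Saxon and a hypothetical input <doc><Text data=" Jaroslav  Hašek "/></doc>, padded on purpose:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Hypothetical input: <doc><Text data=" Jaroslav  Hašek "/></doc> -->
  <xsl:template match="Text">
    <!-- Raw value: the leading and trailing spaces each produce a
         zero-length token, i.e. ("", "Jaroslav", "Hašek", "") -->
    <raw count="{count(tokenize(@data, '\s+'))}"/>
    <!-- normalize-space() trims the ends and collapses internal runs,
         so only the two real words remain -->
    <clean count="{count(tokenize(normalize-space(@data), ' '))}"/>
  </xsl:template>
</xsl:stylesheet>
```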
Using this input:
<Text lang="cz" data="Jaroslav Hašek 1883 1923" title="Czechoslovak Stamp 2575" ref="1983-2575.htm"/>
<Text lang="cz" data="UNESCO" title="Czechoslovak Stamp 2575" ref="1983-2575.htm"/>
I tried Michael's idea with the following code:

<xsl:for-each-group select="Text" group-by="tokenize(@data,'\s+')">
  <xsl:for-each select="current-group()">
    <xsl:sort select="current-grouping-key()" lang="cz"/>
    <Word title="{@title}" ref="{@ref}">
      <xsl:value-of select="."/>
    </Word>
  </xsl:for-each>
</xsl:for-each-group>
And received the warning: "Sort key will have no effect because its value does not depend on the context item"

Sorry, I was careless. Try this:

<xsl:template match="doc">
  <xsl:for-each-group select="Text" group-by="tokenize(@data,'\s+')">
    <xsl:sort select="current-grouping-key()" lang="cz"/>
    <xsl:for-each select="current-group()">
      <Word title="{@title}" ref="{@ref}">
        <xsl:value-of select="current-grouping-key()"/>
      </Word>
    </xsl:for-each>
  </xsl:for-each-group>
</xsl:template>

that gives me:

<Word title="Czechoslovak Stamp 2575" ref="1983-2575.htm">1883</Word>
<Word title="Czechoslovak Stamp 2575" ref="1983-2575.htm">1923</Word>
<Word title="Czechoslovak Stamp 2575" ref="1983-2575.htm">Hašek</Word>
<Word title="Czechoslovak Stamp 2575" ref="1983-2575.htm">Jaroslav</Word>
<Word title="Czechoslovak Stamp 2575" ref="1983-2575.htm">UNESCO</Word>

(but I don't understand why the incorrect version gave you only two Word elements)


Michael Kay
Saxonica

And the output:

<Word title="Czechoslovak Stamp 2575" ref="1983-2575.htm"/>
<Word title="Czechoslovak Stamp 2575" ref="1983-2575.htm"/>
I expected this to produce five <Word> elements ('Jaroslav', 'Hašek', '1883', '1923', and 'UNESCO'), but only two were produced, and the <xsl:value-of> returns nothing. Is my tokenize returning nothing? I clearly did something wrong but cannot see what it is. I'll try Ken's code next, but I would like to know what I did wrong.
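On whether tokenize() returns nothing: in the first version the inner <xsl:value-of select="."/> is evaluated with a <Text> element as the context item. That element has attributes but no text content, so its string value is empty no matter what tokenize() returns. One quick way to check the two things separately is to write each to the message stream. This is a diagnostic sketch that assumes an XSLT 2.0 processor (where xsl:message accepts a select attribute) and the two <Text> elements from Mark's input above:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Diagnostic only: xsl:message output goes to the processor's
       message stream (stderr for Saxon's command line), not the result -->
  <xsl:template match="Text">
    <!-- The tokens themselves, e.g. Jaroslav|Hašek|1883|1923 -->
    <xsl:message select="string-join(tokenize(@data, '\s+'), '|')"/>
    <!-- The string value of the <Text> element: [] because it has
         no child text, only attributes -->
    <xsl:message select="concat('[', string(.), ']')"/>
  </xsl:template>
</xsl:stylesheet>
```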
As you surmised, no context is needed. I am collecting my <Text> elements from a source XML file that, when my other stylesheets are applied, will generate the documents described in the @title and @ref attributes; in other words, I am indexing data that will later be located in the described documents, which do not yet exist themselves.
Ken:
With respect to the code you gave me yesterday, my understanding is that "distinct-values((//@czech)/tokenize(translate(normalize-space(.),'-,$%.#','')) )" would give me all the unique Czech words in my source document at once. However, since the documents I am indexing do not yet exist, getting the title and href of the indexed words this way would be problematic. That is why I chose to construct the <Text> elements from my source document instead. The key idea is that my index does not refer to the source document itself, but to documents that will come into existence later.
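The grouping approach in Michael's corrected template already keeps @title and @ref next to each word, which is what an index of not-yet-existing documents needs. If the same word appears on several stamps, a second level of grouping can list each target document once per word. The sketch below makes several assumptions: a <doc> root as in Michael's template, and hypothetical output element names <Index>, <Entry> and <Ref>. It also uses lang="cs", the ISO 639-1 code for Czech ("cz" is the country code). How a processor maps that to a collation is implementation-dependent.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <!-- Assumes <Text> children of <doc>; Index/Entry/Ref are hypothetical -->
  <xsl:template match="doc">
    <Index>
      <!-- One group per distinct word: each <Text> joins every group
           whose key occurs among its tokens -->
      <xsl:for-each-group select="Text"
                          group-by="tokenize(normalize-space(@data), ' ')">
        <xsl:sort select="current-grouping-key()" lang="cs"/>
        <Entry word="{current-grouping-key()}">
          <!-- Inner grouping emits one <Ref> per distinct @ref; the
               context item is the first <Text> with that ref -->
          <xsl:for-each-group select="current-group()" group-by="@ref">
            <Ref title="{@title}" ref="{current-grouping-key()}"/>
          </xsl:for-each-group>
        </Entry>
      </xsl:for-each-group>
    </Index>
  </xsl:template>
</xsl:stylesheet>
```

Inside the inner xsl:for-each-group, current-grouping-key() refers to the inner key (@ref), which is why the word is captured in the <Entry> attribute before the inner loop starts.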
Mark


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--