Re: [xsl] Using XSLT to build an index

Hi Ken and Michael.

Since I have already removed punctuation and substituted a space for the hyphens, I set up my regular expression as '\s+'. I think that is the correct way to tokenize a string of words separated by blanks, as mine are.
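As a quick check of that assumption, here is a minimal stand-alone sketch. The template name "main" is only a placeholder of mine, and it assumes an XSLT 2.0 processor that can be started at a named template. It should write each of the four words from my first @data value on its own line:

```xslt
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical entry point; no source document is needed. -->
  <xsl:template name="main">
    <!-- Split the sample string on runs of whitespace. -->
    <xsl:for-each select="tokenize('Jaroslav Hašek 1883 1923', '\s+')">
      <!-- Inside this loop the context item is one token (a string). -->
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

If that behaves as expected, then the regular expression itself is probably not the problem.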


Using this input:

<Text lang="cz" data="Jaroslav Hašek 1883 1923" title="Czechoslovak Stamp 2575" ref="1983-2575.htm"/>
<Text lang="cz" data="UNESCO" title="Czechoslovak Stamp 2575" ref="1983-2575.htm"/>



I tried Michael's idea with the following code:

<xsl:for-each-group select="Text" group-by="tokenize(@data,'\s+')">
  <xsl:for-each select="current-group()">
    <xsl:sort select="current-grouping-key()" lang="cz"/>
    <Word title="{@title}" ref="{@ref}">
      <xsl:value-of select="."/>
    </Word>
  </xsl:for-each>
</xsl:for-each-group>

And received the warning: "Sort key will have no effect because its value does not depend on the context item"


And the output:

<Word title="Czechoslovak Stamp 2575" ref="1983-2575.htm"/>
<Word title="Czechoslovak Stamp 2575" ref="1983-2575.htm"/>

I expected this to produce five <Word> elements: 'Jaroslav', 'Hašek', '1883', '1923', and 'UNESCO'. Instead, only two were produced, and the <xsl:value-of> returns nothing. Is my tokenize returning nothing? I clearly did something wrong, but cannot see what it is. I'll try Ken's coding next, but would like to know what I did wrong.
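To be concrete, this is a sketch of the kind of output I was hoping for. The attribute values are copied from the two <Text> elements above, and the order shown is simply input order, not the final sorted order:

```xml
<Word title="Czechoslovak Stamp 2575" ref="1983-2575.htm">Jaroslav</Word>
<Word title="Czechoslovak Stamp 2575" ref="1983-2575.htm">Hašek</Word>
<Word title="Czechoslovak Stamp 2575" ref="1983-2575.htm">1883</Word>
<Word title="Czechoslovak Stamp 2575" ref="1983-2575.htm">1923</Word>
<Word title="Czechoslovak Stamp 2575" ref="1983-2575.htm">UNESCO</Word>
```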

As you surmised, no context is needed. I am collecting my <Text> elements from a source XML file that, when my other stylesheets are applied, will generate the documents described in the @title and @ref attributes. In other words, I am indexing data that will in the future be located in the described documents; the documents themselves do not yet exist.


Ken:

With respect to the code you gave me yesterday, my understanding is that "distinct-values((//@czech)/tokenize(translate(normalize-space(.),'-,$%.#','')) )" would give me all the unique Czech words in my source document at once. But since the documents I am indexing do not yet exist, getting the title and href of the indexed words in this instance would be problematic. That is why I chose to construct the <Text> elements from my source document instead. The key idea here is that my index does not refer to the source document itself, but to documents that will come into existence later.

Mark


