xsl-list

Re: [xsl] Using XSLT to build an index

2011-10-31 07:05:31
Hi Ken and Michael.
Since I have already removed punctuation and substituted a space for each hyphen, I set up my regular expression as '\s+'. I believe that is correct for tokenizing a string of words separated by blanks, as mine are.
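One caveat about '\s+', shown as a small sketch assuming an XSLT 2.0 processor: the pattern handles runs of blanks correctly, but if a string starts with whitespace, tokenize() returns a zero-length string as its first token. Wrapping the value in normalize-space() first avoids that. The variable name $raw and its sample value are hypothetical, chosen only for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical sample value with a leading blank -->
  <xsl:variable name="raw" select="' Jaroslav Hašek 1883 1923'"/>

  <!-- Run against any input document -->
  <xsl:template match="/">
    <!-- tokenize() alone: the leading blank yields an empty first token (5 items) -->
    <xsl:value-of select="count(tokenize($raw, '\s+'))"/>
    <xsl:text>&#10;</xsl:text>
    <!-- normalize-space() first: only the four words remain -->
    <xsl:value-of select="count(tokenize(normalize-space($raw), '\s+'))"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="tokenize(normalize-space($raw), '\s+')" separator="|"/>
  </xsl:template>
</xsl:stylesheet>
```

This should print 5, then 4, then Jaroslav|Hašek|1883|1923. If the @data values are always built without leading or trailing blanks, the plain tokenize(@data, '\s+') call is already enough.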

Using this input:

<Text lang="cz" data="Jaroslav Hašek 1883 1923" title="Czechoslovak Stamp 2575" ref="1983-2575.htm"/>
<Text lang="cz" data="UNESCO" title="Czechoslovak Stamp 2575" ref="1983-2575.htm"/>


I tried Michael's idea with the following code:

<xsl:for-each-group select="Text" group-by="tokenize(@data,'\s+')">
  <xsl:for-each select="current-group()">
    <xsl:sort select="current-grouping-key()" lang="cz"/>
    <Word title="{@title}" ref="{@ref}">
      <xsl:value-of select="."/>
    </Word>
  </xsl:for-each>
</xsl:for-each-group>

I received the warning "Sort key will have no effect because its value does not depend on the context item".

The output was:

<Word title="Czechoslovak Stamp 2575" ref="1983-2575.htm"/>
<Word title="Czechoslovak Stamp 2575" ref="1983-2575.htm"/>

I expected this to produce five <Word> elements: 'Jaroslav', 'Hašek', '1883', '1923', and 'UNESCO', but only two were produced, and the <xsl:value-of> returns nothing. Is my tokenize() call returning nothing? I clearly did something wrong, but cannot see what it is. I'll try Ken's code next, but would like to know what I did wrong.
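One likely explanation, offered as a hedged sketch rather than as the list's answer: inside <xsl:for-each select="current-group()"> the context item is a <Text> element, so "." refers to that element, and its string value is empty because all of its data sits in attributes. Also, current-grouping-key() has the same value for every item in that inner loop, which is what the sort warning is saying. Moving xsl:sort up to xsl:for-each-group and emitting the grouping key instead of "." might look like the stylesheet below. It assumes XSLT 2.0, that the <Text> elements are children of the document element, and a hypothetical <Index> wrapper element. It also uses lang="cs", because "cs" is the ISO 639-1 code for Czech ("cz" is the country code), though how a processor maps lang to a collation is implementation-dependent.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: the Text elements are children of the document element;
       "Index" is a hypothetical wrapper element name -->
  <xsl:template match="/*">
    <Index>
      <xsl:for-each-group select="Text"
          group-by="tokenize(normalize-space(@data), '\s+')">
        <!-- Here xsl:sort orders the groups by their key, i.e. by the word -->
        <xsl:sort select="current-grouping-key()" lang="cs"/>
        <!-- Capture the word once per group -->
        <xsl:variable name="word" select="current-grouping-key()"/>
        <!-- Each item in the group is a Text element; "." would be that element,
             whose string value is empty, so output the captured word instead -->
        <xsl:for-each select="current-group()">
          <Word title="{@title}" ref="{@ref}">
            <xsl:value-of select="$word"/>
          </Word>
        </xsl:for-each>
      </xsl:for-each-group>
    </Index>
  </xsl:template>
</xsl:stylesheet>
```

With the two sample <Text> elements wrapped in a root element, this should produce five <Word> elements, each carrying the title and ref of the <Text> element its word came from.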

As you surmised, no context is needed. I am collecting my <Text> elements from a source XML file that, when my other stylesheets are applied, will generate the documents described in the @title and @ref attributes. In other words, I am indexing data that will later be located in the described documents; those documents do not yet exist.

Ken:
With respect to the code you gave me yesterday, my understanding is that "distinct-values((//@czech)/tokenize(translate(normalize-space(.),'-,$%.#',' ')) )" would give me all the unique Czech words in my source document at once. However, since the documents I am indexing do not yet exist, getting the @title and @ref values for each indexed word that way would be problematic. That is why I chose to construct the <Text> elements from my source document instead. The key point is that my index does not refer to the source document itself, but to documents that will come into existence later.
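As quoted, that expression calls tokenize() with a single argument; in XPath 2.0, tokenize() requires a pattern, and the one-argument form only arrived in a later XPath version. A hedged XSLT 2.0 rendering is sketched below. It keeps the same translate() call, which maps '-' to a space and deletes the characters , $ % . and #, but applies normalize-space() after translate() so that a hyphen at either end of a value cannot leave a blank that produces an empty token. Listing the words one per line is an illustrative choice only.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- translate(): '-' becomes a space; , $ % . # are removed.
         normalize-space() then trims and collapses blanks before splitting. -->
    <xsl:value-of separator="&#10;"
        select="distinct-values((//@czech)/tokenize(
                  normalize-space(translate(., '-,$%.#', ' ')), '\s+'))"/>
  </xsl:template>
</xsl:stylesheet>
```

Because distinct-values() returns plain strings, no @title or @ref information travels with the words, which is the limitation described above; grouping the constructed <Text> elements keeps each word attached to its title and ref.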

Mark


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--