Thank you Mike and David,
Both stylesheets perform extremely well. I ran them on the complete
XML version of Hamlet, and the results were 657 milliseconds for David's
transformation and 781 milliseconds for Mike's. Even allowing for my
fairly fast machine (3GHz, 2GB RAM), these results are fantastic.
I think these XSLT 2.0 examples completely dispel the myth that XSLT is
not to be used for (efficient) text processing.
Cheers,
Dimitre Novatchev.
FXSL developer,
http://fxsl.sourceforge.net/ -- the home of FXSL
Resume: http://fxsl.sf.net/DNovatchev/Resume/Res.html
"Michael Kay" <mhk(_at_)mhk(_dot_)me(_dot_)uk> wrote in message
news:000001c3ee5d$35306880$6401a8c0(_at_)pcukmka(_dot_)(_dot_)(_dot_)
Sorry for the buggy code. Here is a working version:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:template match="/">
    <frequencies>
      <xsl:for-each-group group-by="." select="
          for $w in tokenize(string(.), '[\s.?!,]+')[.] return lower-case($w)">
        <xsl:sort select="count(current-group())" order="descending"/>
        <word><xsl:value-of select="current-grouping-key(), ' - ',
            count(current-group())"/></word>
      </xsl:for-each-group>
    </frequencies>
  </xsl:template>
</xsl:stylesheet>
(The predicate [.] eliminates the zero-length string)
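To illustrate the remark about the [.] predicate, here is a minimal sketch. It is not part of the original stylesheet, and the sample sentence is an assumption chosen for illustration. When the input ends with a separator, tokenize() returns a zero-length string as the last token. Filtering with [.] drops it, because the effective boolean value of a zero-length string is false.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Runs against any input document; the sample sentence is hard-coded
       (an illustrative assumption, not taken from the plays). -->
  <xsl:template match="/">
    <!-- The final '.' matches the separator pattern, so tokenize()
         yields a zero-length string as the last token. -->
    <xsl:variable name="tokens"
        select="tokenize('To be, or not to be.', '[\s.?!,]+')"/>
    <!-- First count includes the empty token; second count excludes it. -->
    <xsl:value-of select="count($tokens), count($tokens[.])"/>
  </xsl:template>
</xsl:stylesheet>
```

With a conforming XSLT 2.0 processor this should print the two counts 7 and 6. The difference is the trailing empty token, which would otherwise show up as a spurious "word" in the frequency list.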
Here's the start of the output for othello.xml:
<?xml version="1.0" encoding="UTF-8"?>
<frequencies>
<word>i - 816</word>
<word>and - 794</word>
<word>the - 762</word>
<word>to - 591</word>
<word>of - 476</word>
<word>you - 458</word>
<word>a - 445</word>
<word>my - 427</word>
<word>that - 368</word>
<word>iago - 351</word>
<word>in - 336</word>
<word>othello - 323</word>
<word>not - 313</word>
<word>it - 306</word>
<word>is - 286</word>
<word>me - 256</word>
<word>cassio - 236</word>
<word>for - 234</word>
<word>with - 222</word>
<word>be - 220</word>
<word>he - 220</word>
<word>this - 217</word>
<word>desdemona - 217</word>
<word>but - 217</word>
<word>do - 212</word>
<word>your - 207</word>
<word>have - 203</word>
<word>her - 202</word>
<word>what - 178</word>
<word>him - 171</word>
<word>his - 166</word>
<word>as - 166</word>
<word>she - 155</word>
<word>so - 151</word>
<word>will - 146</word>
<word>o - 143</word>
<word>thou - 142</word>
<word>if - 137</word>
<word>emilia - 136</word>
<word>by - 112</word>
Michael Kay
Sorted by descending frequency:
<xsl:for-each-group select="
for $w in tokenize(string(foo), "[\s.?!]*") return
lower-case($w)">
<xsl:sort select="count(current-group())" order="descending"/>
<xsl:value-of select="current-grouping-key(), ' - ',
count(current-group())"/> </xsl:for-each>
Sorry, but I cannot make this work.
First I had to remove the nested quotes, then change the end tag.
Now I get the message:
"Error at xsl:for-each-group on line 10 of file:/(Untitled):
Exactly one of the attributes group-by, group-adjacent,
group-starting-with, and group-ending-with must be specified"
Probably this is something trivial, but this is the first
time I'm trying an XSLT 2.0 grouping example.
Cheers,
Dimitre Novatchev.
FXSL developer,
http://fxsl.sourceforge.net/ -- the home of FXSL
Resume: http://fxsl.sf.net/DNovatchev/Resume/Res.html
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list