xsl-list

Re: Re: text() word lists

2004-02-08 13:30:18
Thank you Mike and David,

Both stylesheets perform extremely well. I ran them on the complete
XML version of Hamlet: David's transformation took 657 milliseconds and
Mike's took 781 milliseconds. Even allowing for my fast machine (3 GHz,
2 GB RAM), these results are fantastic.

I think these XSLT 2.0 examples completely dispel the myth that XSLT is
not to be used for (efficient) text processing.


Cheers,

Dimitre Novatchev.
FXSL developer,

http://fxsl.sourceforge.net/ -- the home of FXSL
Resume: http://fxsl.sf.net/DNovatchev/Resume/Res.html



"Michael Kay" <mhk(_at_)mhk(_dot_)me(_dot_)uk> wrote in message 
news:000001c3ee5d$35306880$6401a8c0(_at_)pcukmka(_dot_)(_dot_)(_dot_)
Sorry for the buggy code. Here is a working version:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:template match="/">
    <frequencies>
      <!-- Tokenize the document text, drop empty tokens, lower-case,
           then group identical words together -->
      <xsl:for-each-group group-by="." select="
          for $w in tokenize(string(.), '[\s.?!,]+')[.] return lower-case($w)">
        <!-- Most frequent words first -->
        <xsl:sort select="count(current-group())" order="descending"/>
        <word><xsl:value-of select="current-grouping-key(), '  -  ',
            count(current-group())"/></word>
      </xsl:for-each-group>
    </frequencies>
  </xsl:template>
</xsl:stylesheet>

(The predicate [.] eliminates the zero-length string.)
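
For readers without an XSLT 2.0 processor to hand, here is a rough Python sketch of the same idea. It is not part of the original post. The function name word_frequencies and the sample sentence are made up for illustration, and Python's \s is not identical to the XPath one, so treat it as an approximation. It shows why the empty token appears (the text ends in a separator) and that filtering it out plays the role of [.], which works because the effective boolean value of a zero-length string is false.

```python
import re
from collections import Counter

# Same separator class as the stylesheet: runs of whitespace or . ? ! ,
SEPARATORS = r"[\s.?!,]+"

def word_frequencies(text):
    """Return (word, count) pairs, most frequent first (hypothetical helper)."""
    tokens = re.split(SEPARATORS, text)
    # "if t" drops zero-length tokens, mirroring the [.] predicate
    words = [t.lower() for t in tokens if t]
    # most_common() sorts by descending count, like the xsl:sort above
    return Counter(words).most_common()

sample = "To be, or not to be. That is the question!"

# The trailing "!" leaves an empty string as the last token
print(re.split(SEPARATORS, sample))

for word, count in word_frequencies(sample):
    print(word, "  -  ", count)
```

Without the filter, the empty string would be counted as a word of its own whenever the text starts or ends with a separator.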

Here's the start of the output for othello.xml:

<?xml version="1.0" encoding="UTF-8"?>
<frequencies>
  <word>i   -   816</word>
  <word>and   -   794</word>
  <word>the   -   762</word>
  <word>to   -   591</word>
  <word>of   -   476</word>
  <word>you   -   458</word>
  <word>a   -   445</word>
  <word>my   -   427</word>
  <word>that   -   368</word>
  <word>iago   -   351</word>
  <word>in   -   336</word>
  <word>othello   -   323</word>
  <word>not   -   313</word>
  <word>it   -   306</word>
  <word>is   -   286</word>
  <word>me   -   256</word>
  <word>cassio   -   236</word>
  <word>for   -   234</word>
  <word>with   -   222</word>
  <word>be   -   220</word>
  <word>he   -   220</word>
  <word>this   -   217</word>
  <word>desdemona   -   217</word>
  <word>but   -   217</word>
  <word>do   -   212</word>
  <word>your   -   207</word>
  <word>have   -   203</word>
  <word>her   -   202</word>
  <word>what   -   178</word>
  <word>him   -   171</word>
  <word>his   -   166</word>
  <word>as   -   166</word>
  <word>she   -   155</word>
  <word>so   -   151</word>
  <word>will   -   146</word>
  <word>o   -   143</word>
  <word>thou   -   142</word>
  <word>if   -   137</word>
  <word>emilia   -   136</word>
  <word>by   -   112</word>

Michael Kay





Sorted by descending frequency:

<xsl:for-each-group select="
   for $w in tokenize(string(foo), "[\s.?!]*") return
lower-case($w)">
  <xsl:sort select="count(current-group())" order="descending"/>
  <xsl:value-of select="current-grouping-key(), '  -  ',
count(current-group())"/> </xsl:for-each>

Sorry, but I cannot make this work.

First I had to remove the nested quotes, then change the end tag.

Now I get the message:

"Error at xsl:for-each-group on line 10 of file:/(Untitled):
  Exactly one of the attributes group-by, group-adjacent,
group-starting-with, and group-ending-with must be specified"

Probably this is something trivial, but this is the first
time I'm trying an XSLT 2.0 grouping example.


Cheers,

Dimitre Novatchev.
FXSL developer,

http://fxsl.sourceforge.net/ -- the home of FXSL
Resume: http://fxsl.sf.net/DNovatchev/Resume/Res.html




XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

