I've thought of five ways to do this:
1) Tokenise and use "some ...", as in the previous message;
2) Add '|' at the beginning of both $stopPat and the word to be
checked, and use contains();
3) Put a sequence of elements in $stops, each with a 'w' attribute
whose value is a stop word, then test boolean($stops/*[@w=$w]);
4) As above, but also define an appropriate key and use
boolean($stops/key('stop',$w));
5) Build a regexp and use matches() against
concat('^(',$stopPat,')$').
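The five strategies correspond loosely to familiar membership tests. The Python sketch below is an analogy, not the author's stylesheets: the stop list is a hypothetical sample, $stopPat is assumed to be the words joined with '|' (no leading or trailing '|'), and a Python frozenset stands in for what xsl:key provides.

```python
import re

# Hypothetical stop list; the actual list from the earlier message is not reproduced here.
STOPS = ["the", "of", "and", "to", "a", "in", "is", "it"]

# Assumption: $stopPat is the stop words joined with '|' (no leading or trailing '|').
# These words contain no regex metacharacters; a real list might need re.escape().
STOP_PAT = "|".join(STOPS)


def stop1(w: str) -> bool:
    """(1) Tokenise the pattern and test each token, like XPath 'some $s in ... satisfies'."""
    return any(s == w for s in STOP_PAT.split("|"))


def stop2(w: str) -> bool:
    """(2) Delimit both strings with '|' and do a substring test, like contains().

    The trailing '|' on the word matters: without it, 'th' would be found inside '|the|'.
    """
    return ("|" + w + "|") in ("|" + STOP_PAT + "|")


# (3) A sequence of records, each carrying its stop word in a 'w' field, like <s w="the"/>.
STOP_RECORDS = [{"w": s} for s in STOPS]


def stop3(w: str) -> bool:
    """(3) Linear scan for a record whose 'w' equals w, like $stops/*[@w=$w]."""
    return any(rec["w"] == w for rec in STOP_RECORDS)


# (4) Build an index once; a frozenset stands in for xsl:key here (an analogy only).
STOP_INDEX = frozenset(rec["w"] for rec in STOP_RECORDS)


def stop4(w: str) -> bool:
    """(4) Indexed lookup, like $stops/key('stop', $w)."""
    return w in STOP_INDEX


# (5) One alternation; fullmatch() plays the role of the '^(...)$' anchors.
STOP_RE = re.compile("(?:" + STOP_PAT + ")")


def stop5(w: str) -> bool:
    """(5) Regular-expression match, like matches($w, concat('^(', $stopPat, ')$'))."""
    return STOP_RE.fullmatch(w) is not None


STRATEGIES = [stop1, stop2, stop3, stop4, stop5]

if __name__ == "__main__":
    for word in ["the", "them", "th", "of", "cat"]:
        print(word, [f(word) for f in STRATEGIES])
```

The analogy is loose: how an XSLT processor evaluates "some", contains(), predicates, keys and regexes is implementation-specific, so timings of this Python code would say nothing reliable about the measurements below.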
For (1) and (2), I tried both $stopPat as given in the previous
message and a variant (1a, 2a) in which the list is sorted in
descending order of frequency in English.
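Why frequency ordering might help (1a, 2a): a scan that stops at the first hit does fewer comparisons when the common stop words come first. The Python sketch below counts those comparisons for a hypothetical eight-word stop list and a made-up sentence; neither is taken from the author's data.

```python
# Hypothetical stop list in rough descending order of English frequency (an assumption,
# not the author's actual list), plus the same words in alphabetical order.
BY_FREQUENCY = ["the", "of", "and", "to", "a", "in", "is", "it"]
ALPHABETICAL = sorted(BY_FREQUENCY)


def comparisons(stop_list, tokens):
    """Count equality tests made by a linear scan that stops at the first hit."""
    total = 0
    for tok in tokens:
        for s in stop_list:
            total += 1
            if s == tok:
                break
    return total


# Made-up sample text, standing in for real input.
SAMPLE = "the cat sat on the mat and it is in the hat of a man".split()

if __name__ == "__main__":
    print("frequency order:", comparisons(BY_FREQUENCY, SAMPLE))
    print("alphabetical:   ", comparisons(ALPHABETICAL, SAMPLE))
```

With these inputs the frequency-ordered list needs 82 comparisons and the alphabetical one 90. A word that is not a stop word costs a full scan in either order, which may be one reason the gains for 1a and 2a in the measurements are small.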
Look away now if you want to guess the order of performance
yourself...
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Version  Raw time (ms)  Time - baseline (ms)
0        5
4        7              2
2        8              3
2a       8              3
1a       14             9
1        15             10
3        28             23
5        30             25
Version 0 is the baseline, in which the stop function does no actual
work; times are averages over 100 iterations, in milliseconds.
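The measurement method (average over 100 iterations, then subtract a do-nothing baseline so that only the cost of the stop test remains) can be sketched in Python as follows. This is not the author's harness: the word list and the two "versions" are hypothetical stand-ins, and time.perf_counter is just one reasonable clock (the standard timeit module would also do). Subtracting the baseline assumes the fixed overhead is the same in every version.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical workload: a stand-in for the words of q.xml (not the real input).
WORDS = "the cat sat on the mat and it is in the hat of a man".split() * 50
STOP_SET = frozenset(["the", "of", "and", "to", "a", "in", "is", "it"])


def mean_ms(fn: Callable[[], object], iterations: int = 100) -> float:
    """Average wall-clock time of one call to fn, over `iterations` calls, in ms."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) * 1000.0 / iterations


def benchmark(versions: Dict[str, Callable[[], object]], baseline: str = "0",
              iterations: int = 100) -> Dict[str, Tuple[float, float]]:
    """Map each version to (raw ms, raw ms minus the baseline's raw ms), fastest first."""
    raw = {name: mean_ms(fn, iterations) for name, fn in versions.items()}
    base = raw[baseline]
    return {name: (t, t - base) for name, t in sorted(raw.items(), key=lambda kv: kv[1])}


# Hypothetical versions: "0" walks the words without testing them,
# "4" does a set lookup as a loose analogue of the key-based approach.
VERSIONS = {
    "0": lambda: [False for w in WORDS],
    "4": lambda: [w in STOP_SET for w in WORDS],
}

if __name__ == "__main__":
    for name, (raw_ms, net_ms) in benchmark(VERSIONS).items():
        print(f"{name:>3} {raw_ms:8.4f} {net_ms:8.4f}")
```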
I'd be very interested to hear whether anyone has a better approach. Of
course, I'd also like to find out whether other implementations show a
similar pattern.
I've put up a gzipped tar file [1] of all the files you need to
reproduce the experiment -- one .xsl for each version, and q.xml for
input.
The stopss.xsl file is there so you can test that you are getting the
right answer! Replace my:stop1 with your version in that file, and
check that the output is
243367200142031010020120103000130001022001513610014414440
ht
[1] http://www.ltg.ed.ac.uk/~ht/memberCheck.tar.gz
--
Henry S. Thompson, School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 651-1426, e-mail: ht(_at_)inf(_dot_)ed(_dot_)ac(_dot_)uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail from me _always_ has a .sig like this -- mail without it is forged spam]
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list