xsl-list

[xsl] HST's answers Re: [xsl] Efficient way to check sequence membership

2011-03-02 16:00:47
I've thought of five ways to do this:

 1) tokenise and use "some ...", as in the previous message;
 2) Add '|' at the beginning of both $stopPat and the word to be
    checked, and use contains;
 3) Put a sequence of elements, each with a 'w' attribute whose value is
     a stop word, in $stops, then do boolean($stops/*[@w=$w]);
 4) As above, but then define an appropriate key and use
     boolean($stops/key('stop',$w));
 5) Build a regexp and use matches:
     concat('^(',$stopPat,')$')
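
For anyone who wants to see the five variants side by side, here is a
minimal XSLT 2.0 sketch of how they might look. It is not the code from
the tar file. It assumes $stopPat is a '|'-separated string (inferred
from the regexp in (5)); the sample words, the 'my' namespace URI, the
helper variables and the names my:stop2 to my:stop5 are all made up for
illustration. Note that variant 2 here wraps both ends in '|', which goes
slightly beyond the description above, so that 'th' or 'then' cannot
match part of 'the'.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.org/my"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <!-- Assumed sample stop list; the real list is not reproduced here -->
  <xsl:variable name="stopPat" as="xs:string" select="'the|of|and|a|to'"/>

  <!-- (1) tokenise once, then use an existential test -->
  <xsl:variable name="stopSeq" as="xs:string*"
                select="tokenize($stopPat, '\|')"/>
  <xsl:function name="my:stop1" as="xs:boolean">
    <xsl:param name="w" as="xs:string"/>
    <xsl:sequence select="some $s in $stopSeq satisfies $s eq $w"/>
  </xsl:function>

  <!-- (2) delimit the list and the word with '|', then use contains -->
  <xsl:variable name="stopStr" as="xs:string"
                select="concat('|', $stopPat, '|')"/>
  <xsl:function name="my:stop2" as="xs:boolean">
    <xsl:param name="w" as="xs:string"/>
    <xsl:sequence select="contains($stopStr, concat('|', $w, '|'))"/>
  </xsl:function>

  <!-- (3) temporary tree of elements, one per stop word, filtered by @w -->
  <xsl:variable name="stops">
    <xsl:for-each select="$stopSeq">
      <s w="{.}"/>
    </xsl:for-each>
  </xsl:variable>
  <xsl:function name="my:stop3" as="xs:boolean">
    <xsl:param name="w" as="xs:string"/>
    <xsl:sequence select="boolean($stops/*[@w = $w])"/>
  </xsl:function>

  <!-- (4) same tree, but looked up through a key -->
  <xsl:key name="stop" match="s" use="@w"/>
  <xsl:function name="my:stop4" as="xs:boolean">
    <xsl:param name="w" as="xs:string"/>
    <xsl:sequence select="boolean($stops/key('stop', $w))"/>
  </xsl:function>

  <!-- (5) anchored regexp; assumes the stop words contain no regexp
       metacharacters, otherwise they would need escaping -->
  <xsl:variable name="stopRe" as="xs:string"
                select="concat('^(', $stopPat, ')$')"/>
  <xsl:function name="my:stop5" as="xs:boolean">
    <xsl:param name="w" as="xs:string"/>
    <xsl:sequence select="matches($w, $stopRe)"/>
  </xsl:function>

  <!-- Print each test word followed by the five results -->
  <xsl:template name="main">
    <xsl:for-each select="('the', 'then')">
      <xsl:value-of select="., my:stop1(.), my:stop2(.), my:stop3(.),
                            my:stop4(.), my:stop5(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run with an XSLT 2.0 processor using 'main' as the initial template;
all five functions should agree, giving true for 'the' and false for
'then'. The tokenised sequence, the delimited string, the temporary tree
and the regexp are all global variables, so each is built once rather
than on every call.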

For (1) and (2), I tried both having $stopPat as in the previous
message, and a variant (1a, 2a) in which the list was sorted in
descending order of frequency in English.

Look away now if you want to guess what the order of performance
is. . .

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

Version  raw time (ms)  time - baseline (ms)

   0           5                -
   4           7                2
   2           8                3
   2a          8                3
   1a         14                9
   1          15               10
   3          28               23
   5          30               25

where 0 is the baseline in which the stop function does no actual work,
and the time is the average over 100 iterations, in milliseconds.

I'm really interested if anyone has a better approach.  Of course, I'm
also interested to find out if other implementations show a similar
pattern.

I've put up a gzipped tar file [1] of all the files you need to
reproduce the experiment -- one .xsl for each version, and q.xml for
input.

The stopss.xsl file is there so you can test that you are getting the
right answer!  Replace my:stop1 with your version in that file, and
check that the output is

243367200142031010020120103000130001022001513610014414440

ht

[1] http://www.ltg.ed.ac.uk/~ht/memberCheck.tar.gz
-- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 651-1426, e-mail: ht(_at_)inf(_dot_)ed(_dot_)ac(_dot_)uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
 [mail from me _always_ has a .sig like this -- mail without it is forged spam]
