Not that I understand it,
but ( and ) seem to be included in the words, Michael?
<word>) - 71</word>
<word>(this - 11</word>
Is the fix to modify the
for $w in tokenize(string(.), '[\s.?!,]+')[.] return
line?
for $w in tokenize(string(.), '[\s.?!, )(]+')[.] return
seems to work.
I only spent five minutes on this: producing a decent natural language
tokenizer takes a little bit longer than that! Obviously it's easy to
write a more intelligent regex; I was only trying to illustrate the
principles.
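
As a rough sketch of what a more general separator pattern might look like
(an illustration only, not the stylesheet from this thread): in XPath 2.0
regular expressions \W matches any run of punctuation, separator or control
characters, so brackets, quotes and dashes are dropped without listing each
one. The surrounding stylesheet, the <words> wrapper element and the sort
order are assumptions made for the example.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- <words> is a hypothetical wrapper element for the result -->
    <words>
      <!-- Split the document text on runs of non-word characters (\W+);
           the [.] predicate discards the empty strings that appear when
           the text starts or ends with a separator. -->
      <xsl:for-each-group select="tokenize(string(.), '\W+')[.]"
                          group-by=".">
        <!-- Assumed ordering: most frequent words first -->
        <xsl:sort select="count(current-group())" order="descending"/>
        <word>
          <xsl:value-of select="concat(current-grouping-key(), ' - ',
                                       count(current-group()))"/>
        </word>
      </xsl:for-each-group>
    </words>
  </xsl:template>
</xsl:stylesheet>
```

One trade-off to be aware of: apostrophes and hyphens count as punctuation,
so "it's" comes out as "it" and "s". Whether that is acceptable depends on
the text being counted.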
Michael Kay
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list