So what is the best way to parameterise these to allow
turning on/off the removal of numbers? And while
we're at it, turning on/off the removal of hyphens or
other possibly-word-forming characters?
The second argument to tokenize(), which is what specifies the
"inter-word space/punctuation", can include numbers, hyphens etc. or
leave them out. It is a general string-valued XPath expression, so in
particular you can build the regexp on the fly using concat() or
string-join(), passing in parameters as needed:
tokenize(.,concat('(',$space,'|[',$punct,$nums,$other,'])+'))
then you can set
<xsl:param name="space" select="'\s'"/>
<xsl:param name="punct" select="'!.,;:\?'"/>
<xsl:param name="nums" select="''"/> <!-- or '0-9' -->
<xsl:param name="other" select="''"/> <!-- or 'whatever you want ' -->
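
Putting the pieces together, a minimal sketch of a complete XSLT 2.0 stylesheet might look like the following. The match pattern "text", the variable name "separator" and the one-token-per-line text output are assumptions made for illustration only; the four parameters and the concat() expression are the ones given above.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Pieces of the separator regexp; each can be overridden by the caller -->
  <xsl:param name="space" select="'\s'"/>
  <xsl:param name="punct" select="'!.,;:\?'"/>
  <xsl:param name="nums"  select="''"/>  <!-- or '0-9' to drop digits -->
  <xsl:param name="other" select="''"/>  <!-- or '\-' to split on hyphens -->

  <!-- Assumed helper variable: the regexp is assembled once from the parameters -->
  <xsl:variable name="separator"
      select="concat('(', $space, '|[', $punct, $nums, $other, '])+')"/>

  <!-- "text" is a hypothetical element name; match whatever holds your prose -->
  <xsl:template match="text">
    <!-- tokenize() yields an empty string when the input starts or ends
         with a separator, so those are filtered out -->
    <xsl:for-each select="tokenize(., $separator)[. ne '']">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With the defaults the regexp comes out as (\s|[!.,;:\?])+ so digits and hyphens stay inside words; setting nums to '0-9' makes it (\s|[!.,;:\?0-9])+ and runs of digits are then treated as separators. If you put a hyphen into $other, escaping it as \- keeps it from being read as part of a character range.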
--
http://www.dcarlisle.demon.co.uk/matthew
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list