xsl-list

RE: [xsl] Regular expression for matching sentences

2006-08-18 03:08:35
The reason that \b isn't in the XPath regular expression dialect is that its
meaning is very sensitive to the conventions of the natural language that
you're using. This restriction is in the W3C spec, not in Saxon.

A lot depends on how clever you want to be. I would think you'd get a 95%
success rate by using

tokenize($in, '[\.\?!]\s+')
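For anyone who wants to try the idea outside a stylesheet, here is a minimal Python sketch that mirrors the tokenize() call above with re.split and then counts em-spaces in each resulting sentence. Assumptions: "em-space" means U+2003 EM SPACE, and the helper names split_sentences and em_space_counts are hypothetical. One detail worth noting: in the XPath (XML Schema) regex dialect \s matches only space, tab, newline and carriage return, so the sketch spells that class out rather than using Python's Unicode-aware \s. A consequence is that a full stop followed only by an em-space is not treated as a sentence break.

```python
import re

# Mirrors XPath tokenize($in, '[\.\?!]\s+').
# XPath/XML Schema \s is only [#x20\t\n\r], so it is spelled out here;
# Python's own \s would also match U+2003 and change the behaviour.
SENTENCE_BREAK = re.compile(r"[.?!][ \t\n\r]+")

EM_SPACE = "\u2003"  # assumption: "em-space" means U+2003 EM SPACE


def split_sentences(text):
    """Split text the way the tokenize() call would (hypothetical helper)."""
    # Like tokenize(), re.split drops the matched separator (so the
    # terminal punctuation is lost, except on the last sentence) and
    # yields a zero-length string if the text ends with a separator.
    return SENTENCE_BREAK.split(text)


def em_space_counts(text):
    """Return (sentence, number of em-spaces) pairs (hypothetical helper)."""
    return [(s, s.count(EM_SPACE)) for s in split_sentences(text)]


if __name__ == "__main__":
    sample = "One\u2003two. Three\u2003four\u2003five? Six!"
    for sentence, count in em_space_counts(sample):
        print(count, repr(sentence))
```

With the sample text this yields the sentences "One two", "Three four five" and "Six!" (em-spaces shown as spaces) with counts 1, 2 and 0.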

but in my experience it's tricky to know what to do about characters such as
")", '"', or an em dash that might appear immediately after a full stop.
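One way to keep a closing bracket or quote with the sentence it ends is to match whole sentences instead of splitting on separators; in XSLT 2.0 a similar pattern could be applied with xsl:analyze-string. The Python sketch below is only an illustration under stated assumptions: the set of "closer" characters (")", '"', U+201D, U+2019) is a guess, the em dash is deliberately left out because whether it belongs to the preceding or following sentence is a judgment call, and the function name sentences is hypothetical. It also shares the original pattern's blind spots, for example abbreviations such as "e.g." and dialogue tags after a quoted exclamation.

```python
import re

# Assumed closers that may trail the final punctuation and still belong
# to the sentence: ")", '"', U+201D and U+2019. The em dash is left out.
# A sentence = a run of non-terminal characters, then either one or more
# terminal marks plus optional closers, or the end of the text.
SENTENCE = re.compile('[^.?!]+(?:[.?!]+[)"\u201d\u2019]*|$)')


def sentences(text):
    """Return sentences with terminal punctuation and closers kept (hypothetical helper)."""
    # The group is non-capturing, so findall returns whole matches;
    # whitespace between sentences is stripped and empty pieces dropped.
    return [m.strip() for m in SENTENCE.findall(text) if m.strip()]


if __name__ == "__main__":
    print(sentences('He said "Stop." Then he left.'))
    print(sentences("(See above.) Next one? Yes!"))
```

Here 'He said "Stop."' keeps its closing quote and "(See above.)" keeps its closing bracket, which a plain split on '[\.\?!]\s+' would not do.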

Michael Kay
http://www.saxonica.com/

-----Original Message-----
From: Carlo Liwanag [mailto:cliwanag(_at_)asiatype(_dot_)com] 
Sent: 18 August 2006 10:56
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Regular expression for matching sentences

I am trying to get my text template to match sentences 
(sentences end in '.', '?' or '!') so that I can count the 
number of em-spaces in each one. But I just don't know how to 
do it without using \b (because Saxon probably does not 
support it). Is there an alternative? Please help.
Thanks,
Carlo


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--