Karen,
The regular expression syntax used in XPath 2.0 is defined by the XML
Schema Recommendation, as amended by XQuery/XPath 2.0 Functions and
Operators. See
http://www.w3.org/TR/xmlschema-2/
especially Appendix F, Regular Expressions
and
http://www.w3.org/TR/xpath-functions/
7.6.1 Regular Expression Syntax
If you dig into these (especially the first) you'll find that \s is
equivalent to [#x20\t\n\r], which is to say the space character, the
tab character, the newline, or the return. This is consistent with
XML's general notion of what constitutes whitespace, for example as
used inside tags or declarations (see the XML Rec). Note that the
non-breaking space character is not in this set.
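To make the point about the non-breaking space concrete, here is a rough Python sketch of the same word count. It is only an analogue of the XPath expression, not how Saxon evaluates it. The name count_words is made up for illustration, and the whitespace class is spelled out by hand because Python's own \s is broader than the XML Schema one (it also matches U+00A0 in text patterns).

```python
import re

# Assumption: this explicit class stands in for XML Schema's \s,
# i.e. space, tab, newline and carriage return only.
XSD_SPACE = r"[ \t\n\r]"

# Analogue of the separator pattern '(\s|[,.!:;])+' from the question.
SEPARATORS = re.compile("(?:" + XSD_SPACE + "|[,.!:;])+")

def count_words(text):
    """Count non-empty tokens left after splitting on the separators."""
    tokens = SEPARATORS.split(text.lower())
    # Drop empty tokens, as the [string(.)] predicate does in the XPath.
    return len([t for t in tokens if t])

# An ordinary space separates "one" and "two" ...
plain = count_words("one two, three")
# ... but a non-breaking space (U+00A0) does not, so they fuse into one token.
nbsp = count_words("one\u00a0two, three")
```

Here plain comes out as 3 and nbsp as 2: with this definition of whitespace, two words joined by a non-breaking space are counted as one.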
What a "word" should be defined to be is tricky. Whether a word
count is properly derivable from an analysis of whitespace (or
whitespace plus punctuation) is arguable, but for most purposes it's
considered good enough, at any rate for English, especially
considering the alternatives.
(For example, out on the edge, if you ever have em-dashes or even
"--" hyphen pairs, without extra whitespace--like this--as is
sometimes seen--you'll count "words" like "whitespace--like".)
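The edge case is easy to reproduce in a small Python sketch. The first pattern is an analogue of the expression in the question; the second is a hypothetical variant (not something from the question or from Saxon) that also treats "--" and the em-dash character as separators. The names BASIC, WITH_DASHES and tokens are made up for illustration.

```python
import re

# Analogue of '(\s|[,.!:;])+' with XML Schema's whitespace set spelled out.
BASIC = re.compile(r"(?:[ \t\n\r]|[,.!:;])+")

# Hypothetical variant: also split on "--" and on U+2014 (the em-dash).
# Single hyphens are left alone so that "non-breaking" stays one word.
WITH_DASHES = re.compile(r"(?:[ \t\n\r]|[,.!:;]|--|\u2014)+")

def tokens(pattern, text):
    """Split lower-cased text on the pattern and drop empty tokens."""
    return [t for t in pattern.split(text.lower()) if t]

sample = "without extra whitespace--like this--as is sometimes seen"

basic_tokens = tokens(BASIC, sample)       # keeps "whitespace--like" whole
dash_tokens = tokens(WITH_DASHES, sample)  # splits at each "--"
```

With the basic pattern the sample yields 7 tokens, including "whitespace--like" and "this--as"; the variant yields 9. Whether the extra separators are worth it depends on how the content actually uses dashes.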
I hope that helps,
Wendell
At 05:57 PM 4/20/2006, you wrote:
I am using count(tokenize(lower-case(.),'(\s|[,.!:;])+')[string(.)])
-a technique I retrieved from the list for counting words. I have
been questioned about the regular expression that is being used to
find whitespace. The content can contain many kinds of whitespace
and I am being asked to defend using this expression to find words.
Does the Saxon 8b interpretation of this regular expression cover
as whitespaces
--------------------Karen McAdams
======================================================================
Wendell Piez
mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================