Ian Grigg wrote:
Perl uses this definition of whitespace:
\s A whitespace character [ \t\n\r\f]
which includes form feeds as 0x0c (I think).
Modern versions of Perl also have a Unicode-compliant definition of
whitespace (and other things); see:
http://perl.active-venture.com/pod/perlretut-morecharacter.html
Java uses the java.lang.Character.isWhitespace()
method, which probably depends on the character
set!
java.lang.Character.isWhitespace() operates solely on chars, which are
Unicode (UTF-16 code units), so its result does not depend on the
platform character set.
I don't know about Python, or Microsoft languages.
Modern versions of Python support Unicode, I believe.
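
To make the difference between the two definitions concrete, here is a
minimal sketch, assuming Python 3, that compares the ASCII class quoted
above with the Unicode-aware checks in the standard library. The helper
names is_ascii_whitespace and is_unicode_whitespace are made up for this
illustration. Note that Python's ASCII-mode \s also accepts vertical tab
(\v), so it is not identical to the quoted Perl class.

```python
import re

# The five characters in the ASCII class quoted above: [ \t\n\r\f]
ASCII_WHITESPACE = " \t\n\r\f"


def is_ascii_whitespace(ch: str) -> bool:
    """Hypothetical helper: True only for a single char in [ \\t\\n\\r\\f]."""
    return len(ch) == 1 and ch in ASCII_WHITESPACE


def is_unicode_whitespace(ch: str) -> bool:
    """Hypothetical helper: defer to Python's Unicode-aware str.isspace()."""
    return len(ch) == 1 and ch.isspace()


if __name__ == "__main__":
    samples = {
        "form feed (0x0c)": "\f",
        "no-break space (U+00A0)": "\u00a0",
        "em space (U+2003)": "\u2003",
        "letter a": "a",
    }
    for name, ch in samples.items():
        # In Python 3, \s on a str pattern is Unicode-aware by default;
        # the re.ASCII flag restricts it to ASCII whitespace.
        unicode_re = re.fullmatch(r"\s", ch) is not None
        ascii_re = re.fullmatch(r"\s", ch, re.ASCII) is not None
        print(name, is_ascii_whitespace(ch), is_unicode_whitespace(ch),
              unicode_re, ascii_re)
```

The form feed passes every check, while the no-break space and em space
only count as whitespace under the Unicode-aware definitions.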
cheers
stuart
--
Stuart Yeates
stuart(_dot_)yeates(_at_)computing-services(_dot_)oxford(_dot_)ac(_dot_)uk
OSS Watch http://www.oss-watch.ac.uk/
Oxford Text Archive http://ota.ahds.ac.uk/
Humbul Humanities Hub http://www.humbul.ac.uk/