Sorry to insist : why don't they work ?
Because that's life:-)
Aren't they supposed to?
No. The regex syntax in XSLT is (except where otherwise noted) that of W3C XML
Schema, and that doesn't have any notation like that.
If so, is it a Saxon-related problem or a more general one that would
indicate that UTS #18 is still to be implemented, is irrelevant or
whatever ?
The _semantics_ of Unicode regexps come from there, e.g. the predefined
character classes (you may prefer to use a character class referring to
the Arabic block, for example, rather than explicit code points). But (I
would guess) the U+ notation wasn't supported because that is the Unicode
standard's way of referring to characters by code point in plain
ASCII text, and that is never used in an XML context. U+06FF is legal XML
character data, but it is just those 6 characters; if you want to refer to
character hex 06FF you always use &#x06FF; in XML.
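
To make that last point concrete, here is a small sketch in Python rather than XSLT. The standard-library xml.etree parser is only a convenient stand-in for any XML parser, and the element name "a" is made up for illustration. It shows that a numeric character reference is resolved to one character, while U+06FF typed into a document stays as six literal characters.

```python
import xml.etree.ElementTree as ET

# Numeric character reference: the parser resolves it to a single character.
ref = ET.fromstring("<a>&#x06FF;</a>").text

# "U+06FF" typed into XML is plain character data: six literal characters.
literal = ET.fromstring("<a>U+06FF</a>").text

print(len(ref), hex(ord(ref)))   # 1 0x6ff
print(len(literal), literal)     # 6 U+06FF
```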
How, for example, to use a useful syntax like
matches(.,'\p{Script:Arabic}+') ?
schema-2 says: http://www.w3.org/TR/xmlschema-2/#regexs
[Definition:] [Unicode Database] groups code points into a number of
blocks such as Basic Latin (i.e., ASCII), Latin-1 Supplement, Hangul
Jamo, CJK Compatibility, etc. The set containing all characters that
have block name X (with all white space stripped out), can be identified
with a block escape \p{IsX}. The complement of this set is specified
with the block escape \P{IsX}. ([\P{IsX}] = [^\p{IsX}]).
...
For example,
the ·block escape· for identifying the ASCII characters is \p{IsBasicLatin}.
So that would be \p{IsArabic}, i.e. matches(., '\p{IsArabic}+').
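
If you want to check what such a block escape selects outside an XSLT processor, a rough sketch in Python follows. Python's re module has no \p{IsX} escapes, so the Arabic block is written out by hand as U+0600 to U+06FF (an assumption you should check against the Unicode block list for your version), and the function name contains_arabic is hypothetical. Like matches() without ^ and $ anchors, it succeeds if any substring matches.

```python
import re

# Assumption: the Arabic block is U+0600..U+06FF, spelled out by hand
# because Python's re module has no \p{IsArabic} block escape.
ARABIC_BLOCK_RUN = re.compile(r"[\u0600-\u06FF]+")

def contains_arabic(text: str) -> bool:
    """Return True if text contains at least one Arabic-block character."""
    # search() is unanchored, like matches() with a pattern lacking ^ and $.
    return ARABIC_BLOCK_RUN.search(text) is not None
```

For example, contains_arabic("abc \u06FF") is true and contains_arabic("hello") is false.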
David