Re: XSLT 2.0 : Unicode hex notation in regular expressions

> Sorry to insist: why don't they work?


Because that's life :-)

> Aren't they supposed to?


No. The syntax in XSLT is (except where otherwise noted) that of W3C XML
Schema, and that doesn't have any notation like that.

> If so, is it a Saxon-related problem or a more general one that would
> indicate that UTS #18 is still to be implemented, is irrelevant or
> whatever?


The _semantics_ of Unicode regexps come from there, e.g. the predefined
character classes (you may prefer to use a character class referring to
the Arabic block, for example, rather than explicit code points), but (I
would guess) the U notation wasn't supported because it is the Unicode
standard's way of referring to characters by code point in plain ASCII
text, and that notation is never used in an XML context. U+06FF is legal
XML character data, but it is just those six characters; if you want to
refer to the character with hex code 06FF, you always use &#x06FF; in XML.
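A minimal illustration of that last point, written in Python purely because its standard-library XML parser (`xml.etree.ElementTree`) is easy to run; it is not XSLT or Saxon code. The parser resolves the numeric character reference to the single character U+06FF, whereas the literal text `U+06FF` comes through as six ordinary characters.

```python
import xml.etree.ElementTree as ET

# A numeric character reference is resolved by the XML parser itself,
# so the element's text is the single character U+06FF.
ref_text = ET.fromstring("<s>&#x06FF;</s>").text

# The Unicode "U+" notation is just ordinary character data to XML:
# six ASCII characters, nothing more.
plain_text = ET.fromstring("<s>U+06FF</s>").text

print(len(ref_text), hex(ord(ref_text)))  # 1 0x6ff
print(len(plain_text), plain_text)        # 6 U+06FF
```

The same holds inside a stylesheet: a reference such as `&#x06FF;` written in an attribute like `regex="[&#x0600;-&#x06FF;]+"` is resolved by the XML parser before the regex engine ever sees the pattern.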


> How, for example, can one use a useful syntax like
> matches(.,'\p{Script:Arabic}+')?

XML Schema Part 2 says (http://www.w3.org/TR/xmlschema-2/#regexs):

[Definition:] [Unicode Database] groups code points into a number of
blocks such as Basic Latin (i.e., ASCII), Latin-1 Supplement, Hangul
Jamo, CJK Compatibility, etc. The set containing all characters that
have block name X (with all white space stripped out), can be identified
with a block escape \p{IsX}. The complement of this set is specified
with the block escape \P{IsX}. ([\P{IsX}] = [^\p{IsX}]).
...
For example,
the ·block escape· for identifying the ASCII characters is \p{IsBasicLatin}. 



So that would be \p{IsArabic}, i.e. matches(., '\p{IsArabic}+').
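For comparison outside XSLT, here is a rough Python sketch of what `\p{IsArabic}` and `\P{IsArabic}` select. It is an assumption-laden stand-in, not how Saxon implements block escapes: Python's standard `re` module does not support `\p{...}` escapes at all, so the Arabic block (U+0600 to U+06FF, per the Unicode Blocks.txt data file) is written out as an explicit range, and the names `IS_ARABIC`, `NOT_ARABIC` and `matches_arabic` are hypothetical.

```python
import re

# Hypothetical stand-ins for the XML Schema block escapes \p{IsArabic}
# and \P{IsArabic}. Python's re has no \p{...} escapes, so the Arabic
# block (U+0600..U+06FF) is spelled out as an explicit range.
IS_ARABIC = "[\u0600-\u06FF]"
NOT_ARABIC = "[^\u0600-\u06FF]"  # the spec's [\P{IsX}] = [^\p{IsX}]


def matches_arabic(text: str) -> bool:
    """Rough analogue of XPath matches(., '\\p{IsArabic}+').

    Like XPath matches(), this succeeds if any substring matches,
    so re.search is used rather than re.fullmatch.
    """
    return re.search(IS_ARABIC + "+", text) is not None


print(matches_arabic("abc \u0633\u0644\u0627\u0645"))  # True
print(matches_arabic("Basic Latin only"))              # False
print(re.findall(NOT_ARABIC + "+", "ab\u0627cd"))      # ['ab', 'cd']
```

Note that a block is a code-point range, which is not the same set as the Unicode Script property behind `\p{Script:Arabic}`; for instance, Arabic-script letters also appear in other blocks such as Arabic Supplement and the Arabic Presentation Forms blocks.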

David
