David Carlisle wrote:
> [\\u0600-\\u06FF]
>
>
> \\ is a literal \ so that matches
> any one of characters \ u 0 6 F and all characters in the range 0 to \,
> except that 0 is char 48 and / is char 47 so this range is empty.
OK, got it. I now know why ":" matches [\\u0600-\\u06FF]. The range
actually runs from zero, which is char 48 (x30), to the backslash, which
is char 92 (x5C), so it is not empty, and the colon, char 58 (x3A), falls
inside it.
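As a quick sanity check of that arithmetic, here is a minimal sketch in Python rather than XSLT. It assumes that Python's re module parses this particular character class the same way the XPath regex engine does (literal backslash, the letters u 0 6 F, and the range from 0 to backslash); the helper name matches_broken is only illustrative.

```python
import re

# Raw string, so the regex engine sees both backslashes, just as it does
# in the XPath string literal: a literal "\", the characters u 0 6 F,
# and the range running from "0" (char 48) to "\" (char 92).
BROKEN = re.compile(r'[\\u0600-\\u06FF]')

def matches_broken(ch):
    """Return True if the single character ch is in the accidental class."""
    return BROKEN.fullmatch(ch) is not None

# Print the code point of a few probe characters and whether they match.
for ch in ['/', '0', ':', '\\', 'u', '\u0627']:
    print(f"char {ord(ch)} (x{ord(ch):02X}): {matches_broken(ch)}")
```

Under that assumption, the colon and the backslash are reported as matching, while the forward slash (char 47) and an actual Arabic letter such as U+0627 are not.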
> You don't need the u-notation to enter code points into regexp (and
> they don't work)
Sorry to insist: why don't they work? Aren't they supposed to?
If so, is it a Saxon-related problem, or a more general one that would
indicate that UTS #18 is still to be implemented, is irrelevant, or
whatever?
How, for example, would one use a useful syntax like
matches(.,'\p{Script:Arabic}+')?
> as you can just enter the characters directly
Mmmh... not always easy because of control characters. For Arabic, see
http://www.fileformat.info/info/unicode/char/0600/index.htm.
> or if
> you want an ascii representation use xml character references,
> & # x a b c ;
Indeed. <xsl:when
test="matches(.,'[&#x0600;-&#x06FF;]+')">arabic</xsl:when> gives me the
expected result. Thanks for the reminder!
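The reason this works, as far as I understand it, is that the XML parser resolves the character references before the regex engine ever sees the pattern, so the class really contains the two end points of the Arabic block. A rough Python analogue, offered only as a sketch: in a non-raw Python string the \u escapes are likewise resolved before the pattern is compiled. The name is_arabic_block is hypothetical.

```python
import re

# Non-raw string: Python replaces \u0600 and \u06FF with the actual
# characters before the pattern reaches the regex engine, much as the
# XML parser does with character references in a stylesheet.
ARABIC_BLOCK = re.compile('[\u0600-\u06FF]+')

def is_arabic_block(text):
    """Return True if text is non-empty and lies wholly in U+0600..U+06FF."""
    return ARABIC_BLOCK.fullmatch(text) is not None

# Probe an Arabic word, the colon from the earlier mix-up, and plain ASCII.
for sample in ['\u0633\u0644\u0627\u0645', ':', 'abc']:
    print(repr(sample), is_arabic_block(sample))
```

Here the colon no longer matches, since the range now runs between the intended code points instead of between "0" and the backslash.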
Cheers,
p.b.