Re: XSLT 2.0 : Unicode hex notation in regular expressions

David Carlisle wrote:

> [\\u0600-\\u06FF]
>
>
> \\ is a literal \ so I think that matches
>  any one of characters \ u 0 6 F and all characters in the range  0 to \,
>  except that 0 is char 48 and / is char 47 so this range is empty.

OK, got it. I now know why ":" matches [\\u0600-\\u06FF]. It is because the colon is char 58 (x3A), which lies between zero, char 48 (x30), and the backslash, char 92 (x5C).
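
To make the arithmetic above concrete, here is a minimal sketch, assuming an XSLT 2.0 processor (such as Saxon) that can start from a named template. The template name "main" is only an illustrative choice, not something from the thread.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical entry point: run this as the named initial template -->
  <xsl:template name="main">
    <!-- Code points of zero, colon and backslash: 48 58 92 -->
    <xsl:value-of select="string-to-codepoints('0:\')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The colon falls inside the accidental range 0-\ so this is true -->
    <xsl:value-of select="matches(':', '[\\u0600-\\u06FF]')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- An Arabic letter (U+0628) is neither in that range nor one of
         the listed characters \ u 0 6 F, so this is false -->
    <xsl:value-of select="matches('&#x0628;', '[\\u0600-\\u06FF]')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With a conforming processor this should print the three code points, then true, then false: the class catches the colon but misses the Arabic text it was meant for.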


> You don't need the u-notation to enter  code points into regexp (and
> they don't work)

Sorry to insist: why don't they work? Aren't they supposed to?

If so, is it a Saxon-related problem, or a more general one that would indicate that UTS #18 is still to be implemented, is irrelevant, or whatever?

How, for example, can one use a useful syntax like matches(.,'\p{Script:Arabic}+')?


> as you can just enter the characters directly

Mmmh... not always easy because of control characters. For Arabic, see http://www.fileformat.info/info/unicode/char/0600/index.htm.


> or if
> you want an ascii representation use xml character references,
> & # x a b c ;

Indeed. <xsl:when test="matches(.,'[&#x0600;-&#x06FF;]+')">arabic</xsl:when> gives me the expected result. Thanks for the reminder!
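
For the record, a small self-contained sketch of the character-reference approach, assuming XSLT 2.0. The element name "word" is a hypothetical input element, and the ^ and $ anchors are my addition: matches() succeeds on any substring by default, so the unanchored pattern is true as soon as one Arabic character is present.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical template: classify each "word" element of the input -->
  <xsl:template match="word">
    <xsl:choose>
      <!-- The XML parser resolves &#x0600; and &#x06FF; to the actual
           characters before the regex engine ever sees the pattern -->
      <xsl:when test="matches(., '^[&#x0600;-&#x06FF;]+$')">arabic</xsl:when>
      <xsl:otherwise>other</xsl:otherwise>
    </xsl:choose>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The point of the sketch is that the escaping is done at the XML level, so the regex engine needs no hex notation of its own.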


Cheers,

p.b.
