
RE: XSLT 2.0 : Unicode hex notation in regular expressions

2004-08-12 04:12:08
The notation \u1234 is not supported in XPath 2.0 regular expressions. Use
&#x1234; instead.

Michael Kay
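
As a minimal sketch of that advice applied to the stylesheet quoted below (an illustration, not tested against Saxon 8.0): the XML parser expands the character references before the regex engine sees the pattern, so the class becomes a genuine range from U+0600 to U+06FF. This would also account for the odd match on codepoint 58: in '[\\u0600-\\u06FF]' the doubled backslash is a literal backslash, so the class contains the literal characters \, u, 0, 6 and F plus the range from "0" (U+0030) to "\" (U+005C), and the colon (U+003A) falls inside that range while the Arabic letters do not.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/text">
    <words>
      <xsl:for-each select="tokenize(., '\s+')">
        <word>
          <xsl:attribute name="language">
            <xsl:choose>
              <xsl:when test="matches(., '[a-z]+')">latin</xsl:when>
              <!-- Character references are resolved by the XML parser,
                   so the regex receives the two real characters and
                   treats them as the endpoints of a range. -->
              <xsl:when test="matches(., '[&#x0600;-&#x06FF;]+')">arabic</xsl:when>
              <xsl:otherwise>whatever</xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
          <xsl:value-of select="."/>
        </word>
      </xsl:for-each>
    </words>
  </xsl:template>
</xsl:stylesheet>
```

The XPath 2.0 regex syntax is based on XML Schema regular expressions, which also define Unicode block escapes, so matches(., '\p{IsArabic}+') should be an equivalent test on a conforming processor.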
 

-----Original Message-----
From: Pierrick Brihaye [mailto:pierrick(_dot_)brihaye(_at_)wanadoo(_dot_)fr] 
Sent: 12 August 2004 10:38
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] XSLT 2.0 : Unicode hex notation in regular expressions

Hi,

I don't know if my XSLT syntax is wrong or if it is a Saxon-related 
problem. Let's blame the XSLT writer rather than the XSLT processor 
first ;-)

Given the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<text>livre : كتاب</text>

And the following XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="/text">
     <xsl:comment><xsl:value-of select="system-property('xsl:vendor')"/></xsl:comment>
     <words>
       <xsl:for-each select="tokenize(.,'\s+')">
         <word>
           <xsl:attribute name="language">
             <xsl:choose>
               <xsl:when test="matches(.,'[a-z]+')">latin</xsl:when>
               <xsl:when test="matches(.,'[\\u0600-\\u06FF]+')">arabic</xsl:when>
               <xsl:otherwise>whatever</xsl:otherwise>
             </xsl:choose>
           </xsl:attribute>
           <xsl:attribute name="codepoints"><xsl:value-of select="string-to-codepoints(.)"/></xsl:attribute>
           <xsl:value-of select="."/>
         </word>
       </xsl:for-each>
     </words>
   </xsl:template>
</xsl:stylesheet>

I get:

<?xml version="1.0" encoding="UTF-8"?>
<!--SAXON 8.0 from Saxonica-->
<words>
   <word language="latin" codepoints="108 105 118 114 101">livre</word>
   <word language="arabic" codepoints="58">:</word>
   <word language="whatever" codepoints="1603 1578 1575 1576">كتاب</word>
</words>

Why this curious match for codepoint 58? And why is there no match for the
Arabic characters?

BTW, I first tried matches(.,'[\u0600-\u06FF]+'), as described at
http://www.unicode.org/reports/tr18/#Hex_notation

But Saxon returned the following error :

Error at xsl:when on line 11 of file:/C:/...:
   net.sf.saxon.type.RegexTranslator$RegexSyntaxException: Error at 
character 2 in regular expression: bad escape sequence

That's why I doubled the "\" character. Is this doubling
spec-compliant?

Cheers,

p.b.
