xsl-list

Re: [xsl] XSLT 2.0 : Unicode hex notation in regular expressions

2006-06-12 12:26:22
How, for example, can I use a useful syntax like
  matches(.,'\p{Script:Arabic}+') ?

schema-2 says: http://www.w3.org/TR/xmlschema-2/#regexs

[Definition:] [Unicode Database] groups code points into a number of
blocks such as Basic Latin (i.e., ASCII), Latin-1 Supplement, Hangul
Jamo, CJK Compatibility, etc. The set containing all characters that
have block name X (with all white space stripped out), can be identified
with a block escape \p{IsX}. The complement of this set is specified
with the block escape \P{IsX}. ([\P{IsX}] = [^\p{IsX}]).
...
For example,
the ·block escape· for identifying the ASCII characters is \p{IsBasicLatin}.

so that would be \p{IsArabic}

David
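
As a minimal sketch of the block escape quoted above, assuming an XSLT 2.0
processor and a hypothetical input document whose root element is named
"text", the stylesheet below reports whether the text contains at least one
character from the Arabic block. The output strings are illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Assumption: the input's document element is named "text" -->
  <xsl:template match="/text">
    <!-- \p{IsArabic} is a block escape: curly braces, "Is" + block name
         with white space stripped. matches() is unanchored, so a single
         Arabic-block character anywhere in the string is enough. -->
    <xsl:value-of select="if (matches(., '\p{IsArabic}'))
                          then 'contains Arabic'
                          else 'no Arabic'"/>
  </xsl:template>
</xsl:stylesheet>
```

The complement form \P{IsArabic} works the same way and matches any
character outside that block.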



I want to use the above construct to detect Japanese characters, and so I am
using the following xsl:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" indent="yes" encoding="UTF-8" />
     <xsl:template match="/text">
        <xsl:for-each select="tokenize(.,'\s+')">
          <word>
            <xsl:attribute name="language">
              <xsl:choose>
                 <xsl:when test="matches(.,'\p{IsCJKCompatibility}+')">Japanese</xsl:when>
                 <xsl:when test="matches(.,'\p{IsBasicLatin}+')">Latin</xsl:when>
                 <xsl:otherwise>Unknown</xsl:otherwise>
              </xsl:choose>
            </xsl:attribute>
          </word>
        </xsl:for-each>
     </xsl:template>
</xsl:stylesheet>

However, the Japanese characters in my input, which are encoded in UTF-8,
come out flagged as Latin or Unknown.  What am I doing wrong?  How do I get
this to recognize the Japanese characters?

Thanks for any help you can offer.

John Besch
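
One possible explanation, offered as a sketch rather than a confirmed
diagnosis: the CJK Compatibility block covers only U+3300 to U+33FF (squared
abbreviations and unit symbols), so ordinary kana and kanji fall outside it.
The variant below assumes the Japanese text consists of characters from the
Hiragana, Katakana, and CJK Unified Ideographs blocks and tests for those
instead. The wrapper element "words", the call to normalize-space(), the
anchored Latin test, and the xsl:value-of that emits each token are additions
made for illustration and are not part of the stylesheet above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
  <xsl:template match="/text">
    <!-- Hypothetical wrapper so the result has a single root element -->
    <words>
      <!-- normalize-space() trims the ends so tokenize() yields no empty tokens -->
      <xsl:for-each select="tokenize(normalize-space(.), '\s+')">
        <word>
          <xsl:attribute name="language">
            <xsl:choose>
              <!-- Assumption: any kana or unified ideograph marks the token as Japanese -->
              <xsl:when test="matches(., '[\p{IsHiragana}\p{IsKatakana}\p{IsCJKUnifiedIdeographs}]')">Japanese</xsl:when>
              <!-- Anchored, so the whole token must be Basic Latin -->
              <xsl:when test="matches(., '^\p{IsBasicLatin}+$')">Latin</xsl:when>
              <xsl:otherwise>Unknown</xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
          <!-- Emit the token itself as the element content -->
          <xsl:value-of select="."/>
        </word>
      </xsl:for-each>
    </words>
  </xsl:template>
</xsl:stylesheet>
```

Two caveats apply. CJK Unified Ideographs are shared with Chinese, so the
"Japanese" label reflects an assumption about the input rather than something
the regular expression can establish. Also, because matches() is unanchored
by default, an unanchored \p{IsBasicLatin}+ test succeeds for any token that
contains even one ASCII character, which may account for tokens being flagged
as Latin.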


