On 03/11/2011 16:23, Houghton,Andrew wrote:
> Your string-to-codepoints example only works for ASCII upper/lower case
> letters. It fails to recognize composed and decomposed diacritical characters
> such as an uppercase A with a grave (U+00C0), with an acute (U+00C1), with
> a circumflex (U+00C2), etc. Yes, you could detect these too with additional
> logic, but matches() with a character class of \p{Ll}, \p{Lu}, \p{Lt} handles
> all the messy details of Unicode.
>
> Andy.
If you want to handle both composed and decomposed characters then it's
probably safest to use normalize-unicode() before using matches().
Michael Kay
Saxonica
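A minimal sketch of the normalize-then-match idea discussed above. In XPath 2.0 the check might be written as matches(normalize-unicode($s), '^[\p{Lu}\p{Ll}\p{Lt}]+$'), where normalize-unicode() defaults to NFC. The block below is a Python analogue, not the XPath functions themselves. Python's standard `re` module does not support `\p{...}`, so it substitutes `unicodedata.category()`. The function name `all_cased_letters` is hypothetical.

```python
import unicodedata

# General categories corresponding to \p{Lu}, \p{Ll} and \p{Lt}.
LETTER_CATEGORIES = {"Lu", "Ll", "Lt"}

def all_cased_letters(s: str, normalize: bool = True) -> bool:
    """Hypothetical helper: True if s is non-empty and every code point is Lu/Ll/Lt.

    With normalize=True the string is first converted to NFC (the analogue of
    normalize-unicode()), so a base letter plus combining mark collapses into
    one precomposed code point where Unicode defines one.
    """
    if normalize:
        s = unicodedata.normalize("NFC", s)
    return len(s) > 0 and all(unicodedata.category(ch) in LETTER_CATEGORIES for ch in s)

composed = "\u00C0"      # LATIN CAPITAL LETTER A WITH GRAVE (one code point, Lu)
decomposed = "A\u0300"   # 'A' + COMBINING GRAVE ACCENT (U+0300 is category Mn)

print(all_cased_letters(composed))                      # True
print(all_cased_letters(decomposed, normalize=False))   # False: U+0300 is not a letter
print(all_cased_letters(decomposed))                    # True: NFC yields U+00C0
```

Normalization only helps where a precomposed character exists. A base letter followed by a combining mark that has no precomposed equivalent still contains an Mn code point after NFC, so a letters-only pattern will reject it.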
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
--~--