On Fri, 2013-01-18 at 17:17 -0500, G. Ken Holman wrote:
> 2. Perhaps instead of the 'œ' ligature, $text uses 'oe'
> Use normalize-unicode() on both operands.
I did not think it would work, so I wrote a test, and indeed it does
not work. There is a good reason for this: generally speaking, the single
letter œ and the two letters oe are not equivalent. The Unicode data for
U+0153 confirms that œ has no decomposition, canonical or compatibility,
so no normalization form can turn it into oe.
Here is the XSL for the test:
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"
      encoding="UTF-8"/>
  <xsl:template match="/">
    <!-- Loop 1: the ligature against the two letters, for each form. -->
    <xsl:for-each select="('NFC', 'NFD', 'NFKC', 'NFKD')">
      <xsl:message><xsl:value-of select="concat(current(), ' ')"/>
      <xsl:value-of select="normalize-unicode('cœur', current()) =
          normalize-unicode('coeur', current())"/></xsl:message>
    </xsl:for-each>
    <!-- Loop 2: decomposed é (e + U+0301) against precomposed é (U+00E9).
         Character references keep the two forms distinct even if an
         editor or mail client normalizes the file. -->
    <xsl:for-each select="('NFC', 'NFD', 'NFKC', 'NFKD')">
      <xsl:message><xsl:value-of select="concat(current(), ' ')"/>
      <xsl:value-of select="normalize-unicode('e&#x301;', current()) =
          normalize-unicode('&#xE9;', current())"/></xsl:message>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
The first loop compares cœur and coeur after normalization. The result
is false whichever normalization form we use. The second loop is for
illustration: it compares an é made of two Unicode code points
(e followed by U+0301, the combining acute accent) with an é made of a
single code point (U+00E9). The comparisons are true in all cases, as
expected.
If you save the XSL above as normalize-unicode.xsl, you can run it with
itself as the source document:
$ saxon -s:normalize-unicode.xsl -xsl:normalize-unicode.xsl
And you get:
NFC false
NFD false
NFKC false
NFKD false
NFC true
NFD true
NFKC true
NFKD true
<?xml version="1.0" encoding="UTF-8"?>
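If the goal is still to treat œ and oe as equal, the ligature has to be expanded by hand before comparing, since no normalization form does it. Below is a minimal sketch, assuming an XSLT 2.0 processor. The my:fold-oe function and its urn:example:fold namespace are made up for illustration and are not part of any library; only the two code points U+0153 and U+0152 are handled.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:fold"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: expands the oe ligatures explicitly.
       U+0153 (œ) becomes 'oe' and U+0152 (Œ) becomes 'OE'. -->
  <xsl:function name="my:fold-oe" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence
        select="replace(replace($s, '&#x153;', 'oe'), '&#x152;', 'OE')"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- Fold both operands, then compare; the message is "true". -->
    <xsl:message>
      <xsl:value-of
          select="my:fold-oe('c&#x153;ur') = my:fold-oe('coeur')"/>
    </xsl:message>
  </xsl:template>
</xsl:stylesheet>
```

Folding both operands keeps the comparison symmetric. Whether œ and oe should really be treated as the same is a language-specific decision, so this belongs in the stylesheet rather than in normalize-unicode().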
Sincerely,
Louis