Re: [xsl] Does 'Lecœur' occur in $text? Do you have a multi-factor XPath solution?

2013-01-18 17:54:45
On Fri, 2013-01-18 at 17:17 -0500, G. Ken Holman wrote:
2. Perhaps instead of the 'œ' ligature, $text uses 'oe'

Use normalize-unicode() on both operands.

I did not think it would work, so I created a test and indeed it does
not work. There's a good reason for this: generally speaking the single
letter œ and the two letters oe are not equivalent. The Unicode
documentation for U+0153 confirms that œ has no decomposition, neither
canonical nor compatibility.

Here is the XSL for the test:

<?xml version="1.0"?>
<xsl:stylesheet version="2.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  
<xsl:output method="xml" 
            encoding="UTF-8"/>

<xsl:template match="/">
  <xsl:for-each select="('NFC', 'NFD', 'NFKC', 'NFKD')">
    <xsl:message><xsl:value-of select="concat(current(), ' ')"/>
<xsl:value-of select="normalize-unicode('cœur', current()) =
normalize-unicode('coeur', current())"/></xsl:message> 
  </xsl:for-each>  
  <xsl:for-each select="('NFC', 'NFD', 'NFKC', 'NFKD')">
    <xsl:message><xsl:value-of select="concat(current(), ' ')"/>
<!-- e + U+0301 (combining acute) versus precomposed U+00E9 -->
<xsl:value-of select="normalize-unicode('e&#x301;', current()) =
normalize-unicode('&#xE9;', current())"/></xsl:message> 
  </xsl:for-each>  
</xsl:template>

</xsl:stylesheet>

The first loop compares cœur and coeur after normalization. The results
are false no matter which normalization form we use. The second loop is
for illustration purposes: it compares an é made of two Unicode code
points with an é made of one Unicode code point. The comparisons are true
in all cases, as expected.

If you save the XSL above as normalize-unicode.xsl, run it as:

$ saxon -s:normalize-unicode.xsl -xsl:normalize-unicode.xsl 

And you get:

NFC false
NFD false
NFKC false
NFKD false
NFC true
NFD true
NFKC true
NFKD true
<?xml version="1.0" encoding="UTF-8"?>
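
Since normalization alone cannot equate œ with oe, one possible
workaround is to expand the ligature explicitly before comparing. What
follows is only a minimal sketch under stated assumptions, not part of
the test above: the function name my:expand-ligatures and its namespace
URI are hypothetical, and mapping Œ to 'OE' (rather than 'Oe') is an
arbitrary choice that you may need to adapt to your data.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:my="http://example.com/my"
                exclude-result-prefixes="xs my">

<!-- Hypothetical helper: replace the oe ligature (U+0153, U+0152) with
     two plain letters, then apply NFC so that accented letters still
     compare equal regardless of how they were encoded. -->
<xsl:function name="my:expand-ligatures" as="xs:string">
  <xsl:param name="s" as="xs:string"/>
  <xsl:sequence select="normalize-unicode(
      replace(replace($s, '&#x153;', 'oe'), '&#x152;', 'OE'), 'NFC')"/>
</xsl:function>

<xsl:template match="/">
  <!-- Compare the ligature spelling with the two-letter spelling. -->
  <xsl:message>
    <xsl:value-of select="my:expand-ligatures('c&#x153;ur') =
                          my:expand-ligatures('coeur')"/>
  </xsl:message>
</xsl:template>

</xsl:stylesheet>
```

Run the same way as the test above, this should report true. For the
original question, the same helper could be applied to both operands of
contains(), for example contains(my:expand-ligatures($text),
my:expand-ligatures('Lecœur')).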

Sincerely,
Louis

