xsl-list

RE: xsl:sort with msxml english language, danish characters, weird results

2004-10-25 03:01:03
It's up to the implementor whether they support lang="da" or not.

For lang="en", it's up to the implementor how special characters are
collated. This processor appears to be using a fairly commonly used
algorithm in which characters are given a primary weight (a, b, c), and a
secondary weight based on variations of the primary character (accents); the
secondary weight of a character is taken into account only if the primary
weights of all characters are equal.
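
As a rough illustration of that two-level scheme, here is a small sketch in Python. It is not MSXML's actual implementation: the PRIMARY table and the collation_key function are hypothetical names made up for this example, and the mapping of å, ø and æ to a, o and ae is only an assumption about how such a collation might assign primary weights. Sorting the sample words from the original message in descending order with this key happens to give the same order as the msxsl output quoted below.

```python
# Hypothetical primary-weight table (an assumption, not MSXML's real data):
# each special character maps to the plain letters it sorts with.
PRIMARY = {"å": "a", "ø": "o", "æ": "ae"}


def collation_key(word):
    """Return a (primary, secondary) pair for two-level comparison."""
    lowered = word.lower()
    # Primary level: compare as if every variant were its base letter(s).
    primary = "".join(PRIMARY.get(ch, ch) for ch in lowered)
    # Secondary level: 0 for a plain letter, 1 for a variant character.
    secondary = tuple(1 if ch in PRIMARY else 0 for ch in lowered)
    # Tuples compare left to right, so the secondary part only matters
    # when the primary strings are identical.
    return (primary, secondary)


words = [
    "aardvark", "ødense", "ålborg", "aardvulf", "odense", "eelburg",
    "zebra", "zandinsky", "tip", "æthling", "fadøl", "xerces",
]

# order="descending" in the stylesheet corresponds to reverse=True here.
ranked = sorted(words, key=collation_key, reverse=True)
print(", ".join(ranked))
```

Note that "odense" and "ødense" have the same primary key, so only the secondary weight separates them; every other pair is decided at the primary level.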

This algorithm is similar to those used by publishers when foreign-language
names are used in an English publication such as a gazetteer, though I've
seen many variations in the actual practice of different publishers.

From your comment it seems you don't like this algorithm. It would be
interesting to know why, and what algorithm you would prefer.

XSLT 2.0 provides much more detailed control over the choice of collation,
but of course your choice is still limited to those collations the vendor
chooses to supply.

Michael Kay
http://www.saxonica.com/  

-----Original Message-----
From: Bryan Rasmussen [mailto:bry(_at_)itnisk(_dot_)com] 
Sent: 25 October 2004 10:14
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] xsl:sort with msxml english language, danish characters, weird results


-- 
Bryan Rasmussen

Hi, I was doing some tests of sorting by various languages/charsets etc.
and I came across the following irritation. Given XML like the following:
<?xml version="1.0" encoding="UTF-8"?>
<words>
<word>aardvark</word>
<word>ødense</word>
<word>ålborg</word>
<word>aardvulf</word>
<word>odense</word>
<word>eelburg</word>
<word>zebra</word>
<word>zandinsky</word>
<word>tip</word>
<word>æthling</word>
<word>fadøl</word>
<word>xerces</word>
</words>
and an xslt like the following:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:param name="sortby" select="'en'"/>
<xsl:output method="xml" encoding="utf-8"/>
<xsl:template match="/">
<e>
<xsl:for-each select="/words/word">
<xsl:sort data-type="text" select="." lang="{$sortby}" 
order="descending" 
case-order="upper-first"/>
<xsl:value-of select="." />,
</xsl:for-each>
</e>
</xsl:template>
</xsl:stylesheet>

the output from msxsl, the command-line tool for MSXML, is:

<?xml version="1.0" encoding="utf-8"?><e>zebra,
zandinsky,
xerces,
tip,
ødense,
odense,
fadøl,
eelburg,
ålborg,
æthling,
aardvulf,
aardvark,
</e>

This, by the way, is not the sort order for Danish characters. Nor does
msxsl sort correctly when the sortby parameter is set to da (or at least
not in the proper order). That being the case, I wonder what the reasoning
is behind the sort order I'm seeing when the sortby parameter is en.

Anybody know, have any ideas? 




