RE: xsl:sort with msxml english language, danish characters, weird results
2004-10-25 03:01:03
It's up to the implementor whether they support lang="da" or not.
For lang="en", it's up to the implementor how special characters are
collated. This processor appears to be using a fairly commonly used
algorithm in which characters are given a primary weight (a, b, c), and a
secondary weight based on variations of the primary character (accents); the
secondary weights are taken into account only if the two strings being
compared have equal primary weights in every position.
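As a rough illustration, the two-level comparison described above can be sketched in Python. This is only a sketch under stated assumptions, not MSXML's actual implementation: the PRIMARY table and the function names are hypothetical, and the mappings (ø to o, å to a, æ to ae) are simply chosen to be consistent with the output reported in the original message.

```python
# Hypothetical primary-weight table: each special character is assumed to
# share its primary weight with the base letter(s) shown. This is an
# illustration only, not the table used by any real processor.
PRIMARY = {"ø": "o", "å": "a", "æ": "ae"}

def primary_key(word):
    """Replace each variant character by its assumed base letter(s)."""
    return "".join(PRIMARY.get(ch, ch) for ch in word.lower())

def secondary_key(word):
    """Per character: 0 for a plain letter, 1 for a variant of one."""
    return tuple(1 if ch in PRIMARY else 0 for ch in word.lower())

def collation_key(word):
    # Tuples compare element by element, so the secondary key is consulted
    # only when the primary keys of the two words are identical.
    return (primary_key(word), secondary_key(word))

words = ["aardvark", "ødense", "ålborg", "aardvulf", "odense", "eelburg",
         "zebra", "zandinsky", "tip", "æthling", "fadøl", "xerces"]

# order="descending" in the stylesheet corresponds to reverse=True here.
result = sorted(words, key=collation_key, reverse=True)
print(", ".join(result))
# zebra, zandinsky, xerces, tip, ødense, odense, fadøl, eelburg, ålborg, æthling, aardvulf, aardvark
```

Under these assumed weights the sketch reproduces the order reported for lang="en": ødense and odense tie on primary weight and are separated only by the secondary weight, while æthling falls between ålborg and aardvulf because it is compared as if spelled "ae".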
This algorithm is similar to those used by publishers when foreign-language
names are used in an English publication such as a gazetteer, though I've
seen many variations in the actual practice of different publishers.
From your comment it seems you don't like this algorithm. It would be
interesting to know why, and what algorithm you would prefer.
XSLT 2.0 provides much more detailed control over the choice of collation,
but of course your choice is still limited to those collations the vendor
chooses to supply.
Michael Kay
http://www.saxonica.com/
-----Original Message-----
From: Bryan Rasmussen [mailto:bry(_at_)itnisk(_dot_)com]
Sent: 25 October 2004 10:14
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] xsl:sort with msxml english language, danish
characters, weird results
--
Bryan Rasmussen
Hi, I was doing some tests of sorting by various languages/charsets etc. and
I came across the following irritation; given XML like the following:
<?xml version="1.0" encoding="UTF-8"?>
<words>
<word>aardvark</word>
<word>ødense</word>
<word>ålborg</word>
<word>aardvulf</word>
<word>odense</word>
<word>eelburg</word>
<word>zebra</word>
<word>zandinsky</word>
<word>tip</word>
<word>æthling</word>
<word>fadøl</word>
<word>xerces</word>
</words>
and an XSLT stylesheet like the following:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:param name="sortby" select="'en'"/>
<xsl:output method="xml" encoding="utf-8"/>
<xsl:template match="/">
<e>
<xsl:for-each select="/words/word">
<xsl:sort data-type="text" select="." lang="{$sortby}"
order="descending"
case-order="upper-first"/>
<xsl:value-of select="." />,
</xsl:for-each>
</e>
</xsl:template>
</xsl:stylesheet>
the output in msxsl, the command-line tool for MSXML, is:
<?xml version="1.0" encoding="utf-8"?><e>zebra,
zandinsky,
xerces,
tip,
ødense,
odense,
fadøl,
eelburg,
ålborg,
æthling,
aardvulf,
aardvark,
</e>
This, by the way, is not the sort order for Danish characters; the processor
does not sort correctly when the sortby parameter is set to da (or at least
not in the proper order). That being the case, I wonder what the reasoning is
behind the sort order I'm seeing when the sortby parameter is en.
Does anybody know, or have any ideas?