xsl-list

RE: xsl:sort with msxml english language, danish characters, weird results

2004-10-25 03:52:57

-- 
Bryan Rasmussen




> It's up to the implementor whether they support lang="da" or not.
Actually, it turns out they do. These are the results with lang="da":
<?xml version="1.0" encoding="utf-8"?>
<e>aardvulf,
aardvark,
ålborg,
ødense,
æthling,
zebra,
zandinsky,
xerces,
tip,
odense,
fadøl,
eelburg,
</e>

At first I was confused and thought it was silly, but then I remembered that
"aa" is an archaic way of writing "å" in Danish.
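One way to check that reading of the lang="da" output is a simplified Python sort key built on two assumptions only: æ, ø and å come after z in that order, and "aa" is treated as å. This is a minimal sketch, not Danish collation as MSXML (or any library) implements it; `danish_key` and `DANISH_ALPHABET` are hypothetical names, and real collation has more levels and exceptions.

```python
# Simplified, hypothetical Danish-style sort key; NOT a full collation.
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(DANISH_ALPHABET)}


def danish_key(word):
    """Return a list of letter ranks for sorting `word` Danish-style."""
    # Assumption: the archaic digraph "aa" always sorts as "å".
    normalized = word.lower().replace("aa", "å")
    # Characters missing from the table sort after every known letter.
    return [RANK.get(ch, len(DANISH_ALPHABET)) for ch in normalized]


words = ["aardvark", "ødense", "ålborg", "aardvulf", "odense", "eelburg",
         "zebra", "zandinsky", "tip", "æthling", "fadøl", "xerces"]

# order="descending" in the stylesheet corresponds to reverse=True here.
danish_desc = sorted(words, key=danish_key, reverse=True)
print(",\n".join(danish_desc))
```

With just these two rules the sketch yields the same order as the lang="da" list above, which is consistent with "aa" being sorted as å.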

> For lang="en", it's up to the implementor how special characters are
> collated. This processor appears to be using a fairly commonly used
> algorithm in which characters are given a primary weight (a, b, c), and a
> secondary weight based on variations of the primary character (accents);
> the secondary weight of a character is taken into account only if the
> primary weights of all characters are equal.
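The two-level scheme described above can be sketched as a Python sort key whose first component is the word with accents stripped and whose second records which characters were accented. Python compares tuples element by element, so the second component is only consulted when the first ties. The `EXTRA_BASE` table (ø as o, æ as ae) is an assumption made for illustration, needed because Unicode's NFD decomposition splits å into a plus a ring but leaves ø and æ alone; it is not taken from MSXML. Under that assumption the sketch happens to reproduce the lang="en" output in the original message quoted below.

```python
import unicodedata

# Hypothetical base-letter table for letters Unicode does not decompose:
# NFD leaves ø and æ unchanged, so they are mapped by hand here.
EXTRA_BASE = {"ø": "o", "æ": "ae"}


def base_letters(ch):
    """Primary form of one character: its unaccented base letter(s)."""
    if ch in EXTRA_BASE:
        return EXTRA_BASE[ch]
    decomposed = unicodedata.normalize("NFD", ch)  # "å" -> "a" + ring above
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def two_level_key(word):
    """(primary, secondary): the secondary only matters on a primary tie."""
    word = word.lower()
    primary = "".join(base_letters(c) for c in word)
    # Secondary weight per character: 0 = plain letter, 1 = variant/accented.
    secondary = [0 if base_letters(c) == c else 1 for c in word]
    return (primary, secondary)


words = ["aardvark", "ødense", "ålborg", "aardvulf", "odense", "eelburg",
         "zebra", "zandinsky", "tip", "æthling", "fadøl", "xerces"]

en_desc = sorted(words, key=two_level_key, reverse=True)
print(",\n".join(en_desc))
```

Note how "ødense" and "odense" share the primary key "odense" and are separated only by the secondary weights, while "ålborg" lands among the a-words.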



> From your comment it seems you don't like this algorithm. It would be
> interesting to know why, and what algorithm you would prefer.


Well, actually, if they just assumed that unknown characters came at the end of
the alphabet, the sort order would be closer to Danish, and the same goes for
Norwegian and, I believe, Swedish (I wonder how often that is the case). What are
the rules for accenting? I suppose that if you asked most people what ø was, they
would say it's an o with a slash through it, and that æ is an a and an e stuck
really close together, hence the mnemonic entity names (&oslash;, &aelig;). But is
that the rule for determining what counts as an accented character? "We asked 100
people and 90 gave the following answer"?
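The "unknown characters at the end of the alphabet" idea can be sketched as a Python sort key that ranks a to z normally and sends every other character after z. Ordering the unknown characters among themselves by Unicode code point is an arbitrary assumption of this sketch, and `unknown_last_key` is a hypothetical name.

```python
import string


def unknown_last_key(word):
    """Rank a-z normally; send every other character after z."""
    key = []
    for ch in word.lower():
        if ch in string.ascii_lowercase:
            key.append(ord(ch) - ord("a"))  # a=0 ... z=25
        else:
            # Unknown characters sort after z, ordered among themselves
            # by Unicode code point (an arbitrary tie-break).
            key.append(26 + ord(ch))
    return key


words = ["aardvark", "ødense", "ålborg", "aardvulf", "odense", "eelburg",
         "zebra", "zandinsky", "tip", "æthling", "fadøl", "xerces"]

unknown_desc = sorted(words, key=unknown_last_key, reverse=True)
print(",\n".join(unknown_desc))
```

This moves ø, æ and å after z as Danish does, but code-point order (å < æ < ø) differs from the Danish æ < ø < å, and "aa" still sorts under a, so the result is closer to Danish rather than identical.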

I wrote a long digression on how something comes to be defined as accented, but
in the end it boiled down to wondering what the rules are, and whether there are
any at all beyond "we think that looks sort of like something we're familiar
with."


-----Original Message-----
From: Bryan Rasmussen [mailto:bry(_at_)itnisk(_dot_)com] 
Sent: 25 October 2004 10:14
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] xsl:sort with msxml english language, danish characters, weird results


-- 
Bryan Rasmussen

Hi, I was doing some tests of sorting by various languages/charsets etc., and I
came across the following irritation. Given XML like the following:
<?xml version="1.0" encoding="UTF-8"?>
<words>
<word>aardvark</word>
<word>ødense</word>
<word>ålborg</word>
<word>aardvulf</word>
<word>odense</word>
<word>eelburg</word>
<word>zebra</word>
<word>zandinsky</word>
<word>tip</word>
<word>æthling</word>
<word>fadøl</word>
<word>xerces</word>
</words>
and an XSLT stylesheet like the following:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- sortby is passed to xsl:sort/@lang: 'en' by default, 'da' for Danish -->
<xsl:param name="sortby" select="'en'"/>
<xsl:output method="xml" encoding="utf-8"/>
<xsl:template match="/">
<e>
<xsl:for-each select="/words/word">
<xsl:sort data-type="text" select="." lang="{$sortby}" 
order="descending" 
case-order="upper-first"/>
<xsl:value-of select="." />,
</xsl:for-each>
</e>
</xsl:template>
</xsl:stylesheet>

The output from msxsl, the command-line tool for MSXML, is:

<?xml version="1.0" encoding="utf-8"?><e>zebra,
zandinsky,
xerces,
tip,
ødense,
odense,
fadøl,
eelburg,
ålborg,
æthling,
aardvulf,
aardvark,
</e>

This, by the way, is not the sort order for Danish characters, and MSXML does not
seem to support sorting with the sortby parameter set to da (or at least not in
the proper order). That being the case, I wonder what the reasoning is behind the
sort order I'm seeing when the sortby parameter is en.

Anybody know, have any ideas? 





--+------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--+--



