xsl-list

Re: Sorting Upper-Case first. Microsoft bug?

2003-08-06 12:52:04
Hello Stan,

Stan Devitt wrote:

I apologize for yet another message on lexicographic sorting, but
in light of the considerable confusion exhibited on this issue I'd like
to see three points emphasised.

It seems that I have a completely different view of these topics.


1.  Lexicographic is important precisely because it is so well defined
(and because of this I suspect the spec writers really meant it when
they wrote it in).
It provides an easy-to-check reference implementation that is 99% usable.

XSLT was made to transform documents in the first place.
Typical problems in this field are "how can I sort my index?" or
"how can I sort my glossary?".

A lexicographic sort in your sense is not useful here (= 0%):
-  I don't want the group headers "A" "a" "B" "b" in my index.
-  I don't want "XSLT" before all or after all "Xs..." words.
-  I don't want "eXtensible" before "eat" or after "eye".
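The effect can be reproduced outside XSLT. The following is only a minimal Python sketch: the word list is made up from the examples above ("Xsl" is a hypothetical stand-in for the "Xs..." words), and the case-insensitive key is a crude assumption standing in for a real dictionary-style collation, not what any XSLT processor actually does.

```python
words = ["eat", "eye", "eXtensible", "XSLT", "Xsl"]

# Plain code-point comparison: every upper-case letter sorts before
# every lower-case one, so "eXtensible" lands before "eat".
by_code_point = sorted(words)

# Crude stand-in for an index-style sort (an assumption, not UTR 10):
# compare case-insensitively first, then use the original spelling
# only to break ties.
def index_key(word):
    return (word.casefold(), word)

by_index_order = sorted(words, key=index_key)

print(by_code_point)
print(by_index_order)
```

With the code-point order, "XSLT" and "Xsl" come before all lower-case words and "eXtensible" comes before "eat"; with the case-insensitive key, "eXtensible" falls between "eat" and "eye".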


2.  The notion of "lexicographic sorting" in the "culturally correct"
manner is also valid, but it falls short of implementing all of UTR 10.
The only "cultural choice" you have in a lexicographic sort is in
deciding on a total order of the symbols of your alphabet.
After that, everything else is determined.

Sorting by UTR 10 doesn't mean "sort undetermined" or "sort randomly".
There are exact rules, some of which are cultural choices, most not.
UTR 10 provides sorting at more than one level. Look at UTR 10,
section 4 to see the algorithm.

Look at the proposal http://www.unicode.org/reports/tr10/tr10-10.html
for some great examples (in the first chapter).
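To illustrate what "more than one level" means, here is a rough Python sketch. It is NOT the UTR 10 algorithm: it uses no collation element table and no tailoring. The function name multilevel_key and the three-part key are assumptions made for illustration only. Base letters are compared first, accents only break ties between equal base letters, and case only breaks ties after that.

```python
import unicodedata

def multilevel_key(s):
    """Simplified three-level sort key (illustrative only, not UTR 10)."""
    bases, accents, cases = [], [], []
    # NFD splits an accented letter into base letter + combining marks.
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch):
            if accents:
                accents[-1] += ch          # attach mark to preceding letter
        else:
            bases.append(ch.casefold())    # level 1: letter, ignoring case
            accents.append("")             # level 2: accents on this letter
            cases.append(ch.isupper())     # level 3: lower case sorts first
    return ("".join(bases), tuple(accents), tuple(cases))

words = ["rule", "Role", "r\u00f4le", "roles", "role"]   # "\u00f4" is o-circumflex
print(sorted(words, key=multilevel_key))
```

The point of the levels is that a case or accent difference never outweighs a difference in the letters themselves, yet the resulting order is still fully determined.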

On the other hand, how should one define a universal total order
of all Unicode symbols, to achieve a sensible lexicographic sorting?
Is "ä" smaller or greater or equal or unrelated to "a"?
How are the Greek "alpha" or the Hebrew "aleph" related to "a"?

Lexicographic sorting of Unicode strings is not useful for anything
practical I can think of.
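One candidate for such a total order is the raw code-point order. The small Python sketch below (the sample words are invented for illustration) shows what that order does with "ä", and that the same visible letter can even be two different code-point sequences.

```python
import unicodedata

# "\u00e4" is the precomposed letter a-umlaut.
words = ["zebra", "\u00e4pfel", "apfel"]
by_code_point = sorted(words)   # a-umlaut (U+00E4) sorts after "z" (U+007A)

precomposed = "\u00e4"          # one code point
decomposed = "a\u0308"          # "a" followed by a combining diaeresis

# The two spellings look identical but compare as different strings
# until they are normalized to the same form.
differ_raw = precomposed != decomposed
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

print(by_code_point, differ_raw, same_after_nfc)
```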


3.  Placing selected "words" out of lexicographic order (however well
intended) clearly violates the lexicographic constraint of the spec and
is in error as the spec is currently worded.

Which selected words do you mean here?


As a follow-on action, I'd like to see the spec writers clarify (in the
spec) that they really do mean lexicographic, and perhaps augment the
list of available sorts by a "pseudo-lexicographical" or "word" based
sort in order to capture what actually got implemented and which is
important for its own reasons but is much less well defined.

I would love to see the spec writers eliminate the single word
"lexicographic" from the spec.

Interestingly, the list of available sorts does NOT include "lexicographic".
The sort method we are talking about is named "text".
I would not want "text" to be replaced by "word", as this is misleading.
I would also not want "text" to be replaced by "pseudo-lexicographical"
because I can't spell it without at least 5 typos.


Stan Devitt

Just my point of view,
Markus

PS: The definition David found at 
http://mathworld.wolfram.com/LexicographicOrder.html
contains the funny sentence:
"Lexicographic order is sometimes called dictionary order."

__________________________
Markus Abt
Comet Computer GmbH
http://www.comet.de



Markus Abt wrote:

David,

It seems to me that the XSLT specification wants lexicographic ordering in the
culturally correct manner.
Maybe this is a contradiction; in this case I would regard this as an error in
the XSLT spec.



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list