xsl-list

RE: Sorting Upper-Case first. Microsoft bug?

2003-08-28 03:31:01
Please excuse a long delay in responding with a comment that I hope is relevant 
and helpful, prompted in part by Stan Devitt's observation:

The term "lexicographic" has for a very long time had a specific 
technical meaning in CS and Math circles and hence in any document 
describing sorting algorithms in a programming language.

Lexicographic ordering in a natural language will be that used by 
lexicographers, in which alphabetical order is qualified first by diacritics 
and then by case, but the community that subscribes to this list spends a great 
part of the working day using non-natural languages, and I for one lost sight 
of this distinction. I have seen the inelegant but expressive word 
"asciibetical" used to describe an ordering that might be used for variable 
names. It might be overkill to require lang="cs" (for computer science), but I 
find it helpful to remember that this is an area with its own cultural 
conventions. The alphabet David Carlisle required in his example runs 
...VvWwXx... and the adjacent symbols are distinct and not variants.
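To make the distinction concrete, here is a minimal sketch in Python of a purely lexicographic ordering over an alphabet that runs AaBb...VvWwXx...Zz, with each symbol treated as a distinct letter. The names ALPHABET, RANK and interleaved_key are mine, chosen for illustration, and the sketch assumes the words contain only unaccented ASCII letters.

```python
import string

# Assumed alphabet for illustration: A a B b ... Z z, every symbol distinct.
ALPHABET = "".join(
    upper + lower
    for upper, lower in zip(string.ascii_uppercase, string.ascii_lowercase)
)
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def interleaved_key(word):
    """Return the word as a list of alphabet positions.

    Python compares lists element by element, which is exactly the
    mathematical lexicographic ordering over this alphabet.
    Raises KeyError for characters outside the assumed alphabet.
    """
    return [RANK[ch] for ch in word]


words = ["wax", "Wit", "vow", "Van", "Xi", "wit"]
ordered = sorted(words, key=interleaved_key)
print(ordered)  # ['Van', 'vow', 'Wit', 'wax', 'wit', 'Xi']
```

Note that "Wit" sorts before "wax" here, because W and w are different letters and the first position decides the comparison. A dictionary would put "wax" first.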

A quarter of a century ago we worked in a world where a "character" was an 
8-bit integer which some peripheral devices could interpret graphically. The 
world has become a richer place.

I hope this is useful.

John Marshall
Accurate Software

80 Peach Street, Wokingham, Berkshire, RG40 1XH, UK.
Tel: +44 (0)118 977 3889
Fax: +44 (0)118 977 1260
http://www.accuratesoftware.com




-----Original Message-----
From: David Carlisle [mailto:davidc(_at_)nag(_dot_)co(_dot_)uk]
Sent: 08 August 2003 10:39
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Sorting Upper-Case first. Microsoft bug?



Dr. Johnson and every lexicographer since has used case as the least
significant, most rapidly varying element in ordering. The example I
have in front of me from the Concise Oxford Dictionary lists daily -
Dalmatian - dalmatic and I would not expect it to do anything else. 

Dictionaries are not really a good example to follow here, as they don't
have to deal with all strings. The dictionary probably doesn't list
DAILY or dalmatioN at all, but xsl:sort has to deal with these things.
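As a rough illustration of the dictionary-style ordering, with case as the least significant element, here is a small Python sketch. The tie-break between strings that differ only in case (upper case first) is my own assumption, since a dictionary gives no guidance for DAILY or dalmatioN, and dictionary_key is a name invented for the example.

```python
def dictionary_key(word):
    """Compare ignoring case first; use the original spelling only to
    break ties (assumption: code point order, so upper case comes first)."""
    return (word.casefold(), word)


entries = ["dalmatic", "dalmatioN", "daily", "Dalmatian", "DAILY"]
dictionary_order = sorted(entries, key=dictionary_key)
print(dictionary_order)
# ['DAILY', 'daily', 'Dalmatian', 'dalmatic', 'dalmatioN']
```

The three words from the Concise Oxford example keep their dictionary order (daily, Dalmatian, dalmatic), and the odd strings are slotted in by whatever tie-break rule is chosen.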


When Dennis Ritchie devised C before 1978, strcmp() would give a sort
order that would place Dalmatian first (assuming ASCII) but in those
days most of us were still using uppercase-only i/o devices and not
worried about such refinements. If we were, we used strcmpi().

ASCII ordering would put all the uppercase before all the lowercase:
ordering A B C a b c.
No one has suggested xsl:sort is specified as doing that, despite
several people giving that as a reason for not implementing xsl:sort as
specified.
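For comparison, a short Python sketch of plain code point ordering, which for these ASCII strings matches what strcmp() would give. Python's default string comparison works on code points, so no key function is needed. The sample data is only illustrative.

```python
# Default string comparison in Python is by code point, so for ASCII
# input every upper-case letter sorts before every lower-case letter.
letters = sorted(["a", "B", "c", "A", "b", "C"])
print(letters)  # ['A', 'B', 'C', 'a', 'b', 'c']

# The dictionary example under the same ordering: Dalmatian comes first.
ascii_words = sorted(["daily", "Dalmatian", "dalmatic"])
print(ascii_words)  # ['Dalmatian', 'daily', 'dalmatic']
```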

The world has moved on and the whole thrust of Unicode is to coerce the
mechanical representation of text into natural linguistic usage, so
Dr. Johnson wins. 

As I commented before, the discussion really isn't about the best way of
sorting. XSLT2 is far more flexible, and far more explicitly system
dependent in this area, which is probably a good thing. The question is
about what the XSLT 1 spec says.


There will be all sorts of interesting issues that arise in considering
the natural ordering of words from different linguistic groups, not
borrowings like yacht and pyjama, but with equal cultural weight. 

Yes, of course.

I suspect you are in a minority of one and the unanimity of the XSLT
processors suggests that the interpretation they have adopted is the
correct one.

I wouldn't disagree with you that the evidence suggests that within a
relevant community I am in a minority, however given that the phrase
"lexicographic ordering" is (and has been for a century or so) totally
standard terminology used without comment in any mathematical work on
ordered sets (a field which covers a large part of the mathematical
literature) and is similarly standard terminology in any computer
science discussion of sorting, I wouldn't say that there is any
room for interpretation in the text of the XSLT 1 spec. It would take an
erratum to change the text of the specification to justify the currently
implemented algorithms.

I can understand if lexicographers are annoyed if the term
"lexicographic ordering" doesn't describe an ordering that they
recognise as useful, as it is a purely mechanical ordering ignoring the
art of lexicography entirely, but on the other hand they should be used
to the idea that words get used by convention in ways not immediately
suggested by their etymology.

David


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list



