xsl-list

RE: Is this a sorting bug in xalan 6.4.0?

2003-02-06 22:51:17
Michael Kay wrote:

The XSLT 1.0 specification does not define the precise rules for
collating strings; in fact, it makes it quite clear that the rules are
implementation-defined.

That would seem to me to be a much more serious code portability issue than
many I have seen discussed at length here and in the comments section of the
latest 2.0 draft.


In real life, there are many different ways
people handle spaces and punctuation when compiling lists in
alphabetical order, and there is no right answer.

Be that as it may, over my 20 years of programming I have seen a fair number
of systems that support string ordering and/or sorting, including SPSS,
Fortran (66, 77, 90, and 95), C, C++, Objective-C, Algol, Pascal,
PostScript, SQL, Awk, Perl, Java, Python, Unix, Linux, DOS, Windows, Mac,
VAX/VMS, VM/CMS, MVS/TSO, IBM JCL, Burroughs JCL, CANDE, Cray COS JCL, and
CTSS; this includes string data encoded as ASCII, EBCDIC, ISO Latin, or
Unicode (usually in one UTF guise or another).  Here are some observations
relative to the sorting of strings containing spaces:
    o   Almost every standard or semi-standard string sorting
        library/command/facility I have used sorts strings
        with spaces the way Saxon sorts them.
    o   None of them sort strings with spaces the way Xalan
        sorts them without at least an option setting.
    o   Most would not sort as Xalan sorts without significant
        tinkering with the input.

I am sure there are lots of people here who can trump my experience handily,
but is my expectation that spaces will be significant in string sorting, at
least by default, really *that* parochial?


-- Roger Glover
   glover_roger(_at_)yahoo(_dot_)com


Stan Dyck wrote:
----- snip -----
I get...

<!--Apache Software Foundation--><table>
<tr>
<td>A</td>
</tr>
<tr>
<td>AAL</td>
</tr>
<tr>
<td>Amanda</td>
</tr>
<tr>
<td>A Maureen</td>
</tr>
<tr>
<td>Amy</td>
</tr>
</table>

...using xalan, and (for example)...

<!--SAXON 6.4.1 from Michael Kay--><table>
    <tr>
       <td>A</td>
    </tr>
    <tr>
       <td>A Maureen</td>
    </tr>
    <tr>
       <td>AAL</td>
    </tr>
    <tr>
       <td>Amanda</td>
    </tr>
    <tr>
       <td>Amy</td>
    </tr>
</table>

...using saxon. It looks like xalan is removing the space between "A"
and "Maureen", causing it to sort below Amanda. It looks like a bug to
me, agreed? I couldn't find similar examples on either this list's
archives or on xalan's bug list.
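
For anyone who wants to see the two orderings side by side, here is a
minimal Python sketch. It is only an illustration under stated assumptions:
the two key functions are hypothetical stand-ins for "spaces are significant"
and "spaces are ignored", and they say nothing about how Xalan or Saxon
actually implement collation. Case is folded in both keys so that only the
handling of the space differs.

```python
# The five names from the example output, in arbitrary input order.
names = ["Amy", "A Maureen", "Amanda", "AAL", "A"]


def space_significant_key(s):
    # Hypothetical key: keep the space. U+0020 compares lower than any
    # letter, so "A Maureen" lands directly after "A".
    return s.casefold()


def space_ignored_key(s):
    # Hypothetical key: drop spaces before comparing, so "A Maureen" is
    # compared as if it were "amaureen".
    return s.replace(" ", "").casefold()


saxon_like = sorted(names, key=space_significant_key)
xalan_like = sorted(names, key=space_ignored_key)

print(saxon_like)  # ['A', 'A Maureen', 'AAL', 'Amanda', 'Amy']
print(xalan_like)  # ['A', 'AAL', 'Amanda', 'A Maureen', 'Amy']
```

The first list matches the Saxon table and the second matches the Xalan
table, which is consistent with the guess that the space is being ignored
for comparison purposes rather than the data being altered.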



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list