RE: xsl:number

2003-03-17 02:25:55
Here be dragons.

I agree with you that the specification of numbering sequences is very
weak. In my view it's a classic case of "benign cultural imperialism" -
the spec authors wanted to make it fully international and localisable,
but since they were a bunch of Americans plus the odd expatriate
European, they didn't really have much idea in detail how to go about
it. This situation hasn't really changed in the 2.0 working group, and
the same problem has also made it difficult to agree a spec for
format-date().

As regards your specific questions, I think the upshot is that
implementors have a pretty free hand to do whatever they think is right.

On collating sequences the group has adopted a different approach: leave
it all to the implementor. This is probably wiser, since implementors
who want to sell their product in a particular geographical market
probably have access to local information about the requirements of that
market. (Well, perhaps this is being optimistic - for years US vendors
produced collating sequences for German which were approved by the
grammar textbooks but had long since been superseded in popular use;
and, contrariwise, Microsoft spell-checkers still tell me that "-ize"
endings are not allowed in the UK, when the OED insists that they
are...).

Michael Kay

-----Original Message-----
From: owner-xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com 
[mailto:owner-xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com] On Behalf Of 
Mike Brown
Sent: 17 March 2003 05:50
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] xsl:number


I have questions about xsl:number. This is the most poorly 
specified instruction I've come across. It's really hard to 
even know what questions to ask.

The way I interpret the XSLT 1.0 spec (and the 2.0 draft 
doesn't help),

  <xsl:number format="A"/>

must be supported, and it must produce something from the sequence

  A, B, C, ..., Z, AA, AB, AC, ...

where A=1, B=2, etc.
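
To make that reading concrete, here is a minimal sketch in Python (not
anything the spec provides). It treats the sequence as a bijective
base-N numbering over whatever alphabet is chosen. The function name
alpha_number and its alphabet parameter are hypothetical, and the
default of the 26-letter English alphabet is an assumption.

```python
import string


def alpha_number(n, alphabet=string.ascii_uppercase):
    """Return the n-th item (1-based) of A, B, ..., Z, AA, AB, ...

    `alphabet` is the ordered letter sequence; which one a processor
    should use is exactly the open question, so it is a parameter here.
    """
    if n < 1:
        raise ValueError("n must be 1 or greater")
    base = len(alphabet)
    letters = []
    while n > 0:
        # Bijective numbering has no zero digit, so shift by one
        # before taking the remainder.
        n, rem = divmod(n - 1, base)
        letters.append(alphabet[rem])
    return "".join(reversed(letters))
```

With the default alphabet, alpha_number(27) gives "AA". Passing the
27-letter Spanish alphabet instead moves "AA" to 28 and makes 15 come
out as "Ñ", so the same number formats differently depending on which
alphabet is assumed.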
 
The way it is specified, it seems to indicate that the 
alphabet must be the English alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZ.

Or perhaps it could be any alphabet that starts with ABC and 
ends with Z, like the Spanish alphabet, which varies 
depending on who you ask, but for computing purposes I think 
is generally ABCDEFGHIJKLMNÑOPQRSTUVWXYZ.

Or perhaps everything after "A" is just an example, meaning 
that it very well could be the Swedish alphabet: 
ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ ... or perhaps Vietnamese, 
which starts with A and has no Z.

Anyway, the implication is that a processor must support some 
alphabet that contains "A". Or is "A" just a placeholder for 
any alphabetic character?

"When numbering with an alphabetic sequence, the lang 
attribute specifies which language's alphabet is to be used; 
it has the same range of values as xml:lang [XML]; if no lang 
value is specified, the language should be determined from 
the system environment."

It seems to me that if format="A", then the value of lang, 
whether determined by the processor or specified in the 
stylesheet, must be a language that contains "A".

What happens if the processor supports both English and 
Hebrew, and I do something like

  <xsl:number format="A" lang="he"/>

? Or for that matter,

  <!-- #1488 = Hebrew letter Aleph -->
  <xsl:number format="&#1488;" lang="en"/>
  
?

What does

  <xsl:number format="B"/>

mean? At the very least, I know "B" must represent 1. If the 
default language is English, does this mean the sequence must be

  B, C, D, ..., Z, BB, BC, BD, ...

?

The spec also says format="I" must be supported by using 
Roman numerals. What does format="I" mean when the language 
is not English?
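
For reference, the conventional Roman numeral sequence can be sketched
in Python as below. This is only an illustration of the usual
subtractive notation (IV, IX, XL, ...); the function name roman and
the 1..3999 range limit are assumptions of the sketch, not something
the spec states.

```python
def roman(n):
    """Format n (1..3999) as an upper-case Roman numeral."""
    if not 1 <= n <= 3999:
        raise ValueError("n must be between 1 and 3999")
    # Value/symbol pairs in descending order, including the
    # subtractive forms, so a greedy pass yields the usual spelling.
    pairs = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    parts = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)
```

Nothing in this conversion depends on a language, which is part of why
it is unclear what lang is supposed to change for format="I".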

The spec says "In many languages there are two commonly used 
numbering sequences that use letters. One numbering sequence 
assigns numeric values to letters in alphabetic sequence, and 
the other assigns numeric values to each letter in some other 
manner traditional in that language. In English, these would 
correspond to the numbering sequences specified by the format 
tokens a and i."

This seems to indicate that using "I" for Roman is a 
"traditional" English convention, and (reading further) that 
I could use letter-value="alphabetic" to override this 
interpretation. If my theory about format="B" is correct, 
then format="I" with letter-value="alphabetic" would result 
in I, J, K, ... sequences.
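
A small Python sketch of that theory, purely to show what it would
imply: the format token is taken as the first "digit", and the
sequence runs over the rest of the alphabet from there. The function
name sequence_from_token is hypothetical, the English alphabet is an
assumed default, and this is one possible reading, not behaviour the
spec is known to require.

```python
import string


def sequence_from_token(n, token, alphabet=string.ascii_uppercase):
    """Return the n-th item (1-based) of a sequence starting at `token`.

    Under this reading, token "B" yields B, C, ..., Z, BB, BC, ...
    """
    if n < 1:
        raise ValueError("n must be 1 or greater")
    # Letters before the token are dropped; str.index raises
    # ValueError if the token is not in the alphabet at all.
    digits = alphabet[alphabet.index(token):]
    base = len(digits)
    letters = []
    while n > 0:
        # Bijective numbering: no zero digit, so shift by one.
        n, rem = divmod(n - 1, base)
        letters.append(digits[rem])
    return "".join(reversed(letters))
```

Under this interpretation "B" leaves 25 usable letters, so 26 comes
out as "BB", and token "I" leaves 18 letters, so the sequence runs
I, J, K, ..., Z, II. A token that is not in the assumed alphabet, such
as Aleph with English, simply fails in this sketch.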

I don't know. I have more questions, but I'll just stop here. 
I really hope this stuff gets cleared up in 2.0, although 
that doesn't help me much in trying to properly implement 1.0.

Mike

-- 
  Mike J. Brown   |  http://skew.org/~mike/resume/
  Denver, CO, USA |  http://skew.org/xml/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


