ietf

RE: draft-phillips-langtags-08, process, specifications, and extensions

2005-01-02 17:53:55
Hi Bruce,

Even if by some oversight or lapse of judgment the tag
"en-US" were to be registered, its interpretation by a
parser would be as an ISO 639 language code followed by
an ISO 3166 country code.  Such a registration would
therefore be pointless.  In practice, therefore, it
simply wouldn't happen.

I direct you to the sgn-XX registrations. Informative registrations of this 
sort *have* happened.

It would be entirely possible for "en-Latn-US-boont" to be 
registered under the terms of RFC 3066.

But it hasn't been. No RFC 3066 parser will therefore find
that complete tag in its list of IANA registered tags, nor
will it be able to interpret "Latn" as an ISO 3166 2-letter
country code.

RFC 3066 parsers already should not interpret "Latn" as an ISO 3166 region 
code. It isn't two letters long.
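
As a rough illustration of that length rule, here is a minimal Python sketch. 
The function name and the return labels are hypothetical, not taken from any 
real parser, and the logic reflects only a by-length reading of RFC 3066 
section 2.2: it does not consult the actual ISO 3166 list and it ignores the 
special "i-" and "x-" prefixes.

```python
def classify_second_subtag(tag: str) -> str:
    """Classify a tag's second subtag by length alone (illustrative only)."""
    subtags = tag.split("-")
    if len(subtags) < 2:
        return "none"  # the tag is a bare primary subtag, e.g. "en"
    second = subtags[1]
    if len(second) == 2 and second.isascii() and second.isalpha():
        # Two letters: read as an ISO 3166 alpha-2 country code.
        # Assumption: we do not check that the code actually exists.
        return "iso3166-country"
    if 3 <= len(second) <= 8:
        # Three to eight characters: only meaningful if registered with IANA;
        # a parser with a stale list simply will not know the meaning.
        return "registered-or-unknown"
    # Anything else (e.g. a single character) is outside this sketch's rules.
    return "invalid"
```

Under these assumptions "en-US" falls in the country-code branch, while 
"en-Latn-US-boont" and "sl-rozaj" both fall in the same "registered or 
unknown" branch, which is the point: "Latn" is never a candidate region code.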

As for RFC 3066 parsers being unable to interpret the tag, what do you think 
happens now? New tags are registered all the time and these don't appear in the 
putative list of tags inside extant RFC 3066 parsers. The parsers don't know 
what the tag means, but that doesn't invalidate its use for content in that 
language or by end users, now does it?

For a concrete example, think about "sl-rozaj", just over a year old. None of 
the browsers in my browser collection, not even Firefox, knows what that tag 
means, but all of them accept it and emit it in my Accept-Language header and 
no web sites have complained about it. Okay, I'm not getting any Resian content 
back (but then it isn't first in my A-L list either).

In what sense would any existing RFC 3066 parser (assuming that
it conforms to RFC 3066) not be able to make any more or less
sense of that than any other registered tag? 

You're missing the critical factor: it is NOT a registered
tag -- an RFC 3066 parser has no way of recognizing it.

An RFC 3066 parser has no way of recognizing a tag registered after the 
parser's list of tags was created. Therefore RFC 3066 parsers do not, as a 
rule, reject unknown tags. Making sense of a tag is subjective in the case of 
generative tags today in any event. The level of sense required of an RFC 3066 
parser is generally that it be able to use the remove-from-right matching rule 
on ranges and tags until it finds a value it "knows".
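
The remove-from-right behaviour can be sketched in a few lines of Python. This 
is only an illustration under stated assumptions: `best_match` and the 
`known_tags` list are hypothetical names, comparison is case-insensitive, and 
no real implementation is being quoted.

```python
from typing import Iterable, Optional


def best_match(language_range: str, known_tags: Iterable[str]) -> Optional[str]:
    """Drop subtags from the right until a known value is found."""
    known = {tag.lower() for tag in known_tags}
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in known:
            return candidate  # the longest prefix the parser "knows"
        subtags.pop()  # remove the rightmost subtag and try again
    return None  # nothing recognised; the caller falls back to a default
```

With a known list of just "en" and "sl", the range "sl-rozaj" resolves to "sl" 
without the parser ever needing to know what "rozaj" means.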

There is no reason to create a separate mechanism. When 
identifying textual content,

Language is not exclusively associated with text.  It is also a
characteristic of spoken (sung, etc.) material (but script is
not).

Yes, I agree. Script is important to textual applications of language tags, 
though. The fact that it is not applicable to aural or otherwise signed 
representations of language has nothing to do with whether scripts might need 
to be indicated on content that is written.

Note my use of "or" not "and".  I certainly did not state that the
information could be obtained from charset alone in all cases.

Groping the text is a very poor mechanism for determining the writing system 
used. Your suggestion is that we *should* be *forced* to grope the text. It 
also appears to be your position that we should *not* be given a mechanism 
whereby users can indicate a script preference when selecting language content.

The analogous way to handle that in Internet protocols would be
via Content-Script and Accept-Script where relevant (which they
would not be for audio media).

I think that's an awful idea. Why should users have to set two headers to get 
one result?

Sorry -- saying so doesn't make it so.  I have explained in
detail that an RFC 1766/3066 parser cannot be expected to
make sense of unregistered "sr-Latn-CS" etc.  I have pointed
to specific second subtag length requirements in RFC 3066 for
registration.

Yes, actually it does when the facts fit. Your details are wrong: parsers 
cannot make sense of any tag they don't have information about and this does 
not invalidate the use of said tags. See the sl-rozaj example above. The fact 
that the parser cannot "make sense" of an unregistered tag doesn't have any 
implications for end users as a result. The specific subtag length requirements 
in RFC 3066 you cite are just wrong. Any subtag can be registered as long as it 
meets the requisite length and content restrictions, and draft-langtags doesn't 
violate these.

No, a strict RFC 3066 parser will not be able to identify "sr-Latn"
or "sr-Latn-CS" as valid tags.

No, a strict RFC 3066 parser would have to have an up-to-the-second list of 
registered tags. Unless you've just written an implementation that foolishly 
rejects unknown tags, no implementations reject them as long as the tags fit 
the ABNF requirements of RFC 3066. Draft-langtags uses this fact to its 
advantage and actually tidies things up a bit.
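
For reference, the ABNF check in question amounts to very little code. The 
sketch below is in Python and assumes the RFC 3066 grammar of a primary subtag 
of 1 to 8 letters followed by any number of subtags of 1 to 8 letters or 
digits; the names `RFC3066_TAG` and `is_well_formed` are hypothetical.

```python
import re

# Language-Tag   = Primary-subtag *( "-" Subtag )
# Primary-subtag = 1*8ALPHA
# Subtag         = 1*8(ALPHA / DIGIT)
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_well_formed(tag: str) -> bool:
    """True if the tag fits the syntax; says nothing about registration."""
    return RFC3066_TAG.fullmatch(tag) is not None
```

A check like this accepts "sr-Latn-CS", "sl-rozaj" and "en-Latn-US-boont" 
alike, because it tests shape only and never consults a list of registrations.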

Look, Bruce, we're not going to agree and Mark and I are not going to change 
the draft in the manner you appear to be asking for here. We'll see you, I 
suppose, at the end of Last Call.

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
http://www.webMethods.com

Chair, W3C Internationalization Working Group
http://www.w3.org/International

Internationalization is an architecture. 
It is not a feature.


