Re: draft-phillips-langtags-08, process, specifications, "stability", and extensions

2005-01-06 16:56:38
First, I apologize about the statement "there has been a lot of noise on
this issue". By that, I wasn't really meaning your message in particular. I
was commenting more on the general status of quite a number of statements
that have been made on the overall topic. And by "noise", I really mean
high-level statements without explicit examples or scenarios, where it is
very hard for people not familiar with the details to be able to judge the
correctness of the statements.

And I will assume that it was that perceived insult that caused you to be
dismissive, with your statement below about "Fine, whatever." I assume that
otherwise you would not so readily conclude that it didn't matter whether
RFC 3066 said "if X then Y" vs. "if Y then X". Those are, after all, very
different statements, and a confusion between them would cause incorrect
conclusions to be drawn.

(c) Every single tag that could be generated under RFC 3066bis is a tag
that could have been registered under RFC 3066.

True but irrelevant.

Not at all irrelevant. Suppose someone is using an RFC 3066 parser, and is
faced with either:

(a) a registered tag from a future version of the RFC 3066 registry, or
(b) a 3066bis tag (that uses generative features not in RFC 3066).

Their parser will work *exactly* the same way; they would parse both as
being equally well-formed, and they will be unable to determine any of the
structure of either tag, and just treat each as a blob. So they are no
better off, but *no worse off either*. (Had we not followed (c), this would
not have been true.) Of course, if they try parsing a tag that is generated
according to RFC 3066 (e.g., not in the registry), then they would be able to
parse out the language code and/or country.
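
As a rough illustration of the behavior described above, here is a minimal
Python sketch of an RFC 3066-style parser. The function name parse_rfc3066
and its return shape are made up for this example. The only structural rule
it applies is the one quoted from RFC 3066 (a 2-letter second subtag is an
ISO 3166 country code); everything else is left as an unparsed blob.

```python
import re

# RFC 3066 syntax: a 1-8 letter primary subtag, followed by any number of
# 1-8 character alphanumeric subtags separated by hyphens.
_WELL_FORMED = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def parse_rfc3066(tag):
    """Return (language, country, rest), or None if the tag is not well-formed.

    Hypothetical helper: 'rest' holds the subtags this parser cannot interpret.
    """
    if not _WELL_FORMED.fullmatch(tag):
        return None
    subtags = tag.split("-")
    language = subtags[0].lower()
    country = None
    rest = subtags[1:]
    # The one positional rule: a 2-letter second subtag is a country code.
    if rest and len(rest[0]) == 2 and rest[0].isalpha():
        country = rest[0].upper()
        rest = rest[1:]
    return language, country, rest
```

With this sketch, "en-US" yields a country, while a tag such as "en-Latn-US"
is accepted as well-formed but its remaining subtags stay opaque, which is
the same outcome as for any newly registered RFC 3066 tag.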

If they update to a 3066bis parser, then they can reliably extract much more
information from the tag. And because 3066bis was written to be backwards
compatible, any language tag generated under RFC 3066 parses out exactly the
same as it would with an RFC 3066 parser.
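
A hedged sketch of what the extra information might look like to an updated
parser. It assumes, as a simplification of the draft, that an optional
4-letter script subtag may follow the language subtag, followed by an
optional 2-letter region. Extended language subtags, numeric regions,
variants, extensions, and grandfathered tags are deliberately ignored, and
the name parse_3066bis_sketch is hypothetical.

```python
def parse_3066bis_sketch(tag):
    """Split a tag into language, script, region, and uninterpreted rest.

    Simplified illustration only; not the draft's full grammar.
    """
    subtags = tag.split("-")
    result = {"language": subtags[0].lower(), "script": None,
              "region": None, "rest": []}
    i = 1
    # Assumed rule: a 4-letter subtag right after the language is a script.
    if i < len(subtags) and len(subtags[i]) == 4 and subtags[i].isalpha():
        result["script"] = subtags[i].title()
        i += 1
    # A 2-letter subtag in the next position is a region (country) code.
    if i < len(subtags) and len(subtags[i]) == 2 and subtags[i].isalpha():
        result["region"] = subtags[i].upper()
        i += 1
    result["rest"] = subtags[i:]
    return result
```

For an RFC 3066 generated tag such as "en-US", this returns the same language
and country that an RFC 3066 parser would; for "en-Latn-US" it additionally
recovers the script and still finds the region.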

Now you yourself may not care much about the extra information in the
3066bis language tag. But IBM, and many other companies and organizations
do. This is not some theoretical problem; it is a real current issue that
many are faced with. For example, without reliable script information many
languages are severely underspecified. One simply cannot mix content with
different scripts and have happy customers.

And if you don't care about the extra information, you are no worse off than
if you were trying to parse a registered RFC 3066 tag. For matching
purposes, the commonly used truncation mechanism will work just as well with
all 3066bis tags as it does with RFC 3066 tags, for all tags you will
encounter.
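
To make the truncation point concrete, here is a minimal sketch of that kind
of matching in Python. The function name truncation_match and the shape of
the supported list are assumptions for the example. It drops trailing subtags
one at a time until a supported tag matches, and it never needs to know what
any subtag means.

```python
def truncation_match(tag, supported):
    """Return the longest supported prefix of 'tag', or None if none matches.

    Comparison is case-insensitive; only whole subtags are removed.
    """
    by_lower = {s.lower(): s for s in supported}
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()  # truncate: drop the last subtag and try again
    return None
```

Because the loop treats subtags as opaque strings, a request for "en-Latn-US"
against a list containing "en" falls back to "en" in the same way that
"en-US" would.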

Mark

----- Original Message ----- 
From: <ned(_dot_)freed(_at_)mrochek(_dot_)com>
To: "Mark Davis" <mark(_dot_)davis(_at_)jtcsv(_dot_)com>
Cc: <ned(_dot_)freed(_at_)mrochek(_dot_)com>; 
<ietf-languages(_at_)alvestrand(_dot_)no>; <ietf(_at_)ietf(_dot_)org>
Sent: Thursday, January 06, 2005 06:44
Subject: Re: draft-phillips-langtags-08, process, specifications, "stability", and extensions


Rather, the rule is simply that a country code, if present,
always appears as a two letter second subtag. The new draft changes this
rule, so applications that pay attention to country codes in language tags
have to change and the new algorithm for finding the country code is trickier.

Your text above says (a) "if there is a country code in the tag, it is the
second subtag". That is not what the text of RFC 3066 actually says, which is:

The following rules apply to the second subtag:
All 2-letter subtags are interpreted as ISO 3166 alpha-2 country...

That is, it says (b) "if a second subtag has 2 letters, then it is an ISO
3166 code", which is not the same as (a). (It is almost, but not quite, the
converse.)

Fine, whatever.

The current RFC certainly does not forbid the use of country
codes in other positions in language tags. One could absolutely register
en-Latin-US, for example, meaning English as spoken in the US written in
Latin script.

Sure, but my point was, is, and always has been that any 3066-compliant
implementation won't see this as a country code (unless it is table driven,
which brings up its own set of issues).

There has been a lot of noise on this issue, and too few concrete examples.

No, what there has been is a lot of discussion of a real problem with no
apparent recognition of it as such by the draft authors. Your pejorative
characterization of this as "noise" does not make it so.

In the so-called 3066bis draft, we have striven very hard to ensure that:

(c) Every single tag that could be generated under RFC 3066bis is a tag
that could have been registered under RFC 3066.

True but irrelevant.

Thus if someone wrote a parser that is future-compatible -- that could parse
all RFC 3066 language tags including those registered after the parser was
deployed -- then that parser can handle all 3066bis language tags. This is a
significant advance over RFC 3066, whose registered (not generated) language
tags are atomic, and cannot be effectively parsed at all. 3066bis adds more
structure so as to allow effective parsing of tags.

If you *can* come up with tags that would show that (c) is invalid, that
would be a concrete case that we would have to make adjustments in the draft
for.

(c) is frankly not an issue I care one whit about. (Perhaps I should, but I
don't.) I don't register tags. I write code that processes, and more to the
point matches, tags. That's why I have issues with this draft.

Moreover, all the talk about this being *too* complex is far overblown.

Again, your pejorative dismissal of other people's concerns does not
mean your position is valid.

All 3066bis language tags can be parsed, including all the grandfathered
codes, with a very short piece of code, or even with a regular expression
(such as in Perl).

Of course you can write a short piece of code to parse this stuff. It's what
you do with it after you parse it that's a problem.

This is not rocket science.

Parsing almost never is. But simply parsing these tags is not, and never has
been, the issue.

Ned


