RE: New Last Call: 'Tags for Identifying Languages' to BCP

From: ietf-languages-bounces(_at_)alvestrand(_dot_)no [mailto:ietf-languages-bounces(_at_)alvestrand(_dot_)no] On Behalf Of Bruce Lilly

That is not at all the aim here wrt stability; rather, the aim is that a
symbolic identifier used for metadata in IT systems not change because
some government on a whim says, "We would now prefer to use 'yz' rather
than 'xy' to designate our country."


If by international agreement, 'yz' becomes the designation
for that country, then it is rather silly to stick one's
fingers in one's ears and shout "NA-NA-NA-NA-NA I don't want
to hear you".


That misses the point entirely. The point is that IDs used by political
administrations may change for any number of reasons, and those
administrations may have no qualms with such changes; but in IT
systems, we cannot afford changes that break existing implementations
and data. If for whatever reason ISO and the UN decided that "US" should
be used to designate the country of France, I doubt you'd expect every
software vendor to update all of their deployed installations to use
"fr-US" instead of "fr-FR", and for every user to go through every data
repository they manage to make such changes in their data.
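
To make the stability point concrete, here is a minimal sketch, not anything taken from the draft: a registry in which a code, once assigned, is never given a different meaning. The class name StableRegistry and its methods are hypothetical, and the reassignment of "US" is the same made-up scenario as above.

```python
class StableRegistry:
    """Hypothetical registry: a code keeps its first assigned meaning."""

    def __init__(self):
        self._meanings = {}

    def register(self, code, meaning):
        """Record a code; refuse to give an existing code a new meaning."""
        existing = self._meanings.get(code)
        if existing is not None and existing != meaning:
            return False  # reassignment rejected; deployed data stays valid
        self._meanings[code] = meaning
        return True

    def meaning(self, code):
        """Return the meaning recorded for a code, or None if unknown."""
        return self._meanings.get(code)


registry = StableRegistry()
registry.register("FR", "France")
registry.register("US", "United States")

# The made-up reassignment is refused, so "fr-FR" keeps working as before.
accepted = registry.register("US", "France")
```

Under this assumption the source standard is free to change, but the registry that implementations rely on does not follow it.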

The people that maintain time zone definitions may have their means for
changing times; that's fine for them. They are not dealing with the same
concerns as we are dealing with. The group here that has focused
specifically on language-tagging issues for several years has evaluated
issues that affect language tags and the impact of changes and has
decided what is best practice for *this* domain, and it is to maintain
stability of data rather than cater to whims of political
administrations.

"Designed" or not, country codes *are* read by humans; they
appear in top-level domain names.  Currently the ISO 639
2-letter codes mean the same thing as the last component of
a domain name


I think you mean ISO 3166 2-letter codes.

and as the second component of a language-tag.
It's rather silly to change that correspondence simply because
a few people are piqued that international agreement has been
reached to change a few 2-letter codes.


The usability flaw in treating ISO 639 and ISO 3166 as human-readable is
evident in the confusion between ja and JP (or is it jp and JA?), and GB
vs UK. As for what is silly, if the UN country ID for Canada changed to
CN (and that for PRC changed to something else), I'm sure it would cause
far greater problems for users to have to change the last two letters in
domain names than for them to keep doing what they always did. In fact,
I would have thought it would create a rather significant problem on the
Internet if such a change were made. (URIs don't come with versioning
dates for domain names, so how would a DNS server know what the "cn"
meant?)

Neither RFC 1766 nor RFC 3066 has ever presented "official" translations;


Both defer to the ISO lists for definitions (not "translations")
of the various codes.


Definitions; not language names for display use.

this is no different for RFC 3066bis.


It is very different; under the proposed draft, there is only
an English definition; somebody wishing to provide a French
definition finds that he has none and must resort to an
unofficial translation.


The more you press this, the more silly it seems. RFC 3066 does not
anywhere discuss display names; localization data is beyond its scope.
The registry it defines makes no provision for French language
names. The source ISO standards are every bit as accessible as they ever
were, and just as RFC 3066 gave the user no option but to refer to the
source ISO standard, so users should and can continue to do so.

After this response, I will not waste my time any further on this
foolishness.

I'm willing to postpone the discussion
(other problems with the proposed registry format dictate
a broader solution which could easily have provision for
an arbitrary number of descriptions).


I strongly object to the suggestion that progress on this draft be
delayed to deal with this non-issue that caters to implementation issues
that are well beyond the scope of either RFC 3066 or its proposed
replacement.

No, you are overlooking the fact that a set of codes with
no corresponding definitions is useless.  RFC 3066 defers
the code/definition pairs to ISO, which provides multilingual
definitions. The proposed draft would remove that multilingual
characteristic.


What if the registry provided no name, just the ID? Then people would
have to refer to the source ISO standard as they did in the past, and we
would be able to specify which ISO IDs were or were not valid. That would
achieve the goal that we had wrt stability while eliminating the concern
that English-only annotations for some reason apparently create for you.
Personally, I think the English annotation is helpful, but it seems that
the real solution you're looking for is to remove any annotation
whatsoever so that the situation is closer to what we have under RFC
3066.
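
As a rough sketch of that ID-only option: the registry would carry nothing but the set of valid IDs, and anyone wanting a name would look it up in the source ISO standard. The names below (VALID_LANGUAGE_IDS, VALID_REGION_IDS, is_valid_tag) are hypothetical, the ID sets are a tiny illustrative subset rather than real registry contents, and only the language and language-region forms are handled.

```python
# Illustrative subsets only; the real lists would come from the registry.
VALID_LANGUAGE_IDS = frozenset({"en", "fr", "ja"})
VALID_REGION_IDS = frozenset({"US", "FR", "JP", "GB"})


def is_valid_tag(tag):
    """Check a 'language' or 'language-region' tag against the ID sets.

    No names or descriptions are stored; validity is the only question.
    Comparison is case-insensitive, as language tags are.
    """
    parts = tag.split("-")
    if len(parts) == 1:
        return parts[0].lower() in VALID_LANGUAGE_IDS
    if len(parts) == 2:
        language, region = parts
        return (language.lower() in VALID_LANGUAGE_IDS
                and region.upper() in VALID_REGION_IDS)
    return False  # longer tags are outside this sketch
```

Nothing here is language-specific in the display sense, which is the point: the list says which IDs are valid and leaves their definitions to ISO.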

Display names for languages and countries are not within the scope of
RFC 1766 or RFC 3066. It is preposterous to suggest that this draft is
not compatible with existing implementations of RFC 3066 on that basis.


On the contrary, it is preposterous to suggest that codes
will be attached to text by magic; some human somewhere,
somehow is going to have to indicate the language to
something, and it certainly isn't going to be by way of
a 2- or 3-letter code without some reference to what those
codes *mean*.  And at the present time, the meaning of
those codes is defined -- bilingually -- in the ISO
lists.


RFC 3066 did not even discuss, let alone provide, a means for attaching
display text to codes. It *is* preposterous to suggest that this draft
is incompatible with RFC 3066 on that basis. Again, the more you press
this, the more silly it seems.

But you are simply adding localization requirements to a spec for i18n
infrastructure, and I consider that not at all appropriate.


No, I am complaining about removal of internationalized
definitions associated with language tag components.


No definitions are removed. The draft points to the source ISO standards
just as RFC 3066 does.

"Localization" would be translation of the French definition
into some other language.  That is not my concern. My concern
is the elimination of the French definition in the first place.


No, you have not commented on definitions; you have repeatedly commented
on strings to present to users. Please accept that your arguments on this
matter are empty.

One part of my claim is that non-private-use RFC 3066 tags
up to the present time are no longer than 11 octets in length.


Only coincidentally at the present time.


As mentioned, under RFC 1766/3066 review/registration rules,
excessively long tags would certainly raise objections. That's
no coincidence -- it's an intentional design feature.


But "excessive" is not defined anywhere in RFC 1766/3066, and if a very
good reason were presented why a tag x characters long was needed, it
would have to be considered.

And so that limit would be a constraint applying for all time to the
'grandfathered' production which concerned you so much.


And so it can easily be incorporated into that ABNF production.


The productive thing would be for you to provide a suggested revision of
the ABNF to the authors.
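
For what it is worth, the constraint being discussed is easy to state outside the ABNF as well. The following is only a sketch of such a check, not a proposed revision of the grammar: the function name and constant are hypothetical, and 11 octets is the figure claimed in this thread, not a limit stated in RFC 1766 or RFC 3066.

```python
MAX_TAG_OCTETS = 11  # figure claimed in this thread, not taken from an RFC


def within_length_limit(tag, limit=MAX_TAG_OCTETS):
    """Return True if the tag is ASCII and at most `limit` octets long."""
    try:
        octets = tag.encode("ascii")
    except UnicodeEncodeError:
        return False  # language tags are ASCII, so anything else fails
    return len(octets) <= limit
```

A tag such as "zh-min-nan" (10 octets) passes this check, while any tag of 12 or more octets fails it.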



Peter Constable
Microsoft Corporation

_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf