
RE: draft-phillips-langtags-08, process, specifications, "stability", and extensions (Was Language Identifier List Comments, updated)

2004-12-29 15:56:11
Comments below. I must admit that I'm losing the ability to respond to this 
thread, since it contains direct statements that no response will satisfy the 
correspondent. 

Addison

Addison P. Phillips
Director, Globalization Architecture
http://www.webMethods.com

Chair, W3C Internationalization Working Group
http://www.w3.org/International

Internationalization is an architecture. 
It is not a feature.

-----Original Message-----
From: ietf-languages-bounces(_at_)alvestrand(_dot_)no
[mailto:ietf-languages-bounces(_at_)alvestrand(_dot_)no]On Behalf Of Bruce 
Lilly
Sent: 2004-12-29 7:31
To: ietf-languages(_at_)alvestrand(_dot_)no
Cc: ietf(_at_)ietf(_dot_)org
Subject: Re: draft-phillips-langtags-08, process, specifications,
"stability", and extensions (Was Language Identifier List Comments,
updated)


RE: Language Identifier List Comments, updated
 Date: 2004-12-28 18:22
 From: "Addison Phillips [wM]" <aphillips(_at_)webmethods(_dot_)com>
To: "JFC (Jefsey) Morfin" <jefsey(_at_)jefsey(_dot_)com>, "John Cowan" 
<jcowan(_at_)reutershealth(_dot_)com>
 CC: ietf-languages(_at_)alvestrand(_dot_)no

The draft isn't a process draft. Take your process problems to 
the IETF or IESG (or W3C or appropriate standards body).

The draft defines a registration procedure; if it did not do so,
it would probably not be a candidate for BCP (vs. some other type
of RFC).  Aside from the process/procedure that the draft seeks to
establish, there are process/procedure issues having to do with
the origin of the draft, statements about "extensions", and IETF
procedures and mission as specified in RFCs 2026, 2418, and 3935.
And, in accordance with the New Last Call and the procedures
detailed in RFC 2026, the issues are being taken to the IETF/IESG,
however much some participants in the discussion may dislike those
procedures.

The origin of the draft is an individual submission, governed by the various 
RFCs cited. What's the problem with that? Are individual submissions somehow 
inappropriate now?

This draft does not modify the process of the IETF or govern what other I-Ds 
may choose to do except in reference to language tags. As I pointed out in the 
context removed from the above statement: this I-D would be inappropriate if it 
attempted to govern what language tags were used for.

Do what you feel is warranted, Bruce. You don't appear to be trying to achieve 
consensus, which is the touchstone of the IETF process as I understand it. If 
you feel issues should be taken to the IESG, then do so. 

This draft defines language tags.

Yes. And a registry format technical specification.  And a matching
algorithm technical specification. In addition to the registration
process.

...just like RFC 3066 did.

Other drafts, RFCs, specs, etc. define processes and 
applications that use them. The appropriate use of language tags 
is the concern of those specifications.

Per RFC 2026, an application having specific requirements for use
of Technical Specifications (TS) should provide an Applicability
Statement (AS) specifying specific requirement levels for each
TS involved...

The draft provides specific requirements for language tags themselves, which 
are strings compatible with the RFC 3066 strings already used by the other 
specifications. The applicability and requirements for this iteration of 
language tags is the same as it was under RFC 3066. The language tags created 
do not break existing specifications. The requirements in this document were 
calibrated to allow all existing RFC 3066 references to remain in force without 
prejudice. In fact, we did NOT change things that might have otherwise been 
changed in order to ensure deep compatibility.

Ultimately, the existence of the RFC 3066 language tag registry trumps all of 
your arguments about this: all of the tags defined in the generative mechanism 
of RFC 3066bis could have been registered under 3066 (with loss of 
functionality for the users of those tags, to be sure). The argument that every 
complete tag used anywhere must be registered is trumped by the existence of 
the generative mechanism in RFC 3066. Registered variant subtags still must 
have a recommended range to which they apply. Very little has changed, except 
that using subtags is a bit more logical.


If there is some text that this draft should carry to help 
guide implementations, please suggest it so that we can all 
consider it.   

It would help immensely if the 3 technical specifications (tag
format, registry format, matching algorithm) were separated as
separate documents to facilitate reference as independent TSs,
and to facilitate any individual extensions/revisions, etc.
that may be necessary in the future, and to keep those separate
from the registration procedure which itself may need to be
separately referenced and/or revised.

Well, there at last is a suggestion. We think splitting the draft up would not 
be a benefit because the three items are closely linked and have historically 
been in one document. There is no indication that any of these items will be 
separately revised in the future. While I'm sure it is possible, I think it 
would be wiser to keep these items together.

No, the revision clearly expands the scope of language 
distinctions that can be represented with a language tag--quite 
significantly in some cases.

Indeed, and without registration of the tags and the review process
associated with that (existing RFC 3066) registration procedure. As
Harald Alvestrand pointed out some time ago, that (inappropriately)
shifts implementation effort from the tag generator (no registration
required) to the recipient (what the heck does this mysterious tag
actually *mean*).

Nonsense. There is the same review process (strengthened somewhat, actually, 
from experience) for subtags. Harald's point, I think, is not valid because 
only the registered (and rarely implemented) tags were subject to scrutiny. The 
new draft actually provides a framework in which any subtag's type can be 
discerned from its position and size, even if the subtag itself is 
unrecognized: this is actually *better* than you could obtain with the existing 
registry. 
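
To illustrate the point about position and size, here is a minimal Python 
sketch. It assumes length rules of the kind the draft describes: a primary 
language subtag first, a 4-letter script, a 2-letter or 3-digit region, 
single-character singletons introducing extensions or private use, and 
anything else treated as a variant. The function name and the labels are 
hypothetical, extended language subtags and grandfathered tags are ignored, 
and the ABNF in the draft remains the authority.

```python
def classify_subtags(tag):
    """Label each subtag by position and length only (no registry lookup).

    Illustrative assumption, not the draft's ABNF: extended language
    subtags and grandfathered tags are not handled.
    """
    labelled = []
    section = None        # "extension" or "private-use" once a singleton is seen
    seen_script = False
    seen_region = False
    for index, sub in enumerate(tag.split("-")):
        if section == "private-use":
            # Everything after "x" is private use, whatever its length.
            labelled.append((sub, "private-use"))
        elif len(sub) == 1:
            section = "private-use" if sub.lower() == "x" else "extension"
            labelled.append((sub, "singleton"))
        elif section == "extension":
            labelled.append((sub, "extension"))
        elif index == 0:
            labelled.append((sub, "language"))
        elif len(sub) == 4 and sub.isalpha() and not (seen_script or seen_region):
            seen_script = True
            labelled.append((sub, "script"))
        elif not seen_region and (
            (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit())
        ):
            seen_region = True
            labelled.append((sub, "region"))
        else:
            # Once a variant appears, no script or region may follow it.
            seen_script = seen_region = True
            labelled.append((sub, "variant"))
    return labelled


print(classify_subtags("sr-Latn-CS"))
print(classify_subtags("en-US-boont"))
```

Under these assumptions a processor that has never seen "boont" can still tell 
that it is a variant and not a region or a script, without consulting the 
registry.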

The generator *is* required to register non-private use subtags for use, so 
that statement mystifies me. You can't just use any subtag you feel like 
(except as private use). The recipient can access the registry to determine the 
meaning of any subtag (you couldn't do that before).

But its grammar is much more restrictive, in part to ensure 
full backwards compatibility with tiny little applications like, 
oh, say XML.

It may have been intended to have been more restrictive, but it
needs work to achieve that goal (as previously discussed in
detail).

Dealt with in the pending draft-09 as previously discussed.

XM who?  What about core Internet protocols such as MIME and the
Internet Message format (STD 11)? 

I could have cited those. The example was not intended as an exhaustive list, 
eh? Are you suggesting that XML isn't an important technology?

 I believe XML is a w3 consortium
product, not an IETF product.

So what? We don't like the W3C or something? 

It also restricts future development of compatible language 
tags in an effort to ensure that implementations of 
draft-langtags are stable over time and extended in a controlled manner.  

I still believe there is a problem with the proposed method of
handling "CS", which is destabilizing (given previously documented
use of "sr-CS" vs. the demise of Czechoslovakia prior to use of
country codes in language tags (RFC 1766)).  I have yet to see a
detailed concrete proposal for a general procedure that would
ensure stability of the current meaning of "CS" embodied in a
general principle as part of the registration procedure. [N.B.
making a special-case exception for "CS" doesn't address the issue.]

Well you can't have it both ways. Either CS means Czechoslovakia or it means 
Serbia and Montenegro.

You can see an early version of draft-09 that attempts to address it here:

http://www.inter-locale.com/ID/draft-phillips-langtags-09.html

Your comments on that would be appreciated.

We greatly expanded what can be represented in four major ways:

1. Added script subtags for writing system variations.
2. Mixed generative and private use subtags for private minor 
distinctions in tags.
3. Extensions for really specialized distinctions.
4. UN M49 region codes, including supra-national regions to 
represent geographical distinctions not covered by ISO 3166 or by 
instability in same.

It's not entirely clear if some of those items (e.g. script) should
be expressed by an orthogonal mechanism rather than embedded in a
*language* tag (for that matter, in retrospect, country codes were
probably a bad idea).

There would be no RFC 1766 or 3066 if ISO 639 language codes actually captured 
all of the nuances of language (doh!). There is a clear need for script codes 
for distinguishing certain kinds of Chinese written material, as well as 
certain languages in which there are active script transitions or in which the 
language is commonly written in more than one script. Individuals not connected 
with this effort have attempted to register similar language tags recently. For 
many users it is important to identify the writing system in those cases.

The whole "stability" brouhaha seems to be a tempest in a teapot.
Surely the issue could be addressed in a professional manner by
reaching an agreement with ISO/UN regarding the issue, as has been
done for the case of 2-letter vs. 3-letter codes and stability of
existing 3-letter codes.

It is only *one* of the things addressed by the draft. But it is and remains 
important. Doug Ewell suggested to me that even if no RA or MA ever reuses a 
code again, it is still ISO 3166/MA's job to keep the codes in sync with the 
current state of the world.  Whenever countries split up, join together, or 
change names, ISO 3166/MA will be there to change the code list.  The 
instability is not all the MA's fault, but we still need to protect against it 
because of legacy data. The lonely CS example should not become the state of 
affairs going forward.

This is dealt with in Section 2.4.2 "Matching". This section 
clearly details the fallback mechanism (which is compatible with 
the one in RFC 3066), as well as some considerations for 
additional matching that can be done by specialized processors 
that implement a different mechanism. The matching algorithm is 
the standard one, but is not mandatory. In fact, I have a paper 
with Jeremy Carroll on a different matching algorithm that an OWL 
implementation might use. Read this section of the draft carefully.

I note that Frank Ellerman has raised some issues, but as yet I
haven't seen any response.  The existence of multiple mechanisms,
coupled with issues regarding the one proposed in the draft, is
a strong indication that the matching algorithm should be split
into a separate document (possibly as one of multiple Experimental
RFCs, or as a Standards Track or Informational RFC).

Matching hasn't actually changed. Frank has raised some good issues: I believe 
I responded to his message.

The existence of multiple mechanisms isn't really an issue. The draft specifies 
ONE mechanism, just like RFC 3066, and notes that more specialized processing 
is possible. In effect, this is no different from what RFC 3066 did. We 
purposely did not specify experimental matching algorithms.

If one specifies "en-FR", then one should not expect to receive 
anything less specific than "en-FR".

Are you referring to use in Accept-Language fields or in Content-
Language fields (or equivalent accept/send dichotomy)?

Yes and no. Accept/Content is one example of matching. Another might be a query 
on a document (as with XQuery on an XML document, for example). The 
remove-from-right matching rules in RFC 3066 (and the draft) have long had this 
particular design.

In software resources generally one specifies the *most 
specific* (granular) tag that one will accept and may receive 
less specific content (which may include the default content).
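
The resource-lookup behaviour described above can be sketched in a few lines 
of Python. This is only an illustration of the general message-catalog 
pattern: the names fallback_chain and lookup_resource, the dictionary of 
bundles, and the exact-case key comparison are all assumptions made for the 
example, not anything taken from the draft.

```python
def fallback_chain(tag):
    """Return the tag followed by shorter tags, dropping subtags from the right."""
    parts = tag.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


def lookup_resource(requested, bundles, default):
    """Return the most specific bundle available for the requested tag.

    The caller names the MOST specific tag it wants and may get something
    less specific, or the default content if nothing matches at all.
    Keys are compared exactly as given (an assumption of this sketch).
    """
    for candidate in fallback_chain(requested):
        if candidate in bundles:
            return bundles[candidate]
    return default


bundles = {"en": "Hello", "fr": "Bonjour"}
print(fallback_chain("en-US-boont"))                  # ['en-US-boont', 'en-US', 'en']
print(lookup_resource("en-US-boont", bundles, "Hi"))  # Hello
print(lookup_resource("de-DE", bundles, "Hi"))        # Hi
```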

Indeed; hence the question above. [I also note in passing that
IETF deals with the Internet in particular, not with "software
resources generally".]

So? Are you not aware of things like message catalogs, resource bundles, and 
the like? I give an example to illustrate a small point.

In language tag matching one specifies the *least specific* tag 
that one will accept and won't receive anything less specific 
(although you might receive something more specific). 
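
By contrast, the language-range rule in RFC 3066 treats the range as a prefix: 
a range matches a tag if the two are equal, or if the range is a prefix of the 
tag and the next character in the tag is "-". A minimal Python sketch of that 
rule follows; the function name range_matches is hypothetical, and the special 
"*" range is left out for brevity.

```python
def range_matches(language_range, tag):
    """Prefix match: the range names the LEAST specific tag that is acceptable.

    Comparison is case-insensitive. The "*" wildcard range is not handled
    in this sketch.
    """
    wanted = language_range.lower()
    offered = tag.lower()
    return offered == wanted or offered.startswith(wanted + "-")


print(range_matches("en-US", "en-US"))        # True
print(range_matches("en-US", "en-US-boont"))  # True: more specific is accepted
print(range_matches("en-US", "en"))           # False: less specific is not
print(range_matches("en", "eng"))             # False: prefix must end at a "-"
```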

I'm not sure; if one indicates acceptance of Franglais (en-FR),
receiving plain en is probably acceptable.  Receipt of en-FR-<Brittany>
for whatever mechanism is used to indicate the variant of English
spoken in the region of Brittany (where Breton is a Gaelic language,
rather than one derived from Latin, like French, or of Germanic root,
like English) in the country of France, might well be incomprehensible
to an English-speaking Frenchman from Alsace. [Let's not confuse the
specific example with the general principle which it illustrates.]

That's the small point I'm illustrating. The draft is very clear about the 
falsehood of assuming that a more specific tag is mutually intelligible with a 
less specific one. Your example of Breton is a bad choice of tags, though. 
Breton has its own ISO 639 code ("bre"). Let's make it better:

I doubt that en-US-boont is fully intelligible to anyone from more than a few 
miles outside Boonville without a dictionary.

The language tag syntax from RFC 3066 itself cannot be changed. 
draft-langtags carefully adds restrictions to the ABNF and 
grammar of the tags to ensure that this is so.

Again, the implementation falls short of the promise.

I grow impatient.

Changing the sources for existing subtags or the interpretation 
of any particular existing language tag is not permitted if we 
are to maintain backwards compatibility.

Agreed that there would be a backwards compatibility problem with
changing the source.  Which is why there is an issue with "CS" being
defined in the ISO lists by reference as is currently the case with
RFC 3066, vs. the proposal to change the source to a separate IANA
registry which handles "CS" specially (i.e. differently from many
other ISO-derived codes).

Yawn. We have modified draft-09 in an attempt to deal with this issue, but 
either way we need to deal with 'CS'.
  
To be perfectly blunt: we've worked over a year on this 
project. If you have specific comments on this draft, with 
suggestions for improvements, please send those to the list so 
that they can be viewed by the community and so that Mark and I 
can address them. Your suggestions for additional changes to the 
syntax of language tags we find to be incompatible (to the extent 
that we understand them) with RFC 3066 and our own work on 
draft-langtags. You will note that draft-langtags can accommodate 
your requirements using the mechanisms spelled out above and in 
the draft... so I fail to see what we should change. If you can 
express that, we'll consider it. Otherwise you are free to do as 
we did and write your own draft. Internet-Drafts are a volunteer 
effort and do not write themselves. Neither is there a Star 
Chamber of people who create them in the dead of night. If you 
see a need, fill it. I would suggest: wait for draft-langtags to 
be an RFC and write an extension that does what you want.

See RFC 2418; specifically section 2.3 and the comment about consensus
about a wrong design.  See also the RFC 2026 process requirements and
RFC 2418 procedures; a group which has no charter or equivalent
document, no written record of meetings, etc. might very well be
described as "a Star Chamber of people".

There is a list archive. You can see the discussion and the drafts (I maintain 
all of them online). Discouraging people from participating in the IETF process 
is, I think, odious.

One doesn't write "extensions" to BCP RFCs (that's one of the problems
with the agglomeration of specifications in the current document); a
BCP is replaced wholesale (although in theory it might be possible to
have two related BCPs coexist per the details in RFC 2026 section 6.3;
but that is unlikely, and in any event the current draft does not
contain the sort of statement required to coexist with RFC 3066).

Read the draft. The word extension is defined there with a specific meaning. I 
use that meaning above.

The current draft REPLACES RFC 3066. In it there is text that allows for 
separate RFCs that provide specific extensions (so that the need to revise this 
document in the future is reduced, contributing to, well, stability of language 
tags).


