Re: draft-phillips-langtags-08, process, specifications, "stability", and extensions

 Date: 2004-12-29 17:45
 From: "Addison Phillips [wM]" <aphillips(_at_)webmethods(_dot_)com>
 To: ietf-languages(_at_)alvestrand(_dot_)no, ietf(_at_)ietf(_dot_)org
 Reply to: aphillips(_at_)webmethods(_dot_)com

Comments below. I must admit that I'm losing the ability to respond to this 
thread, since it contains direct statements that no response will satisfy the 
correspondent.


I'm fairly certain that it does not. It does state that to
date no satisfactory procedural method for handling changes
in meanings of codes has been presented which does not
itself change the meaning of tags which are currently in use.

The origin of the draft is an individual submission, governed by the various 
RFCs cited. What's the problem with that? Are individual submissions somehow 
inappropriate now?


Individual submissions are fine for Informational and
Experimental RFCs, i.e. RFCs which do not purport to be
or to become standards.  Individual submissions can be
part of the Standards Track with AD-level support. It is
possible for an individual submission to become BCP with
the same caveat, however because BCPs go into effect as
standards without the phased roll-in and implementation
experience that characterize the Standards Track, "BCPs
require particular care" (RFC 2026).  They should possess
the characteristics that result from phased roll-in of
Standards Track RFCs; design choices resolved, multiple
independent interoperable implementations, they should
be well-understood and have no known technical omissions.

This draft does not modify the process of the IETF [...]


IETF process for use of external standards is to reference
those standards as they exist, not to attempt to modify
those standards by declaring bits and pieces invalid in
the absence of transfer of change control from the
originating body.

Do what you feel is warranted, Bruce. You don't appear to be trying to 
achieve consensus, which is the touchstone of the IETF process as I 
understand it. If you feel issues should be taken to the IESG, then do so.


You have yourself noted that the draft is an individual
submission, not the result of an IETF process. "consensus"
doesn't apply to an individual effort.  If you want to
adhere to IETF process, by all means ask the IESG to set
up a working group, with a charter, a Chair, etc.; I
fully support that.

This draft defines language tags.


Yes. And a registry format technical specification.  And a matching
algorithm technical specification. In addition to the registration
process.


... just like RFC 3066 did.


RFC 3066 didn't dictate the registry format. The matching
algorithm was much simpler -- indeed the complexity of the
method in the draft under discussion is primarily due to
the addition of orthogonal data as subtags.  Note also that
in the transition from 1766 to 3066, the specification of
the Content-Language field was broken out into a separate
document (RFC 3282).

Other drafts, RFCs, specs, etc. define processes and
applications that use them. The appropriate use of language tags 
is the concern of those specifications.

Per RFC 2026, an application having specific requirements for use
of Technical Specifications (TS) should provide an Applicability
Statement (AS) specifying specific requirement levels for each
TS involved...


The draft provides specific requirements for language tags themselves, which 
are strings compatible with the RFC 3066 strings already used by the other 
specifications. The applicability and requirements for this iteration of 
language tags is the same as it was under RFC 3066. The language tags created 
do not break existing specifications. The requirements in this document were 
calibrated to allow all existing RFC 3066 references to remain in force 
without prejudice. In fact, we did NOT change things that might have 
otherwise been changed in order to ensure deep compatibility.


The point is that an application, such as IDNA, could specify
use of tags at a certain requirement level, matching at a different
requirement level (or using a different algorithm), and is probably
unconcerned with registration procedure and registry format. An
applicability statement for use of language tags for IDNA could
therefore reference the tag format and matching algorithm(s)' TSs
and need not mention the registration procedure or registry format.
In short, I am clarifying your earlier statement about uses of
technical specifications (viz. that an AS is the mechanism by which
appropriate use of TS is documented).

Ultimately, the existence of the RFC 3066 language tag registry trumps all of 
your arguments about this: all of the tags defined in the generative 
mechanism of RFC 3066bis could have been registered under 3066 (with loss of 
functionality for the users of those tags, to be sure). The argument that 
every complete tag used anywhere is trumped by the existence of the 
generative mechanism in RFC 3066. Registered variant subtags still must have 
a recommended range to which they apply. Very little has changed, except that 
using subtags is a bit more logical.


I've reread that several times and can't make sense of it. Could you
please rephrase.

If there is some text that this draft should carry to help
guide implementations, please suggest it so that we can all 
consider it.   

It would help immensely if the 3 technical specifications (tag
format, registry format, matching algorithm) were separated as
separate documents to facilitate reference as independent TSs,
and to facilitate any individual extensions/revisions, etc.
that may be necessary in the future, and to keep those separate
from the registration procedure which itself may need to be
separately referenced and/or revised.


Well there at last is a suggestion. We think splitting the draft up would not 
be a benefit because the three items are closely linked and have historically 
been in one document. There is no indication that any of these items will be 
separately revised in the future. While I'm sure it is possible, I think it 
would be wiser to keep these items together, since they have historically 
been together.


So why not then also throw in the closely linked specification of
the Content-Language field, which has historically been in the same
document (RFC 1766)?  I see no substance in your response; it does
not address the issue of how an implementation of an application
could be facilitated (by making an AS easier to produce by providing
separate documents so that requirement levels can be independently
and clearly specified for the different TSs).

No, the revision clearly expands the scope of language
distinctions that can be represented with a language tag--quite 
significantly in some cases.

Indeed, and without registration of the tags and the review process
associated with that (existing RFC 3066) registration procedure. As
Harald Alvestrand pointed out some time ago, that (inappropriately)
shifts implementation effort from the tag generator (no registration
required) to the recipient (what the heck does this mysterious tag
actually *mean*).


Nonsense. There is the same review process (strengthened somewhat, actually, 
from experience) for subtags.


RFC 3066 has no review process for subtags. They are what the ISO
lists say they are. It does have a review process for IANA
registered tags as part of that registration procedure, which
(except for private use tags) must be followed before use of a
tag not based on ISO language as a primary tag, and optional
ISO country as a secondary tag.

Harald's point, I think, is not valid because only the registered (and rarely 
implemented) tags were subject to scrutiny.


Not so; the ISO language and country codes are certainly subject
to scrutiny (but not to second-guessing and cherry-picking). Under
RFC 3066, a tag may be generated from the standard ISO tag, or it
may be an IANA registered tag (leaving aside private use tags for
the moment).  A parser can easily determine what such a tag is; if
the primary subtag has 2 or 3 letters, it is an ISO language code.
If the second subtag has 2 letters, it is an ISO 3166 country code.
Anything else is either private use (primary subtag is x) or is
registered as a complete IANA tag, or is an error. [de-AT-1901,
incidentally, (as an example) does not meet the RFC 3066 requirement
of 3 to 8 characters in the second subtag for registration with
IANA...].  Under the proposed draft, anybody may legally generate
a tag such as
  sr-Latn-CS-gaulish-boont-guoyu-i-enochian
or
  sr-Latn-CS-gaulish-boont-guoyu-i-enochian-x-foo
with *no* specific registration requirements (i.e. all components
are either registered or require no registration). In the latter
case, a parser can only determine that it contains a private-use
subtag after wading through the other subtags.  In either case,
it is difficult (to say the least) for the recipient or his
software to determine what the generator of that tag intended to
convey.  Returning to the private use issue; in RFC 3066, as in
every other case that I know of where x is used as an indicator
of private use for some name, it is used as a prefix of the name,
never buried deep inside the name (as provided for by the draft
proposal).
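
As a rough illustration of the RFC 3066 parsing rule described above, here
is a minimal Python sketch. The function name and the returned labels are
hypothetical, and the sketch does not consult the ISO lists or the IANA
registry; it only applies the length and position tests mentioned in the
text, plus the RFC 3066 syntax limit of 1 to 8 characters per subtag.

```python
import re

def classify_rfc3066(tag):
    """Classify a tag by the length/position tests described above (sketch)."""
    subtags = tag.lower().split("-")
    # Syntax check: primary subtag is 1-8 letters, later subtags 1-8 alphanumerics.
    if not re.fullmatch(r"[a-z]{1,8}", subtags[0]):
        return "error"
    if not all(re.fullmatch(r"[a-z0-9]{1,8}", s) for s in subtags[1:]):
        return "error"
    primary = subtags[0]
    if primary == "x":
        return "private use"
    if len(primary) in (2, 3):
        if len(subtags) == 1:
            return "ISO 639 language"
        if len(subtags) == 2 and len(subtags[1]) == 2:
            return "ISO 639 language + ISO 3166 country"
    # Everything else must be looked up as a complete tag in the IANA registry.
    return "registered or error"
```

For example, classify_rfc3066("en-US") reports a language plus country,
while classify_rfc3066("de-AT-1901") falls through to the registry lookup
case, because only the first two subtags can be interpreted without it.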

The new draft actually provides a framework in which any subtag's type can be 
discerned from its position and size, even if the subtag itself is 
unrecognized: this is actually *better* than you could obtain with the 
existing registry.


Not quite; in the examples above one cannot determine what "enochian"
is from its size and position alone -- one needs to know that it
follows a single character subtag and that the single character is
not an x.
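
The point about "enochian" can be sketched in Python as follows. This is
not the draft's grammar: it assumes only what is said above, namely that a
single-character subtag governs the subtags that follow it, up to the next
single-character subtag. The function name is hypothetical.

```python
def annotate_subtags(tag):
    """Pair each subtag with the single-character subtag that governs it.

    Assumption (a simplification, not the draft's grammar): any
    one-character subtag opens a section that lasts until the next
    one-character subtag. Subtags before the first one get None.
    """
    governing = None
    result = []
    for subtag in tag.split("-"):
        if len(subtag) == 1:
            governing = subtag.lower()
        result.append((subtag, governing))
    return result
```

Applied to sr-Latn-CS-gaulish-boont-guoyu-i-enochian-x-foo, the sketch
pairs "enochian" with "i" and "foo" with "x", but only after every
preceding subtag has been scanned.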

The generator *is* required to register non-private use subtags for use, so 
that statement mystifies me. You can't just use any subtag you feel like 
(except as private use). The recipient can access the registry to determine 
the meaning of any subtag (you couldn't do that before).


Surely you're not claiming that each individual generator must
separately register "sr", "Latn", "CS" etc. in order to use
them!?!  A recipient using software that interprets RFC 3066
tags isn't going to be able to do anything useful with any
hypothetical tag which contains a script subtag that would be
produced under the draft rules (if the script subtag were to appear
*after* the region subtag, one could at least match "sr-CS-Latn"[...]
to "sr-CS", which an RFC 3066 parser could handle). Again returning
to private-use, an RFC 3066 parser can (only) determine that a
private-use tag is in use if it has x as the primary tag. There
are provisions in the draft syntax that break backwards compatibility.
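
The compatibility concern can be illustrated with a minimal Python sketch
of the prefix matching rule in RFC 3066 section 2.5: a language-range
matches a tag if it equals the tag, or equals a prefix of the tag that is
followed by "-"; the range "*" matches any tag. The function name is
hypothetical.

```python
def rfc3066_matches(language_range, tag):
    """Return True if language_range matches tag by RFC 3066 prefix matching."""
    r = language_range.lower()
    t = tag.lower()
    # Exact match, wildcard, or a prefix that ends at a subtag boundary.
    return r == "*" or t == r or t.startswith(r + "-")
```

With this rule, the range "sr-CS" matches "sr-CS-Latn" but does not match
"sr-Latn-CS", since the script subtag sits between the language and the
country code.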

What about core Internet protocols such as MIME and the
Internet Message format (STD 11)?


I could have cited those. The example was not intended as an exhaustive list, 
eh? Are you suggesting that XML isn't an important technology?

[...]

So what? We don't like the W3C or something?


XML isn't an IETF protocol or format. Whether or not it is
"important", for any meaning of that word, is irrelevant. The
point is that given the IETF's limited resources, it
concentrates on Internet technology (see RFC 3935) and it needs
to take (core) Internet protocols into account in IETF
specifications such as RFCs (BCP or otherwise).

Well you can't have it both ways. Either CS means Czechoslovakia or it means 
Serbia and Montenegro.


Certainly in language tags "CS" is in use to mean Srbija i
Crna Gora-Srpski.  I haven't seen any documented cases where
it is used (in language tags) to mean Czechoslovakia (but I
haven't started any archaeological digs to try to uncover any).
If there has been no such use, then the brouhaha over the change
is much ado about nothing.  If there has been such use, then
it's clear that interpretation is going to have to be linked to
time of generation of the tag if the semantics are to be
preserved.

You can see an early version of draft-09 that attempts to address it here:

http://www.inter-locale.com/ID/draft-phillips-langtags-09.html

Your comments on that would be appreciated.


For the moment, we're discussing draft-phillips-langtags-08,
on which IESG action is pending (in a week).  There are many
things that the IESG might do when it makes its decision; in
prudence, I'll wait to see what they decide.  IMO, discussing
multiple revisions of a draft through multiple IESG New Last
Calls isn't the most efficient or effective way to make
progress.

We greatly expanded what can be represented in four major ways:

1. Added script subtags for writing system variations.
2. Mixed generative and private use subtags for private minor
   distinctions in tags.
3. Extensions for really specialized distinctions.
4. UN M49 region codes, including supra-national regions to
   represent geographical distinctions not covered by ISO 3166 or by
   instability in same.

It's not entirely clear if some of those items (e.g. script) should
be expressed by an orthogonal mechanism rather than embedded in a
*language* tag (for that matter, in retrospect, country codes were
probably a bad idea).


There would be no RFC 1766 or 3066 if ISO 639 language codes actually 
captured all of the nuances of language (doh!).


Well, there was a need for separate registered tags and for
specification of private use tags, so I don't think that's quite
right. It sounds like 639-3 might provide substantially greater
coverage.

There is a clear need for script codes for distinguishing certain kinds of 
Chinese written material, as well as certain languages in which there are 
active script transitions or in which the language is commonly written in 
more than one script. Individuals not connected with this effort have 
attempted to register similar language tags recently. It is important to 
identify the writing system in those cases to many users.


But none of that applies to an audio file of spoken material,
where script would be superfluous and, as noted above, would
lead to loss of backwards compatibility.  Surely some types
of script are indicated by the charset; in situations where that
is not the case, a separate mechanism could be used for that
orthogonal parameter without breaking compatibility with
existing parsers of language tags.

The whole "stability" brouhaha seems to be a tempest in a teapot.
Surely the issue could be addressed in a professional manner by
reaching an agreement with ISO/UN regarding the issue, as has been
done for the case of 2-letter vs. 3-letter codes and stability of
existing 3-letter codes.


It is only *one* of the things addressed by the draft. But it is and remains 
important. Doug Ewell suggested to me that even if no RA or MA ever reuses a 
code again, it is still ISO 3166/MA's job to keep the codes in sync with 
the current state of the world.  Whenever countries split up, join together, 
or change names, ISO 3166/MA will be there to change the code list.  The 
instability is not all the MA's fault, but we still need to protect against 
it because of legacy data. The lonely CS example should not become the state 
of affairs going forwards.


Does the ISO not set ground rules for the 3166/MA?  Could it not
specify that codes are not to be reused?

Matching hasn't actually changed.


I beg to differ. Introduction of a script subtag between language
and country code changes matters considerably, in a manner which
breaks backwards compatibility.

The existence of multiple mechanisms isn't really an issue. The draft 
specifies ONE mechanism, just like RFC 3066, and notes that more specialized 
processing is possible.


It's an issue that calls for a separate specification to facilitate
reference (by an AS) to the mechanism or mechanisms which are
applicable, at their respective requirement levels, without
confusion about what specification is being referenced.

If one specifies "en-FR", then one should not expect to receive
anything less specific than "en-FR".

Are you referring to use in Accept-Language fields or in Content-
Language fields (or equivalent accept/send dichotomy)?


Yes and no. Accept/Content is one example of matching. Another might be a 
query on a document (as with XQuery on an XML document, for example). The 
remove-from-right matching rules in RFC 3066 (and the draft) have long had 
this particular design.

In software resources generally one specifies the *most
specific* (granular) tag that one will accept and may receive 
less specific content (which may include the default content).
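
The resource-lookup behavior described here can be sketched in Python as
follows. This is an illustration only, not the algorithm of any particular
message catalog or resource bundle implementation; the function name and
the dictionary-based catalog are hypothetical.

```python
def lookup_resource(requested_tag, catalog, default=None):
    """Find content for the most specific available form of requested_tag.

    catalog maps lowercase tags to content. Subtags are removed from the
    right until a match is found; default is returned if nothing matches.
    """
    subtags = requested_tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in catalog:
            return catalog[candidate]
        subtags.pop()  # drop the rightmost subtag and try again
    return default
```

Here the requester names the most specific tag it wants and may be handed
content for a shorter tag, or the default, which is the opposite direction
from language-range matching.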

Indeed; hence the question above. [I also note in passing that
IETF deals with the Internet in particular, not with "software
resources generally".]

So?


Do you not see the contradiction between "one should not expect to
receive anything less specific" vs. "may receive less specific
content"?

Are you not aware of things like message catalogs, resource bundles, and the 
like?


I'm aware of many things. But as noted, the IETF has limited
resources, and concentrates on Internet issues; it does not have
delusions of being able to solve all of the world's problems.

In language tag matching one specifies the *least specific* tag
that one will accept and won't receive anything less specific 
(although you might receive something more specific). 

I'm not sure; if one indicates acceptance of Franglais (en-FR),
receiving plain en is probably acceptable.  Receipt of en-FR-<Brittany>
for whatever mechanism is used to indicate the variant of English
spoken in the region of Brittany (where Breton is a Celtic language,
rather than one derived from Latin, like French, or of Germanic root,
like English) in the country of France, might well be incomprehensible
to an English-speaking Frenchman from Alsace. [Let's not confuse the
specific example with the general principle which it illustrates.]


That's the small point I'm illustrating.


But in response to JFC, you specifically said that "one should not
expect to receive anything less specific". It seems to me that
receipt of less specific (i.e. more general) is OK.

Your example of Breton is a bad choice of tags, though. Breton has its own 
ISO 639 code ("bre").


But the tag refers to a dialect of English spoken (as a second
language) by a Breton, not to the Breton language per se (and
in a cursory look, I didn't see a UN M49 region code for
Brittany).

I doubt that en-US-boont is fully intelligible to anyone from more than a few 
miles outside Boonville without a dictionary.


Fine, but that isn't representative of the situation that JFC
posed.  The representative question would be "does a resident
of Boonville, who speaks en-US-boont, understand en-US?".

Changing the sources for existing subtags or the interpretation
of any particular existing language tag is not permitted if we 
are to maintain backwards compatibility.

Agreed that there would be a backwards compatibility problem with
changing the source.  Which is why there is an issue with "CS" being
defined in the ISO lists by reference as is currently the case with
RFC 3066, vs. the proposal to change the source to a separate IANA
registry which handles "CS" specially (i.e. differently from many
other ISO-derived codes).


Yawn.


Please see RFC 2026 sections 7.1, 7.1.1, 7.1.3, and 10.1.
Note that RFC 3066 strictly complies with those sections, while
the draft under discussion, by cherry-picking from ISO lists
for which change control has not been transferred to the IESG,
does not.

To be perfectly blunt: we've worked over a year on this
project. If you have specific comments on this draft, with 
suggestions for improvements, please send those to the list so 
that they can be viewed by the community and so that Mark and I 
can address them. Your suggestions for additional changes to the 
syntax of language tags we find to be incompatible (to the extent 
that we understand them) with RFC 3066 and our own work on 
draft-langtags. You will note that draft-langtags can accommodate 
your requirements using the mechanisms spelled out above and in 
the draft... so I fail to see what we should change. If you can 
express that, we'll consider it. Otherwise you are free to do as 
we did and write your own draft. Internet-Drafts are a volunteer 
effort and do not write themselves. Neither is there a Star 
Chamber of people who create them in the dead of night. If you 
see a need, fill it. I would suggest: wait for draft-langtags to 
be an RFC and write an extension that does what you want.

See RFC 2418; specifically section 2.3 and the comment about consensus
about a wrong design.  See also the RFC 2026 process requirements and
RFC 2418 procedures; a group which has no charter or equivalent
document, no written record of meetings, etc. might very well be
described as "a Star Chamber of people".


There is a list archive. You can see the discussion and the drafts (I 
maintain all of them online).


That addresses only one of the issues. It does not address the issue
of a charter, of conflict resolution procedures, minutes of face-to-
face meetings, etc. (and the list was established for a purpose other
than work on an RFC).

Discouraging people from participating in the IETF process is, I think, 
odious.


Agreed.  But the activity on the ietf-languages list regarding the
draft under discussion isn't an IETF process -- there is no WG or
Chair, no charter, etc.  Like the fictional Topsy, it jes' growed up.

The current draft REPLACES RFC 3066.


Drafts don't replace RFCs.


