Date: 2004-12-29 17:45
From: "Addison Phillips [wM]" <aphillips(_at_)webmethods(_dot_)com>
To: ietf-languages(_at_)alvestrand(_dot_)no, ietf(_at_)ietf(_dot_)org
Reply to: aphillips(_at_)webmethods(_dot_)com
Comments below. I must admit that I'm losing the ability to respond to this
thread, since it contains direct statements that no response will satisfy the
correspondent.
I'm fairly certain that it does not. It does state that to
date no satisfactory procedural method for handling changes
in meanings of codes has been presented which does not
itself change the meaning of tags which are currently in use.
The origin of the draft is an individual submission, governed by the various
RFCs cited. What's the problem with that? Are individual submissions somehow
inappropriate now?
Individual submissions are fine for Informational and
Experimental RFCs, i.e. RFCs which do not purport to be
or to become standards. Individual submissions can be
part of the Standards Track with AD-level support. It is
possible for an individual submission to become BCP with
the same caveat; however, because BCPs go into effect as
standards without the phased roll-in and implementation
experience that characterize the Standards Track, "BCPs
require particular care" (RFC 2026). They should possess
the characteristics that result from phased roll-in of
Standards Track RFCs: design choices resolved and multiple
independent interoperable implementations. They should also
be well-understood and have no known technical omissions.
This draft does not modify the process of the IETF [...]
IETF process for use of external standards is to reference
those standards as they exist, not to attempt to modify
those standards by declaring bits and pieces invalid in
the absence of transfer of change control from the
originating body.
Do what you feel is warranted, Bruce. You don't appear to be trying to
achieve consensus, which is the touchstone of the IETF process as I
understand it. If you feel issues should be taken to the IESG, then do so.
You have yourself noted that the draft is an individual
submission, not the result of an IETF process. "consensus"
doesn't apply to an individual effort. IF you want to
adhere to IETF process, by all means ask the IESG to set
up a working group, with a charter, a Chair, etc.; I
fully support that.
This draft defines language tags.
Yes. And a registry format technical specification. And a matching
algorithm technical specification. In addition to the registration
process.
.... just like RFC 3066 did.
RFC 3066 didn't dictate the registry format. The matching
algorithm was much simpler -- indeed the complexity of the
method in the draft under discussion is primarily due to
the addition of orthogonal data as subtags. Note also that
in the transition from 1766 to 3066, the specification of
the Content-Language field was broken out into a separate
document (RFC 3282).
Other drafts, RFCs, specs, etc. define processes and
applications that use them. The appropriate use of language tags
is the concern of those specifications.
Per RFC 2026, an application having specific requirements for use
of Technical Specifications (TS) should provide an Applicability
Statement (AS) specifying specific requirement levels for each
TS involved...
The draft provides specific requirements for language tags themselves, which
are strings compatible with the RFC 3066 strings already used by the other
specifications. The applicability and requirements for this iteration of
language tags are the same as they were under RFC 3066. The language tags created
do not break existing specifications. The requirements in this document were
calibrated to allow all existing RFC 3066 references to remain in force
without prejudice. In fact, we did NOT change things that might have
otherwise been changed in order to ensure deep compatibility.
The point is that an application, such as IDNA, could specify
use of tags at a certain requirement level, matching at a different
requirement level (or using a different algorithm), and is probably
unconcerned with registration procedure and registry format. An
applicability statement for use of language tags for IDNA could
therefore reference the tag format and matching algorithm(s)' TSs
and need not mention the registration procedure or registry format.
In short, I am clarifying your earlier statement about uses of
technical specifications (viz. that an AS is the mechanism by which
appropriate use of TS is documented).
Ultimately, the existence of the RFC 3066 language tag registry trumps all of
your arguments about this: all of the tags defined in the generative
mechanism of RFC 3066bis could have been registered under 3066 (with loss of
functionality for the users of those tags, to be sure). The argument that
every complete tag used anywhere is trumped by the existence of the
generative mechanism in RFC 3066. Registered variant subtags still must have
a recommended range to which they apply. Very little has changed, except that
using subtags is a bit more logical.
I've reread that several times and can't make sense of it. Could you
please rephrase.
If there is some text that this draft should carry to help
guide implementations, please suggest it so that we can all
consider it.
It would help immensely if the 3 technical specifications (tag
format, registry format, matching algorithm) were separated as
separate documents to facilitate reference as independent TSs,
and to facilitate any individual extensions/revisions, etc.
that may be necessary in the future, and to keep those separate
from the registration procedure which itself may need to be
separately referenced and/or revised.
Well there at last is a suggestion. We think splitting the draft up would not
be a benefit because the three items are closely linked and have historically
been in one document. There is no indication that any of these items will be
separately revised in the future. While I'm sure it is possible, I think it
would be wiser to keep these items together, since they have historically
been together.
So why not then also throw in the closely linked specification of
the Content-Language field, which has historically been in the same
document (RFC 1766)? I see no substance in your response; it does
not address the issue of how an implementation of an application
could be facilitated (by making an AS easier to produce by providing
separate documents so that requirement levels can be independently
and clearly specified for the different TSs).
No, the revision clearly expands the scope of language
distinctions that can be represented with a language tag--quite
significantly in some cases.
Indeed, and without registration of the tags and the review process
associated with that (existing RFC 3066) registration procedure. As
Harald Alvestrand pointed out some time ago, that (inappropriately)
shifts implementation effort from the tag generator (no registration
required) to the recipient (what the heck does this mysterious tag
actually *mean*).
Nonsense. There is the same review process (strengthened somewhat, actually,
from experience) for subtags.
RFC 3066 has no review process for subtags. They are what the ISO
lists say they are. It does have a review process for IANA
registered tags as part of that registration procedure, which
(except for private use tags) must be followed before use of a
tag not based on ISO language as a primary tag, and optional
ISO country as a secondary tag.
Harald's point, I think, is not valid because only the registered (and rarely
implemented) tags were subject to scrutiny.
Not so; the ISO language and country codes are certainly subject
to scrutiny (but not to second-guessing and cherry-picking). Under
RFC 3066, a tag may be generated from the standard ISO tag, or it
may be an IANA registered tag (leaving aside private use tags for
the moment). A parser can easily determine what such a tag is; if
the primary subtag has 2 or 3 letters, it is an ISO language code.
If the second subtag has 2 letters, it is an ISO 3166 country code.
Anything else is either private use (primary subtag is x) or is
registered as a complete IANA tag, or is an error. [de-AT-1901,
incidentally, (as an example) does not meet the RFC 3066 requirement
of 3 to 8 characters in the second subtag for registration with
IANA...]. Under the proposed draft, anybody may legally generate
a tag such as
sr-Latn-CS-gaulish-boont-guoyu-i-enochian
or
sr-Latn-CS-gaulish-boont-guoyu-i-enochian-x-foo
with *no* specific registration requirements (i.e. all components
are either registered or require no registration). In the latter
case, a parser can only determine that it contains a private-use
subtag after wading through the other subtags. In either case,
it is difficult (to say the least) for the recipient or his
software to determine what the generator of that tag intended to
convey. Returning to the private use issue; in RFC 3066, as in
every other case that I know of where x is used as an indicator
of private use for some name, it is used as a prefix of the name,
never buried deep inside the name (as provided for by the draft
proposal).
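The RFC 3066 parsing rules described above can be sketched roughly as
follows. This is a minimal illustration, not a full RFC 3066 parser: the
function name classify_rfc3066 and the small REGISTERED_TAGS set are
assumptions made for the example, and a real implementation would consult
the actual IANA registry.

```python
import re

# Hypothetical stand-in for the IANA registry of complete tags; a real
# parser would load the actual registry instead of this short list.
REGISTERED_TAGS = {"i-enochian", "zh-guoyu", "en-boont"}


def classify_rfc3066(tag, registered=REGISTERED_TAGS):
    """Classify a tag using only the rules stated in the message above."""
    lowered = tag.lower()
    subtags = lowered.split("-")
    primary = subtags[0]
    if primary == "x":
        # Private use is visible from the primary subtag alone.
        return "private-use"
    if lowered in registered:
        return "iana-registered"
    if re.fullmatch(r"[a-z]{2,3}", primary):
        if len(subtags) == 1:
            return "iso-language"
        if len(subtags) == 2 and re.fullmatch(r"[a-z]{2}", subtags[1]):
            return "iso-language+iso-country"
    # Anything else must be registered as a complete tag, or is an error.
    return "unregistered-or-error"


for example in ("de-AT", "x-foo", "i-enochian", "sr-Latn-CS"):
    print(example, "->", classify_rfc3066(example))
```

Under these rules a tag such as sr-Latn-CS falls into the last category,
because its second subtag is neither a 2-letter country code nor part of a
registered complete tag.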
The new draft actually provides a framework in which any subtag's type can be
discerned from its position and size, even if the subtag itself is
unrecognized: this is actually *better* than you could obtain with the
existing registry.
Not quite; in the examples above one cannot determine what "enochian"
is from its size and position alone -- one needs to know that it
follows a single character subtag and that the single character is
not an x.
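To make the point about single-character subtags concrete, here is a rough
sketch of labelling subtags by size and position. The size rules are an
assumption, loosely modeled on those later published in RFC 4646;
draft-08 may differ in detail. The sketch ignores extended language
subtags and does not validate ordering, and the name
classify_draft_subtags is hypothetical.

```python
import re


def classify_draft_subtags(tag):
    """Label each subtag by size and position (illustrative rules only)."""
    subtags = tag.split("-")
    labels = []
    state = "langtag"  # becomes "extension" or "private" after a singleton
    for index, sub in enumerate(subtags):
        low = sub.lower()
        if state == "private":
            labels.append((sub, "private-use"))
        elif len(low) == 1:
            # A single-character subtag changes how everything after it
            # is read: "x" starts private use, anything else an extension.
            state = "private" if low == "x" else "extension"
            labels.append((sub, "singleton"))
        elif state == "extension":
            labels.append((sub, "extension"))
        elif index == 0:
            labels.append((sub, "language"))
        elif re.fullmatch(r"[a-z]{4}", low):
            labels.append((sub, "script"))
        elif re.fullmatch(r"[a-z]{2}|[0-9]{3}", low):
            labels.append((sub, "region"))
        elif re.fullmatch(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}", low):
            labels.append((sub, "variant"))
        else:
            labels.append((sub, "unknown"))
    return labels


for sub, kind in classify_draft_subtags(
        "sr-Latn-CS-gaulish-boont-guoyu-i-enochian-x-foo"):
    print(sub, kind)
```

Note that "enochian" has the same size as a variant subtag; it is labelled
as an extension only because the loop remembers that a singleton other
than x preceded it.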
The generator *is* required to register non-private use subtags for use, so
that statement mystifies me. You can't just use any subtag you feel like
(except as private use). The recipient can access the registry to determine
the meaning of any subtag (you couldn't do that before).
Surely you're not claiming that each individual generator must
separately register "sr", "Latn", "CS" etc. in order to use
them!?! A recipient using software that interprets RFC 3066
tags isn't going to be able to do anything useful with any
hypothetical tag which contains a script subtag that would be
produced under the draft rules (if the script subtag were to appear
*after* the region subtag, one could at least match "sr-CS-Latn"[...]
to "sr-CS", which an RFC 3066 parser could handle). Again returning
to private-use, an RFC 3066 parser can (only) determine that a
private-use tag is in use if it has x as the primary tag. There
are provisions in the draft syntax that break backwards compatibility.
What about core Internet protocols such as MIME and the
Internet Message format (STD 11)?
I could have cited those. The example was not intended as an exhaustive list,
eh? Are you suggesting that XML isn't an important technology?
[...]
So what? We don't like the W3C or something?
XML isn't an IETF protocol or format. Whether or not it is
"important", for any meaning of that word, is irrelevant. The
point is that given the IETF's limited resources, it
concentrates on Internet technology (see RFC 3935) and it needs
to take (core) Internet protocols into account in IETF
specifications such as RFCs (BCP or otherwise).
Well you can't have it both ways. Either CS means Czechoslovakia or it means
Serbia and Montenegro.
Certainly in language tags "CS" is in use to mean Srbija i
Crna Gora-Srpski. I haven't seen any documented cases where
it is used (in language tags) to mean Czechoslovakia (but I
haven't started any archaeological digs to try to uncover any).
If there has been no such use, then the brouhaha over the change
is much ado about nothing. If there has been such use, then
it's clear that interpretation is going to have to be linked to
time of generation of the tag if the semantics are to be
preserved.
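A minimal sketch of what linking interpretation to time of generation
could look like is given below. The validity windows are approximations
chosen for illustration, not values taken from this thread, and the names
CS_MEANINGS and meaning_of_cs are hypothetical.

```python
from datetime import date

# Illustrative validity windows for the alpha-2 code "CS". The cutoff
# dates are approximate assumptions made for this sketch.
CS_MEANINGS = [
    (date(1974, 1, 1), date(1993, 1, 1), "Czechoslovakia"),
    (date(2003, 7, 1), None, "Serbia and Montenegro"),
]


def meaning_of_cs(generated_on):
    """Return the meaning of "CS" for a tag generated on the given date."""
    for start, end, meaning in CS_MEANINGS:
        if start <= generated_on and (end is None or generated_on < end):
            return meaning
    # The code was not assigned in the table above on that date.
    return None


print(meaning_of_cs(date(1990, 5, 1)))
print(meaning_of_cs(date(2004, 12, 29)))
```

The practical difficulty is that a bare language tag carries no
generation date, so the date would have to come from surrounding metadata.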
You can see an early version of draft-09 that attempts to address it here:
http://www.inter-locale.com/ID/draft-phillips-langtags-09.html
Your comments on that would be appreciated.
For the moment, we're discussing draft-phillips-langtags-08,
on which IESG action is pending (in a week). There are many
things that the IESG might do when it makes its decision; in
prudence, I'll wait to see what they decide. IMO, discussing
multiple revisions of a draft through multiple IESG New Last
Calls isn't the most efficient or effective way to make
progress.
We greatly expanded what can be represented in four major ways:
1. Added script subtags for writing system variations.
2. Mixed generative and private use subtags for private minor
distinctions in tags.
3. Extensions for really specialized distinctions.
4. UN M49 region codes, including supra-national regions to
represent geographical distinctions not covered by ISO 3166 or by
instability in same.
It's not entirely clear if some of those items (e.g. script) should
be expressed by an orthogonal mechanism rather than embedded in a
*language* tag (for that matter, in retrospect, country codes were
probably a bad idea).
There would be no RFC 1766 or 3066 if ISO 639 language codes actually
captured all of the nuances of language (doh!).
Well, there was a need for separate registered tags and for
specification of private use tags, so I don't think that's quite
right. It sounds like 639-3 might provide substantially greater
coverage.
There is a clear need for script codes for distinguishing certain kinds of
Chinese written material, as well as certain languages in which there are
active script transitions or in which the language is commonly written in
more than one script. Individuals not connected with this effort have
attempted to register similar language tags recently. It is important to
identify the writing system in those cases to many users.
But none of that applies to an audio file of spoken material,
where script would be superfluous and, as noted above, would
lead to loss of backwards compatibility. Surely some types
of script are indicated by the charset; in situations where that
is not the case, a separate mechanism could be used for that
orthogonal parameter without breaking compatibility with
existing parsers of language tags.
The whole "stability" brouhaha seems to be a tempest in a teapot.
Surely the issue could be addressed in a professional manner by
reaching an agreement with ISO/UN regarding the issue, as has been
done for the case of 2-letter vs. 3-letter codes and stability of
existing 3-letter codes.
It is only *one* of the things addressed by the draft. But it is and remains
important. Doug Ewell suggested to me that even if no RA or MA ever reuses a
code again, it is still ISO 3166/MA's job to keep the codes in sync with
the current state of the world. Whenever countries split up, join together,
or change names, ISO 3166/MA will be there to change the code list. The
instability is not all the MA's fault, but we still need to protect against
it because of legacy data. The lonely CS example should not become the state
of affairs going forwards.
Does the ISO not set ground rules for the 3166/MA? Could it not
specify that codes are not to be reused?
Matching hasn't actually changed.
I beg to differ. Introduction of a script subtag between language
and country code changes matters considerably, in a manner which
breaks backwards compatibility.
The existence of multiple mechanisms isn't really an issue. The draft
specifies ONE mechanism, just like RFC 3066, and notes that more specialized
processing is possible.
It's an issue that calls for a separate specification to facilitate
reference (by an AS) to the mechanism or mechanisms which are
applicable, at their respective requirement levels, without
confusion about what specification is being referenced.
If one specifies "en-FR", then one should not expect to receive
anything less specific than "en-FR".
Are you referring to use in Accept-Language fields or in Content-
Language fields (or equivalent accept/send dichotomy)?
Yes and no. Accept/Content is one example of matching. Another might be a
query on a document (as with XQuery on an XML document, for example). The
remove-from-right matching rules in RFC 3066 (and the draft) have long had
this particular design.
In software resources generally one specifies the *most
specific* (granular) tag that one will accept and may receive
less specific content (which may include the default content).
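The resource-lookup behavior described in that paragraph (message
catalogs, resource bundles and the like) might be sketched as below. This
is an illustration of the general fallback pattern, not any particular
library's API; lookup_resource and the "default" name are hypothetical.

```python
def lookup_resource(requested, available, default="default"):
    """Return the most specific available tag at or above `requested`.

    `available` is a set of lowercase tags. Subtags are removed from the
    right until something is found; otherwise the default is returned.
    """
    subtags = requested.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in available:
            return candidate
        subtags.pop()  # drop the rightmost subtag and try again
    return default


print(lookup_resource("en-FR", {"en", "de"}))
print(lookup_resource("fr-CA", {"en", "de"}))
```

Here the requester names the most specific tag it wants and may end up
with less specific content, including the default.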
Indeed; hence the question above. [I also note in passing that
IETF deals with the Internet in particular, not with "software
resources generally".]
So?
Do you not see the contradiction between "one should not expect to
receive anything less specific" vs. "may receive less specific
content"?
Are you not aware of things like message catalogs, resource bundles, and the
like?
I'm aware of many things. But as noted, the IETF has limited
resources, and concentrates on Internet issues; it does not have
delusions of being able to solve all of the world's problems.
In language tag matching one specifies the *least specific* tag
that one will accept and won't receive anything less specific
(although you might receive something more specific).
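By contrast, the matching described in that paragraph can be sketched as a
prefix test in the style of RFC 3066 section 2.5: a range matches a tag if
the two are equal, or if the range is a prefix of the tag and the next
character in the tag is "-". The function name range_matches is
hypothetical, and the wildcard range is left out for brevity.

```python
def range_matches(language_range, tag):
    """Prefix matching of a language range against a language tag."""
    r = language_range.lower()
    t = tag.lower()
    # Equal, or the range is followed by a subtag boundary in the tag.
    return t == r or t.startswith(r + "-")


print(range_matches("en-FR", "en-FR-foo"))   # more specific tag
print(range_matches("en-FR", "en"))          # less specific tag
print(range_matches("sr-CS", "sr-Latn-CS"))  # script between language and region
print(range_matches("sr-CS", "sr-CS-Latn"))  # script after region
```

Under this rule the range is the least specific thing accepted: "en-FR"
accepts "en-FR-foo" but not plain "en". The same rule is why the range
"sr-CS" does not match "sr-Latn-CS", the compatibility concern raised
elsewhere in this thread.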
I'm not sure; if one indicates acceptance of Franglais (en-FR),
receiving plain en is probably acceptable. Receipt of en-FR-<Brittany>
for whatever mechanism is used to indicate the variant of English
spoken in the region of Brittany (where Breton is a Celtic language,
rather than one derived from Latin, like French, or of Germanic root,
like English) in the country of France, might well be incomprehensible
to an English-speaking Frenchman from Alsace. [Let's not confuse the
specific example with the general principle which it illustrates.]
That's the small point I'm illustrating.
But in response to JFC, you specifically said that "one should not
expect to receive anything less specific". It seems to me that
receipt of less specific (i.e. more general) is OK.
Your example of Breton is a bad choice of tags, though. Breton has its own
ISO 639 code ("bre").
But the tag refers to a dialect of English spoken (as a second
language) by a Breton, not to the Breton language per se (and
in a cursory look, I didn't see a UN M49 region code for
Brittany).
I doubt that en-US-boont is fully intelligible to anyone from more than a few
miles outside Boonville without a dictionary.
Fine, but that isn't representative of the situation that JFC
posed. The representative question would be "does a resident
of Boonville, who speaks en-US-boont, understand en-US?".
Changing the sources for existing subtags or the interpretation
of any particular existing language tag is not permitted if we
are to maintain backwards compatibility.
Agreed that there would be a backwards compatibility problem with
changing the source. Which is why there is an issue with "CS" being
defined in the ISO lists by reference as is currently the case with
RFC 3066, vs. the proposal to change the source to a separate IANA
registry which handles "CS" specially (i.e. differently from many
other ISO-derived codes).
Yawn.
Please see RFC 2026 sections 7.1, 7.1.1, 7.1.3, and 10.1.
Note that RFC 3066 strictly complies with those sections, while
the draft under discussion, by cherry-picking from ISO lists
for which change control has not been transferred to the IESG,
does not.
To be perfectly blunt: we've worked over a year on this
project. If you have specific comments on this draft, with
suggestions for improvements, please send those to the list so
that they can be viewed by the community and so that Mark and I
can address them. Your suggestions for additional changes to the
syntax of language tags we find to be incompatible (to the extent
that we understand them) with RFC 3066 and our own work on
draft-langtags. You will note that draft-langtags can accommodate
your requirements using the mechanisms spelled out above and in
the draft... so I fail to see what we should change. If you can
express that, we'll consider it. Otherwise you are free to do as
we did and write your own draft. Internet-Drafts are a volunteer
effort and do not write themselves. Neither is there a Star
Chamber of people who create them in the dead of night. If you
see a need, fill it. I would suggest: wait for draft-langtags to
be an RFC and write an extension that does what you want.
See RFC 2418; specifically section 2.3 and the comment about consensus
about a wrong design. See also the RFC 2026 process requirements and
RFC 2418 procedures; a group which has no charter or equivalent
document, no written record of meetings, etc. might very well be
described as "a Star Chamber of people".
There is a list archive. You can see the discussion and the drafts (I
maintain all of them online).
That addresses only one of the issues. It does not address the issue
of a charter, of conflict resolution procedures, minutes of face-to-
face meetings, etc. (and the list was established for a purpose other
than work on an RFC).
Discouraging people from participating in the IETF process is, I think,
odious.
Agreed. But the activity on the ietf-languages list regarding the
draft under discussion isn't an IETF process -- there is no WG or
Chair, no charter, etc. Like the fictional Topsy, it jes' growed up.
The current draft REPLACES RFC 3066.
Drafts don't replace RFCs.