Re: New Last Call: 'Tags for Identifying Languages' to BCP

On Thu December 9 2004 12:23, ietf-announce-request(_at_)ietf(_dot_)org wrote:

New Last Call: 'Tags for Identifying Languages' to BCP
 Date: 2004-12-08 17:56
 From: The IESG <iesg-secretary(_at_)ietf(_dot_)org>
 To: IETF-Announce <ietf-announce(_at_)ietf(_dot_)org>
 Reply to: iesg(_at_)ietf(_dot_)org

The IESG has been considering

- 'Tags for Identifying Languages '
   <draft-phillips-langtags-08.txt> as a BCP

There have been considerable changes to the document since the
initial last call, and the IESG would like the community to consider
the changes.  In addition, the authors have prepared text describing
why this mechanism is needed as a replacement for the existing
procedure; it is included below.

The IESG plans to make a decision in the next few weeks, and solicits
final comments on this action.  Please send any comments to the
iesg(_at_)ietf(_dot_)org or ietf(_at_)ietf(_dot_)org mailing lists by 
2005-01-05.

The file can be obtained via
http://www.ietf.org/internet-drafts/draft-phillips-langtags-08.txt


I have some comments below.  They should not be construed as
a complete or thorough critique of the draft; they're initial
comments based on a quick review of the draft.

One overall comment: I'm surprised to hear that this was
already at last call -- some notice to mailing lists which are
heavily affected by the proposed changes (e.g. ietf-822)
would have been nice...   Considering the depth and breadth
of the specific issues discussed below, I'm not sure that
"surprise" is adequate...

This specification, the proposed successor to RFC 3066, addresses a number of
issues that implementers of language tags have faced in recent years:

[...]

    * Accessibility of the underlying ISO standards for implementers

[...]

There are problems with the RFC 3066 definition of generative tags,
however. The ISO 639 and ISO 3166 standards are not freely available and
evolve over time.


Accessibility has not been a problem for this implementor (who,
incidentally, was unaware of this draft until the New
Last Call).  ISO 639 language code lists are readily available in
HTML-ized English and French via
        http://www.loc.gov/standards/iso639-2/englangn.html
and
        http://www.loc.gov/standards/iso639-2/frenchlangn.html
ISO 3166 country code lists are readily available in plain text
in English and French via
        http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1-semic.txt
and
        http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-fr1-semic.txt

The ISO registered code lists are freely available at the URIs
given above.  This implementor has used those URIs for years
without difficulty.  The ISO standards themselves are not free,
but neither are they required for an implementor to identify
the valid codes -- the free lists suffice for that purpose.
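
To illustrate how little is needed to consume such a list, here is a
minimal Python sketch.  It assumes a semicolon-separated "NAME;CODE"
layout with a header line, which is what the "-semic" file names
suggest; the sample rows are illustrative stand-ins, not a copy of the
published list, and load_country_codes is a hypothetical helper name.

```python
import csv
import io

# Illustrative sample only: a few rows in the ASSUMED "NAME;CODE" layout.
SAMPLE_LIST = """\
Country Name;ISO 3166-1-alpha-2 code

FRANCE;FR
GERMANY;DE
SWITZERLAND;CH
"""


def load_country_codes(text):
    """Return {alpha-2 code: country name} from a semicolon-separated list."""
    codes = {}
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        # Blank lines yield an empty row; skip anything that is not a pair.
        if len(row) != 2:
            continue
        name, code = row[0].strip(), row[1].strip()
        # Keep only rows whose second field looks like an alpha-2 code;
        # this also drops the header line.
        if len(code) == 2 and code.isalpha() and code.isupper():
            codes[code] = name
    return codes


codes = load_country_codes(SAMPLE_LIST)
print(sorted(codes))
```

The same function could be pointed at the French list to build the
second half of a bilingual menu, provided that file follows the same
layout.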

The largest change in the specification is that it modifies the structure of
the language tag registry. Instead of having to obtain lists of codes from
five separate external standards (not all of which are easily available), the
IANA registry will maintain a comprehensive list of valid subtags that can be
used in the generative mechanism in a machine-parseable text format.


Contrary to the implicit claim, the ISO documents mentioned
above comprise two standards (available in two languages each),
not "five separate external standards".

The availability of those two definitive standards in bilingual
forms allows implementors to (for example) construct menus of
available language and country code tags in BOTH languages used
in ISO standards.  The draft proposes declaring those standards
effectively irrelevant, being replaced by a single monolingual
(English) IANA registry. While it has become fashionable in
recent years among some factions within the United States
to bash France, the French people, their culture, and their
language, it seems inappropriate to extend such bashing to
technical standards which supposedly apply in an international
context. Especially when dealing with the subject matter of
language itself. The unavailability of the registered value
"description" in 50% of the languages traditionally used for
international standards publication, including the existing ISO
639 and 3166 codes, is a serious defect in the proposal, and
a departure from the status quo under RFC 3066 (which directly
refers to the bilingual ISO standards as definitive). [N.B. I
am not accusing the draft authors of French-bashing; it's just
that some of us are a bit more sensitive to Anglo-centricity
than others.  And it remains a fact that the draft has no
provision for bilingual descriptions of any subtag fields. (I
note in passing that the UN regional codes newly referenced
by this draft are available in HTML-ized (ostensibly) English
(though I've never seen an A-ring in English text before...)
and French).]

It is claimed that:

In addition, and very importantly, language tags that are newly
defined by this specification are compatible with the ABNF syntax, matching,
parsing, and other mechanisms defined by RFC 3066.

[...]

The design of this
specification was carefully created so that all of the new values that can be
assigned fit the pattern for registered language tags under RFC 3066.

[...]

The revision proposed in this
specification addresses the needs of this community of users with a minimal
impact on existing content and implementations, while providing a stable basis
for future development, expansion, and improvement.


The ABNF in the draft permits all of the following tags which
are not legal per the RFC 3066 ABNF:
   supercalifragilisticexpialidoceus
   y-----
   x1234567890abc
   a123-xyz
Specifically, the draft allows, and RFC 3066 disallows:
   subtags more than 8 octets in length
   hyphens which do not separate subtags
   zero-length subtags
   primary tags which are not purely alphabetic
Curiously, all of those are permitted by the draft ABNF
production "grandfathered", which is presumably included to
accommodate tags which ARE permitted by RFC 3066, rather than
to provide a means for specifying incompatible tags (I have no
provision for parsing unlimited-length subtags, zero-length
subtags, hyphens not delimiting subtags, or non-alphabetic
primary tags, so I know of one implementation which will
suffer a major impact from the incompatible syntax change).
I see no reason for the ABNF to permit such content as is
forbidden by RFC 3066; the actual ABNF for what RFC 3066
permits is contained within 3066, and could have been directly
incorporated rather than producing a "grandfathered"
production which opens up several cans of worms.

One defect related to tag length in RFC 3066 is not remedied
by the draft; indeed the problem is greatly exacerbated.  One
use of language tags is in encoded-words as specified by RFC
2047 as amended by RFC 2231 and errata. The total length of
an encoded word, including some syntactic glue, a charset tag,
and some text content in addition to a language tag, is strictly
limited.  Unfortunately, a language tag's length is unlimited by
the ABNF in RFC 3066 (due to an unlimited number of subtags)
and in the draft.  To date, the problem has been more theoretical
than practical due to the limited number of subtags typically used.
In particular, tags other than private-use tags with more than
two subtags require registration under RFC 3066 rules, and it
is a trivial matter to determine the longest registered tag.
The draft, however, encourages use of more subtags and removes
the upper bound on subtag length; moreover, it permits an
unlimited number of subtags without requiring registration of
the resulting complete tag.  Consequently it is impossible to
establish an upper bound on the length of a language tag which
might be encountered -- that affects not only practical
implementations, but it negatively impacts protocol design,
such as the MIME encoded-word case.

The new registry provides a complete,
easily parseable file which provides the precise contents of valid tags
for any point in time.


That is the first time I have ever heard ISO 8601 date
format described as "easily parseable".  Perhaps the draft
authors meant to say that a specific subset of the tortuously
complex ISO 8601 date format is used, but that is not what
the draft states.  This implementor does not look forward
to having to parse all of the various and sundry ISO 8601
variants. [Moreover, while the draft authors have complained on
the one hand about unavailability of ISO documents regarding
language and country codes (where in fact the code lists
needed for implementation are freely available), on the other
hand they specifically require adherence to a standard which
is not freely available, and which is required in order to be
able to parse the proposed revised registry (the existing IANA
language-tags registry does not appear to require use of that
standard specifically, nor do the ISO code lists). According to
the ISO web site, ISO 8601 costs either 108 or 122 Swiss
francs.]
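
To give a sense of the variety involved, here is a minimal Python
sketch showing four ISO 8601 spellings of the same day, each needing its
own format string.  The choice of these four forms is mine, for
illustration; it is not a statement of what the draft's registry
actually uses, and it covers only a small part of ISO 8601 (no times,
zones, durations, or reduced-precision forms).

```python
from datetime import date, datetime

# Four ISO 8601 representations of 9 December 2004, with the
# strptime format needed for each.
SAME_DAY = {
    "2004-12-09": "%Y-%m-%d",   # calendar date, extended format
    "20041209": "%Y%m%d",       # calendar date, basic format
    "2004-344": "%Y-%j",        # ordinal date (day of year)
    "2004-W50-4": "%G-W%V-%u",  # week date (ISO year, week, weekday)
}


def parse_all(variants):
    """Parse each text with its own format and return the resulting dates."""
    return {text: datetime.strptime(text, fmt).date()
            for text, fmt in variants.items()}


parsed = parse_all(SAME_DAY)
for text, day in parsed.items():
    print(text, "->", day.isoformat())
```

A registry that named one of these forms explicitly would spare
implementors from having to recognize the rest.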

I am absolutely shocked that a draft dealing with language
lacks an "Internationalization considerations" section as
recommended by RFC 2277 (a.k.a. BCP 18).

Perhaps even more disturbing is the content of the "IANA
Considerations" section; the draft predicts that certain things
will happen ("IANA will"[...]), but doesn't actually direct
(e.g. "IANA shall") IANA to do anything.  The placement of that
section does not correspond to current RFC-Editor guidelines
(it should appear after Security Considerations); also on that
point, Appendices should precede References.

Many of the references are obsolete (e.g. RFCs 1327,
1521), there is no differentiation between normative and
informative references, and at least one reference ([19])
gives a bracketed URI rather than the correctly formatted
RFC reference.  Although reference is made to the
"Accept-Language" header field, RFC 3282 (the defining RFC for
that field) is not listed among the references.

The formatting of the draft is atrocious, particularly the
bizarre "outdenting" (in some cases breaking in the middle of
words) near the bottom of page 7, towards the lower part of
page 10, the middle of page 13, near the bottom of page 16,
towards the bottom of page 19, towards the lower part of page
23, at the bottom of page 29, the second-last text line on
page 33, and immediately before References (which incidentally
lacks a dot after the section number).  Some text also appears
to be missing after the last "bullet".

I am extremely surprised that the draft has been published
at least nine times in such a state of poor formatting and
poor attention to editorial content (e.g. obsolete and
missing references), and that it progressed as far as IESG
last call in such a state, with no Internationalization
considerations section, etc.

I am particularly concerned about the implementation
ramifications of the proposed changes, especially (as
noted in detail above):
1. the apparent contradiction between the stated
    objectives w.r.t. accessibility of relevant ISO data and
    standards and the reality of the proposal's
    implications (ISO 8601 date format parsing).
2. the clear contradiction between the claims about
    ABNF compatibility with RFC 3066 and the factual
    incompatibility of certain provisions in the grammar.
Considering the technical importance of those issues,
I would request that the IESG consider returning this
draft to the authors for further work before reconsidering
it for last call -- I'd want to have a chance to thoroughly
review the ABNF after the authors have addressed the
compatibility issue vs. RFC 3066 before this gets as far as
actually obsoleting the current BCP (RFC 3066).

I have copied the ietf and ietf-languages mailing lists
in addition to the iesg list as requested; I have set
a suggestion for followup to the ietf list.
