RE: New Last Call: 'Tags for Identifying Languages' to BCP

2004-12-14 06:54:52
Resuming my comments:


-----Original Message-----
From: ietf-languages-bounces(_at_)alvestrand(_dot_)no [mailto:ietf-languages-
bounces(_at_)alvestrand(_dot_)no] On Behalf Of Bruce Lilly

[snip]

Specifically, the draft allows, and RFC 3066 disallows:
   subtags more than 8 octets in length
   hyphens which do not separate subtags
   zero-length subtags
   primary tags which are not purely alphabetic
Curiously, all of those are permitted by the draft ABNF
production "grandfathered"...

The "grandfathered" production in the current draft is 

grandfathered   = ALPHA *(alphanum / "-")

which does permit the sequences claimed by Bruce (except for
not-purely-alphabetic primary sub-tags), syntactically; but the set of
tags available for use is constrained by more than the ABNF syntax
alone: the acceptable productions for each sub-tag must either be taken
from one of the source standards or be registered. This is no different
from RFC 3066, so it is no more of a problem in this specification than
it was in RFC 3066.

It might be that the wording in 2.2 could be tightened up to eliminate
any possible question regarding the source for "grandfathered"
productions. Maybe it's not as obvious to someone coming to this cold as
it is for us who have been discussing it for the past year.

Alternately, there's no reason why the "grandfathered" production
shouldn't be composed exactly to match what was used in RFC 3066:

grandfathered = 1*8ALPHA *("-" 1*8alphanum)
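
To make the syntactic difference between the two productions concrete,
here is a minimal sketch that transcribes each into a Python regular
expression and tries a few strings against both. The helper name
accepted_by and the sample tags are illustrative assumptions, not taken
from the draft, and the check is syntax-only: it says nothing about
whether a tag is actually registered.

```python
import re

# Draft production: grandfathered = ALPHA *(alphanum / "-")
DRAFT = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

# RFC 3066-style production: grandfathered = 1*8ALPHA *("-" 1*8alphanum)
RFC3066 = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def accepted_by(tag):
    """Return (draft_ok, rfc3066_ok) for a tag; syntax only, no registry lookup."""
    return (DRAFT.fullmatch(tag) is not None,
            RFC3066.fullmatch(tag) is not None)


# Hypothetical samples: an ordinary tag, a 9-character sub-tag, a doubled
# hyphen, a trailing hyphen (zero-length sub-tag), and a leading digit.
for sample in ["i-klingon", "en-abcdefghi", "en--us", "en-", "1abc-en"]:
    draft_ok, rfc_ok = accepted_by(sample)
    print(f"{sample!r}: draft={draft_ok} rfc3066={rfc_ok}")
```

Only the first character is forced to be alphabetic by the draft
pattern, so the exception for primary sub-tags holds only for the
leading character: a string such as "e1-us" still matches the draft
pattern and is rejected by the RFC 3066-style one.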

So, perhaps there is room for technical improvement, but there are not
any serious problems IMO -- certainly nothing as serious as the tone of
Bruce's message conveyed.


I see no reason for the ABNF to permit such content as is
forbidden by RFC 3066; the actual ABNF for what RFC 3066
permits is contained within 3066, and could have been directly
incorporated rather than producing a "grandfathered"
production which opens up several cans of worms.

This vastly overstates the problem. There is no can of worms unless it
exists in tags currently available under RFC 3066.

 
One defect related to tag length in RFC 3066 is not remedied
by the draft; indeed the problem is greatly exacerbated...

Unfortunately, a language-tag's length is unlimited by
the ABNF in RFC 3066 (due to an unlimited number of subtags)
and in the draft...

In particular, tags other than private-use tags with more than
two subtags require registration under RFC 3066 rules, and it
is a trivial matter to determine the longest registered tag.
The draft, however, encourages use of more subtags as well as
removal of the subtag length upper bound; moreover, it permits
infinite numbers of subtags without requiring registration of
the resulting complete tag.

Bruce states incorrectly that there is no upper bound on the length of
sub-tags. His other concern, on the overall length of complete tags, is
valid, however: in terms of the ABNF syntax for both RFC 3066 and RFC
3066bis, infinite-length productions are possible, but RFC 3066 would
require registration of complete non-private-use tags while RFC 3066bis
does not.

There are three open doors for infinite-length productions in the ABNF
of the current draft:

- unlimited extlang sub-tags
- unlimited variant sub-tags
- the number of possible extensions is limited to 25, but the length of
extensions is unlimited

We could impose some upper limits on these things; e.g.

Language-Tag = ... *8("-" extlang) ... *8("-" variant) ... 1*25("-" extension)
...
extension = singleton 1*8("-" 2*8alphanum)

If we also imposed limits on the length of private-use tags and defined
the grandfathered production in a way that made clear there was an upper
limit for those, then we could end up eliminating an issue that had
existed in RFC 3066.
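
As a rough illustration of what such limits would buy an implementer,
the sketch below works out the worst-case number of characters that the
three bounded parts could contribute. The extension bounds come straight
from the example ABNF above; the 8-character ceiling on extlang and
variant sub-tags is an assumption made for the arithmetic, and the
primary language, script, region and private-use parts are deliberately
left out.

```python
# Bounds taken from the example ABNF above.
MAX_EXTENSIONS = 25       # 1*25("-" extension)
MAX_EXT_SUBTAGS = 8       # extension = singleton 1*8("-" 2*8alphanum)
MAX_EXT_SUBTAG_LEN = 8
MAX_EXTLANGS = 8          # *8("-" extlang)
MAX_VARIANTS = 8          # *8("-" variant)

# ASSUMPTION: no extlang or variant sub-tag is longer than 8 characters.
ASSUMED_SUBTAG_LEN = 8


def max_extension_len():
    """Singleton (1 char) plus up to 8 groups of '-' and an 8-char sub-tag."""
    return 1 + MAX_EXT_SUBTAGS * (1 + MAX_EXT_SUBTAG_LEN)


def max_bounded_part_len():
    """Worst-case characters contributed by extlangs, variants and extensions."""
    extlangs = MAX_EXTLANGS * (1 + ASSUMED_SUBTAG_LEN)
    variants = MAX_VARIANTS * (1 + ASSUMED_SUBTAG_LEN)
    extensions = MAX_EXTENSIONS * (1 + max_extension_len())
    return extlangs + variants + extensions


print(max_extension_len())      # 73
print(max_bounded_part_len())   # 1994
```

The exact figure matters less than the fact that it is finite: with
bounds like these, an implementation could size its buffers for the
parts of a tag that are currently open-ended.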

So, I think Bruce has identified a valid issue here. I personally would
not have characterized it as greatly exacerbating, though, as the issue
was present in RFC 3066: private-use tags did not need to be registered
in RFC 3066, so there was no way an implementation could be written with
certain knowledge that tags beyond some given length would not be
encountered.


The new registry provides a complete, easily parseable file which
provides the precise contents of valid tags for any point in time.

That is the first time I have ever heard ISO 8601 date
format described as "easily parseable".  Perhaps the draft
authors meant to say that a specific subset of the tortuously
complex ISO 8601 date format is used, but that is not what
the draft states...

It seems very clear that the authors intended only a specific subset:
YYYY-MM-DD. This is a minor technical issue that the authors can very
easily remedy.
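
For what it is worth, a parser restricted to that subset is short. The
following is a minimal sketch, assuming registry dates are always
exactly YYYY-MM-DD; the function name parse_registry_date is
hypothetical, and the explicit pattern is there so that other ISO 8601
forms (basic format, week dates and so on) are rejected rather than
silently accepted.

```python
import re
from datetime import date

# Exactly four digits, two digits, two digits, ASCII only.
_DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def parse_registry_date(text):
    """Parse a YYYY-MM-DD string; raise ValueError for anything else."""
    match = _DATE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not in YYYY-MM-DD form: {text!r}")
    year, month, day = (int(group) for group in match.groups())
    # date() itself raises ValueError for impossible dates such as 2004-02-30.
    return date(year, month, day)


print(parse_registry_date("2004-12-14"))
```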


I am absolutely shocked that a draft dealing with language
lacks an "Internationalization considerations" section as
recommended by RFC 2277 (a.k.a. BCP 18).

No more or less shocking than for RFC 3066, regarding which I'm not
aware of any complaints.

I don't quite understand what the critique is here: what is there to
internationalize about language tags? They are symbolic identifiers that
have no culture-specific content. The only possible consideration is the
charset, which for this spec involves ALPHA, DIGIT and "-" only. It's
true that ALPHA and DIGIT are not defined and that it would be better to
do so; it couldn't hurt to have a section for i18n considerations
(wouldn't need to be long). These are very minor concerns, and hardly
"shocking".
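
To show how little there is to say about the charset, here is a small
sketch of a character-level check. It assumes the draft intends ALPHA
and DIGIT as in the ABNF core rules of RFC 2234 (ASCII letters and
digits only); that reading, and the function name
uses_only_tag_charset, are assumptions rather than anything the draft
states.

```python
import string

# ASSUMPTION: ALPHA and DIGIT as in the RFC 2234 core rules.
ALPHA = frozenset(string.ascii_letters)   # %x41-5A / %x61-7A
DIGIT = frozenset(string.digits)          # %x30-39
ALLOWED = ALPHA | DIGIT | {"-"}


def uses_only_tag_charset(tag):
    """True if the tag is non-empty and uses only ALPHA, DIGIT and '-'."""
    return bool(tag) and all(ch in ALLOWED for ch in tag)


print(uses_only_tag_charset("en-US"))   # True
print(uses_only_tag_charset("fr-é"))    # False, although "é".isalpha() is True
```

The second call is the one point an i18n section might usefully spell
out: a generic "is this a letter" test accepts non-ASCII letters, while
the ASCII-only reading of ALPHA does not.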


 
Perhaps even more disturbing is the content of the "IANA
Considerations" section; the draft predicts that certain things
will happen ("IANA will"[...]), but doesn't actually direct
(e.g. "IANA shall") IANA to do anything.  The placement of that
section does not correspond to current RFC-Editor guidelines
(it should appear after Security Considerations); also on that
point, Appendices should precede References.

There is a process issue here, but I have assumed that the authors have
dealt with IANA on that. Otherwise, these are editorial issues -- "even
more disturbing" seems to me to be somewhat overstated.


Many of the references are obsolete (e.g. RFCs 1327,
1521)... and at least one reference ([19])
gives a bracketed URI rather than the correctly formatted
RFC reference.  Although reference is made to the "Accept-
Language" header field, RFC 3282 (the defining RFC for that
field) is not listed among the references... 

The formatting of the draft is atrocious

All editorial.


there is no differentiation between normative and
informative references, 

A valid concern.

 
I am extremely surprised that the draft has been published
at least nine times in such a state of poor formatting and
poor attention to editorial content (e.g. obsolete and
missing references), and that it progressed as far as IESG
last call in such a state, with no Internationalization
considerations section, etc.

In fairness to the authors, page-oriented plain text is not exactly
conducive to authoring and revising a long document, and a lot of energy
was spent focusing on details that have far more consequence than
formatting. And, as mentioned above, the lack of an i18n-concerns
section is hardly without precedent, and not particularly significant in
the case of this spec. This really feels like nit-picking, IMO. I'm left
wondering if Bruce has been looking for nits to pick because he is...


... particularly concerned about the implementation
ramifications of the proposed changes, especially (as
noted in detail above):
1. the apparent contradiction between the stated
    objectives w.r.t. accessibility of relevant ISO data and
    standards and the reality of the proposal's
    implications (ISO 8601 date format parsing).

As mentioned above, this really is a non-issue.


2. the clear contradiction between the claims about
    ABNF compatibility with RFC 3066 and the factual
    incompatibility of certain provisions in the grammar.

The main concern was with the "grandfathered" production, but I've shown
that that is a non-issue. The maximal length issue exists just as much
in RFC 3066 due to private-use tags. It is a technical concern that
might be worth reviewing in RFC 3066bis, but it is neither
insurmountable nor a new problem.



Peter Constable
Microsoft Corporation
