
RE: [Ltru] Last call comments on LTRU registry and initialization documents

2005-09-09 11:17:19


--On Wednesday, 07 September, 2005 12:19 -0700 Addison Phillips
<addison(_dot_)phillips(_at_)quest(_dot_)com> wrote:

Comments on draft-ietf-ltru-registry and
draft-ietf-ltru-initial and, secondarily, on
draft-ietf-ltru-matching...

I've thought a lot about the excellent analysis and comments
in John Klensin's message. My perception is that we have a
divergent view of the structure and significance of the LTRU
draft(s). 

First, my thanks for the obviously careful reading and thought.
We may indeed have divergent views, although, after reading your
notes I believe that, in practical terms, we are pretty close
together.

Although superficially the drafts are very different from the
RFC 3066 that they seek to replace, in fact the structure is
very similar. The drafts are attempting to fill certain gaps
left unaddressed by RFC 3066, both for implementers and in tag
choice by, and requirements on, "content authors" (people who
choose tags or ranges).

Here are my basic thoughts in response to those comments:

1. All tags valid under the drafts would have been valid or
valid to register under RFC 3066. This is a key point. The tag
grammar proposed is intended to be highly compatible not just
with RFC 3066 but also with existing implementations. It is
expressed as greater restriction on what may be registered.
This provides more regularity in tags, although tags
themselves are not greatly changed. A subtag registry is, in
effect, a different way of expressing what is already in
place. 
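The syntactic half of point 1 is easy to check mechanically: the
generic grammar in Section 2.1 of RFC 3066 is loose enough that
tags of the shapes the drafts generate (language-script,
language-script-region, language-variant) all parse under it. The
sketch below transcribes only that ABNF into a Python regular
expression; the helper name is_rfc3066_syntax and the sample tags
are illustrative choices, and it says nothing about registration,
which is where the two models actually differ.

```python
import re

# Generic syntax from RFC 3066, Section 2.1:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_syntax(tag: str) -> bool:
    """True if `tag` fits the RFC 3066 generic grammar (syntax only)."""
    return RFC3066_TAG.fullmatch(tag) is not None


# Illustrative tags of the shapes the LTRU drafts produce.
for tag in ["zh-Hant", "zh-Hant-TW", "de-1996", "en-US"]:
    print(tag, is_rfc3066_syntax(tag))
```

All four sample tags pass; something like "1en" or a nine-letter
primary subtag would not.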

I understand this, and think I understood it before.  There is a
difference, however, and it expresses itself, I believe, in two
ways.  (i) The 3066 model requires some process for every tag
that is to be used.  That is very bad in some ways, as the
documents and your notes correctly point out.  On the other
hand, it tends to keep the number of tags that are in use down.
Given more general IETF experience (which may not be applicable
to this particular situation), a smaller number of tags in use
tends toward better and more widespread interoperability.  (ii)
The idea of using a registry of components (in this case
subtags) that can be mixed and matched at the implementer's
discretion, albeit according to specific rules, is somewhat
untested in the IETF and the Internet applications community.
The closest equivalents involve protocols with a small set of
options that are presumed to be orthogonal.  In those cases, we
typically apply rather strict rules to be sure that each of the
possible combinations is tested and shown to interoperate;
combinations that are not or cannot be demonstrated 
to interoperate are dropped from the standard at higher maturity
levels.  Clearly, that level of testing is not possible or
appropriate here, but that level of innovation does justify
requiring some operational demonstration of impact on
interoperability, rather than stamping "BCP" on the document and
hoping that everything will work out.

2. The fact that it *always* narrows the potential subtags
that could be registered *in the future*, but has no effect on
any tags or subtags already extant means that (from an RFC
3066 implementation perspective) the range of tags actually
seen in the wild will be more limited than it might have been.
Commentators on this thread have implied that it is an
entirely new protocol, but I think that goes too far: it is
the same protocol with greater rigor on what may go where. 

While I completely agree with your final sentence, it is
possible to reach a different conclusion (or, more accurately,
pose a different hypothesis) about the first.    One could
equally claim that the discretion accorded the tag reviewer,
working with the IANA and under IESG supervision, would keep the
number of tags registered, and hence in the wild, lower in
practice than the number permitted as the crossproduct of subtag
registrations.  I am aware of the flaws in that argument,
including a tradition of "if you can't get it registered easily
enough, just use what you want" in other parts of the community.


As a trivial and silly counterexample to the "fewer actual tags
in use" hypothesis, I would expect any competent reviewer under
3066 to look askance at a request to register en-Hang or en-Hant
while, as I understand it, the fact that the three subtags "en",
"Hang", and "Hant" appear in the registry makes those
combinations valid under the LTRU model.  What would prevent
their appearance in the wild is that it would take someone who
was either stupid or perverse to want to use them (casual
readers, see the Aside below -- Addison clearly does not need
it).  But "stupid or perverse" is not a rarity around the
Internet.  Moreover, someone who was seriously security-paranoid
might wonder whether these perverse combinations could be a way
to code (not cypher, but code) secret/private messages.  It
seems to me that the risk there, while small, is greater than in
3066 and it is not called out in the security analysis (not that
I'm sure it should be: as Sam and Russ are painfully aware, I'm
very concerned about stopping rules in requirements for threat
analyses and presentations).

        Aside on the example above (LTRU participants can skip
        unless they want to check my logic): "en-Hang" and
        "en-Hant" would imply writing English in Korean Hangul
        or Traditional Chinese characters respectively.  In
        addition to those not exactly being common cases, it is
        not clear that they are feasible.  Since most Chinese
        characters cannot be used in an unambiguous phonetic
        way, one would presumably need a rather specific
        profile, probably expressed as a variant or extension,
        to make things work (and even then, it would be
        strange).  Hangul is problematic in a different way.
        Unlike Chinese characters, it is definitely phonetic.
        But because it is rather carefully designed and
        structured around the needs of Korean, it is not clear
        to me, in my ignorance, that it could be used to
        represent the full range of English phonemes and
        syllables with reasonable accuracy.  Contrast these two
        examples with, e.g., en-Cyrl (English written in
        Cyrillic characters) or en-Arab (English written in
        Arabic characters).  Those might be strange or even
        perverse, and they might be used to conceal the content
        of text from a casual reader/observer, but they would
        "work" perfectly well if read out loud, using
        conventions no more extreme (and probably less so) than
        some alternate spelling systems that use Roman-based
        characters and whose advocates claim they are more consistent
        and easier than the normal spelling patterns.
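The combinatorial point behind the en-Hang example can be made
concrete with a toy model. In the sketch below the registries are
tiny hypothetical sets (not actual registry contents), only bare
language and language-script tags are considered, and the function
names are invented for illustration. Under the subtag model,
validity is a per-subtag lookup, so every pairing of registered
subtags passes; under a 3066-style whole-tag model, only tags that
were individually reviewed pass.

```python
from itertools import product

# Toy, hypothetical registry contents; NOT the real registry.
LANGUAGES = {"en", "ko", "zh"}
SCRIPTS = {"Arab", "Cyrl", "Hang", "Hans", "Hant", "Latn"}

# Toy 3066-style list of whole tags that a reviewer approved.
# (Bare ISO 639 codes, usable without registration under 3066,
# are ignored here for simplicity.)
REVIEWED_WHOLE_TAGS = {"zh-hans", "zh-hant"}


def valid_by_subtags(tag: str) -> bool:
    """Subtag model: each part of language[-script] is looked up alone."""
    parts = tag.split("-")
    if len(parts) == 1:
        return parts[0].lower() in LANGUAGES
    if len(parts) == 2:
        lang, script = parts
        return lang.lower() in LANGUAGES and script.title() in SCRIPTS
    return False  # longer tags are out of scope for this toy


def valid_by_whole_tag(tag: str) -> bool:
    """3066-style model: the complete tag must itself be registered."""
    return tag.lower() in REVIEWED_WHOLE_TAGS


for tag in ["en-Hang", "en-Hant", "en-Cyrl", "zh-Hant"]:
    print(tag, valid_by_subtags(tag), valid_by_whole_tag(tag))

# Every language x script pairing is valid under the subtag model.
combos = [f"{l}-{s}" for l, s in product(sorted(LANGUAGES), sorted(SCRIPTS))]
print(len(combos), all(valid_by_subtags(c) for c in combos))
```

With three languages and six scripts, all 18 language-script
combinations are accepted by the subtag check, while only the two
reviewed tags pass the whole-tag check.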

3. The various rules and guidelines set down in the draft
provide a more rigorous registration process based on the
experience of operating ietf-languages for the past seven or so
years. This could be seen to make it the "best current
practice" for registering language tags or their components.
The switch to subtags was chosen to spare the community
immense numbers of registrations of various subtag variations
(examples from the current registry: two German orthographic
subtags, eight registrations; two Chinese script subtags,
*twelve* registrations). 

In case I haven't made it clear enough in previous notes, I
_like_ this system and the ideas behind it.  I think
"Suppress-script" is a particularly nice idea given the
weaknesses we agree are present in the 3066 model.   I just
think we need to move with caution into somewhat uncharted
territory, doing so in a way that permits and encourages us to
apply more specific guidance for particular applications than
the general guidance of Section 4 of "Registry" (not that I find
anything in that section to disagree with).
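As a reminder of what the Suppress-Script field does: it sits on a
language subtag's registry record and names the script that the
language is almost always written in, so that tags need not carry
it ("en" rather than "en-Latn"). A minimal sketch of applying that
guidance when choosing a tag follows; the hard-coded mapping is an
illustrative assumption standing in for real registry records, and
drop_suppressed_script is a hypothetical helper, not anything
defined by the drafts.

```python
# Hypothetical excerpt: language subtag -> Suppress-Script value.
# Illustrative only; a real implementation would read the registry.
SUPPRESS_SCRIPT = {"en": "Latn", "de": "Latn", "ru": "Cyrl"}


def drop_suppressed_script(tag: str) -> str:
    """Omit a script subtag that merely restates the language's usual script."""
    parts = tag.split("-")
    # A script subtag is four letters immediately after the language.
    if len(parts) >= 2 and len(parts[1]) == 4 and parts[1].isalpha():
        suppressed = SUPPRESS_SCRIPT.get(parts[0].lower())
        if suppressed is not None and parts[1].lower() == suppressed.lower():
            return "-".join([parts[0]] + parts[2:])
    return tag


for tag in ["en-Latn-US", "en-Cyrl", "zh-Hant", "de-Latn-1996"]:
    print(tag, "->", drop_suppressed_script(tag))
```

Note that the informative cases survive untouched: "en-Cyrl" keeps
its script, and "zh-Hant" is unaffected because no script is
suppressed for "zh" in this toy mapping.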

4. The creation of a registry simplifies the work incumbent on
implementers or content authors, since they no longer have to
refer to (under RFC 3066) four separate tag-or-subtag
repositories and then synthesize the rules in RFC 3066 for
choosing between certain overlapping subtags (for example ISO
639-1/-2). The fact that there is a registry doesn't change
the fact that "somewhere" there is a list of subtags that may
be validly combined into tags.

See above.

5. There is a perfectly good matching scheme loosely described
in RFC 3066. This scheme is enshrined in numerous places,
including RFC 3282 (which, you'll note, also "Obsoletes:
1766", an example with 3066 of two RFCs obsoleting the same
BCP on separate days over a year apart). The additional forms
of matching described by the matching draft are interesting
and may be useful in a variety of applications (draft-matching
gives some examples). But they are unnecessary to the specific
task of updating RFC 3066. Applications of language tags in
the future may wish to choose one or another of the other
schemes from draft-matching to produce more interesting
results. But such additional schemes are not necessary to the
task of updating RFC 3066.

If the community feels that matching is so important that
draft-registry must deal with it directly, my suggestion would
be to take Section 2.5 verbatim from RFC 3066 and include it
in draft-registry. This preserves the vital reference to
language-ranges. It should be noted that RFC 3066 nowhere
provides an explicit treatise on matching. Both it and
draft-registry were written for compatibility with known
matching schemes. Success or failure of the draft should
necessarily be measured by its interoperability with existing
matching protocols. My belief is that there is high
interoperability, since the matching scheme is quite basic and
the rules governing tag choice gave careful consideration to
the problem of script subtags. 
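The RFC 3066 Section 2.5 rule under discussion is simple enough
to state in a few lines: a language-range matches a tag if it
exactly equals the tag, or exactly equals a prefix of the tag such
that the first character following the prefix is "-"; comparison
is case-insensitive, and "*" matches any tag (protocols such as
HTTP/1.1 may narrow that). The sketch below is a minimal rendering
of that rule, not taken from any of the drafts; the last example
shows how a script subtag in the tag defeats a language-region
range, which is exactly the script-subtag interaction the tag
choice rules have to consider.

```python
def range_matches(language_range: str, tag: str) -> bool:
    """Prefix matching in the style of RFC 3066, Section 2.5.

    A range matches a tag if it equals the tag, or equals a prefix of
    the tag such that the next character is "-". Case-insensitive.
    "*" matches any tag here; a protocol may define it more narrowly.
    """
    r, t = language_range.lower(), tag.lower()
    if r == "*":
        return True
    return t == r or t.startswith(r + "-")


cases = [("zh", "zh-Hant"), ("zh-Hant", "zh-Hant-TW"),
         ("zh-Hant", "zh"), ("en", "eng"), ("zh-TW", "zh-Hant-TW")]
for rng, tag in cases:
    print(f"{rng!r:10} vs {tag!r:14} -> {range_matches(rng, tag)}")
```

The first two cases match; the last three do not, the final one
because "zh-TW" is not a prefix of "zh-Hant-TW".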

My personal bias about how to do this, based on IETF experience
and some idiosyncrasies of the RFC series (notably the exceeding
bluntness of the "Updates" and "Obsoletes" instruments), is to
tell applications that they need to pick:

        (i) the registry, as defined by an RFC and an IANA
        entity.  I'd hope there would be only one, but Sam's
        suggestion may have some value,
        
        (ii) a matching rule, as defined by an appropriate RFC
        or text in the RFC defining/specifying the application,
        and optionally, 
        
        (iii) some application-specific additional rules or
        constraints, specified in the RFC defining/specifying
        the application.

Now, if we were having this discussion in, e.g., a JTC1 context,
I'd probably have a different bias.  But, in the IETF/RFC one,
I'm led to believe that "separate matching rule documents" makes
(ii) much cleaner than having it be "...by an appropriate RFC,
or Section XY of [ltru-registry], or text in the RFC ...".  I'm
especially drawn in that direction because I'm not enamored of
the 3066 matching rules and would prefer that they not become a
permanent default.  And RFC numbers are not a scarce resource.

But, all of that said, the ultimate semantics of "extra section
of LTRU-registry that specifies the 3066 matching rules" are the
same as those of "extra RFC that makes the 3066 matching rules
explicit".  The essence of my earlier comment was only that this
was a loose end that needed tying off if one wanted to claim
that 3066 was obsolete.  That requirement would be satisfied by
either of these semantically-equivalent approaches.

Which one to pick is, IMO, a strategic decision to be made in
the context of the broad needs and practices of the IETF, and
hence by the IESG with whatever mechanism it chooses to obtain
community input, rather than by a single WG.  
 
6. The tag forms used in the draft are, in fact, being
registered and adopted. I note that Google this morning
returns 41,600 hits for "zh-Hant" as a piece of content. Many
of these appear to be valid usages as language tags (script
subtags in the wild) rather than just mentions of the
registration. Thus the draft merely recognizes the "reality on
the ground" with regard to language tags. It does so by
reorganizing how tags are registered to make the scheme more
manageable.

This is great.  But all it means today is that, were this on the
standards track, it would be pretty easy to get it to Draft
under the general criteria that Sam and I have outlined.

7. The choice between STD and BCP tracks is really a toss-up.
There are very good arguments on both sides. The creation and
management of a registry does not lend itself to STD, but the
creation and testing of implementations does not lend itself
to BCP. My thought here is that one can view the draft
entirely through the lens of existing RFC 3066 implementers:
these documents represent a set of BCPs related to various
aspects of registering, choosing, and implementing language
tags. New implementations may be different as a result of the
improvements made (certain kinds of assumptions can be made
about a 3066bis tag that cannot be made about a 3066 tag). All
such implementations will be recognizably implementations of
RFC 3066, though, and to the benefit of all concerned (IMHO)
they represent the best current thinking on the manner in
which to identify languages on the Internet (given our legacy
considerations).

We agree about the "toss-up" part.  It is also true that LTRU
was told to develop a BCP and responded by recommending a BCP.
No one can blame the WG for following those instructions.  I
come down on the standards track side because of two things.
First, as I have noted before, we rarely create registries as an
end in themselves.  The second issue is a more basic property of
the IETF: while we describe our decision process in terms of
"rough consensus and running code", I have come to realize that
there should be a background chant of "interoperability,
interoperability, interoperability" every time it is said.  An
interoperability lens causes me to consider 3066 to have been a
marginal case for BCP and the LTRU documents to be over some "I
know it when I see it" line.

As I indicated in my recent note to Doug, none of my suggestions
or comments require, or even suggest, any fundamental changes to
the recommendations of the WG or, for the most part, even to its
documents.  They mostly have to do with procedures, processing
models, and a bit of loose-end-tying-up, but those are issues at
a rather different level.

   regards,
     john


_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf