ietf-822

What's in a "name"? (Was Re: UTF-8 over RFC 2047 (Re: Call for Usefor to recharter))

2003-01-16 10:57:01

Keith Moore wrote:
There is a clean solution:
"Here's how to do it properly: use RFCs 2047 / 2231."  That (a) is
backwards compatible, (b) is consistent with Best Current Practice
(RFC 2277), (c) is consistent with the Internet Architecture (RFC
1958), (d) is compatible with RFCs 822, 2822 and the MIME RFCs, and
(e) is inclusive (GB18030 users are not ostracized). Raw untagged
utf-8 does not possess those characteristics.


it's not a complete solution, because 2047 only applies to human-readable
text, and the 2231 extensions only apply to parameter values in MIME headers.

2231 also amends 2047 to provide language tagging.  There's also a question
about whether or not 2231 only applies to parameters; that was true until
RFC 3335 used syntax elements (via a convoluted path) which had been
[re]defined by RFC 2231, but outside of a parameter.  But that doesn't
affect the Usefor draft.

As far as usenet article format and the Usefor draft are concerned, 2047/2231
covers everything except possibly newsgroups.  And that might in fact be
a complete solution to header field internationalization for usenet articles
(keep reading).  Under the current standard (RFC 1036), the only issue w.r.t.
newsgroups and the characteristics mentioned seems to be case-independence
vs. Internet Architecture and the related backwards compatibility issue, and
that hinges on whether or not a newsgroup specification is a "name" as that
term is used in RFCs 1958 and 2277.
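
To make the 2047 half of that concrete, here is a minimal sketch of how human-readable header text travels as RFC 2047 encoded-words with the charset tagged explicitly, so that GB18030 works as well as UTF-8. It uses Python's standard email.header module; the helper names and the sample strings are mine, for illustration only, and the RFC 2231 language-tag extension is not shown.

```python
from email.header import Header, decode_header, make_header


def encode_unstructured(text: str, charset: str = "utf-8") -> str:
    """Encode human-readable header text as RFC 2047 encoded-words.

    The charset label travels inside the encoded-word, so the receiver
    does not have to guess it (unlike raw untagged 8-bit text).
    """
    return Header(text, charset).encode()


def decode_unstructured(value: str) -> str:
    """Decode RFC 2047 encoded-words back to a Unicode string."""
    return str(make_header(decode_header(value)))


# Illustrative values only; any charset with a Python codec could be used.
subject_utf8 = "Grüße aus Köln"
subject_gb = "中文"

wire_utf8 = encode_unstructured(subject_utf8, "utf-8")
wire_gb = encode_unstructured(subject_gb, "gb18030")

# Both wire forms are pure ASCII and carry their own charset tag.
print(wire_utf8)
print(wire_gb)
print(decode_unstructured(wire_utf8))
print(decode_unstructured(wire_gb))
```

Note that this applies only to text meant for humans (Subject, the phrase part of an address, and so on), which is exactly why the status of newsgroup names matters.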

Expanding on the last point: a "name" is a chunk of text used in a
protocol exchange, not necessarily intended to be human-readable.
Newsgroups are clearly used in the protocol exchange[*]. RFC 1958 says
that names "should be in case-independent ASCII". Case-independence
is a compatibility issue that would need to be addressed (e.g. by
noting that newsgroups SHOULD be recognized in a case-independent
manner, but MUST be generated in lower-case only (for backward
compatibility), with a note that subsequent editions of the standard
are expected to require case-independent recognition and permit
arbitrary case in generation).
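
The transition rule suggested above (accept any case, emit lower-case only) can be sketched as follows. This is an illustration under my own assumptions, not text from any draft: the function names are hypothetical, and names are assumed to be ASCII, as RFC 1958 recommends.

```python
from typing import Iterable


def canonical_newsgroup(name: str) -> str:
    """Return the lower-case form used for comparison and generation.

    Assumes ASCII names; anything else is rejected rather than guessed at.
    """
    name = name.strip()
    if not name.isascii():
        raise ValueError("newsgroup name is not ASCII: %r" % name)
    return name.lower()


def same_newsgroup(a: str, b: str) -> bool:
    """Recognition side: compare names case-independently."""
    return canonical_newsgroup(a) == canonical_newsgroup(b)


def generate_newsgroups_field(groups: Iterable[str]) -> str:
    """Generation side: always emit lower-case names."""
    return "Newsgroups: " + ",".join(canonical_newsgroup(g) for g in groups)


print(same_newsgroup("Comp.Lang.C", "comp.lang.c"))
print(generate_newsgroups_field(["Comp.Lang.C", "news.ANSWERS"]))
```

An old case-sensitive server never sees anything but lower-case from a generator that follows this rule, which is the backward-compatibility point.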

A "name", per RFC 2277, SHOULD have a language tag (that is a strong
recommendation, but not a mandatory requirement).  So long as newsgroups
will eventually have to be case-independent (unless a waiver for that
is obtained), and that will necessitate some server changes over a
transitionary period, there is a way out of the language-tagging issue
for newsgroups (which has been presented in the Usefor WG), viz. to
consider the newsgroup "name" in the sense of a 1958/2277 name, i.e. as
a protocol element, associating that protocol element with some
language-tagged human readable text (e.g. stored in a server "active"
file) and that human-readable text could be presented to the user.
Indeed, with such a scheme, there is in fact no need for IDNA for
newsgroups (though the two are not incompatible) -- any header
field-compatible string of text could be used to identify a newsgroup
for protocol purposes, and any arbitrary text in any language (or in
multiple languages) could be associated with that newsgroup "name".
While IDNA has some interesting features, AFAIK it is not inclusive
in the sense of not ostracizing the large proportion of humans who
may wish to use GB18030.
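
A minimal sketch of that association, assuming a purely hypothetical table (it does not reflect any real "active" file format): the newsgroup "name" stays an ASCII protocol element, and language-tagged text is attached to it for display. The table contents, the language tags chosen, and the function name are all illustrative.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical association table: protocol name -> {language tag: text}.
DESCRIPTIONS: Dict[str, Dict[str, str]] = {
    "comp.lang.python": {
        "en": "Discussion of the Python programming language",
        "de": "Diskussion über die Programmiersprache Python",
        "zh": "Python 编程语言讨论",
    },
}


def describe(
    name: str,
    preferred_languages: Sequence[str],
    table: Dict[str, Dict[str, str]] = DESCRIPTIONS,
) -> Optional[Tuple[str, str]]:
    """Return (language tag, text) for the first preferred language available.

    The lookup key is the protocol element, matched case-independently;
    only the associated text is internationalized.
    """
    texts = table.get(name.lower(), {})
    for lang in preferred_languages:
        if lang in texts:
            return lang, texts[lang]
    return None


print(describe("Comp.Lang.Python", ["fr", "de", "en"]))
```

The servers only ever exchange "comp.lang.python"; what the user sees is whichever tagged text the client picks.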

At the root of these issues is the need to make a decision: is a newsgroup
a protocol element ("name" in the 1958/2277 sense), or is it a "text string"?
Here is the relevant 2277 text:

   Internationalization is for humans. This means that protocols are not
   subject to internationalization; text strings are. Where protocol
   elements look like text tokens, such as in many IETF application
   layer protocols, protocols MUST specify which parts are protocol and
   which are text. [WR 2.2.1.1]

   Names are a problem, because people feel strongly about them, many of
   them are mostly for local usage, and all of them tend to leak out of
   the local context at times.

Some have given the opinion that newsgroup "names" should be human-readable
as well as protocol elements, in the same sense that a phrase associated
with a mailbox angle-addr is human-readable.  I.e. neither fish nor fowl,
but both. Note that there is no provision for an element to be both protocol
*and* a text string; that would lead to the logical contradiction "not subject
to internationalization" *and* "subject to internationalization". Newsgroup
"names" are clearly used by servers and are transmitted as protocol in
Newsgroups, Followup-To, and Control header fields (as well as in related
protocols such as NNTP and IMAP).

So, for purposes of a document intended to be a standards-track RFC,
it seems quite clear that a newsgroup "name" is a name in the RFC 1958
and RFC 2277 sense.  It is therefore not subject to internationalization.
That does not mean that the Usefor WG either should not or cannot come
up with an extra-protocol means of internationalized text associated with
a newsgroup "name"; it simply means that attempts to force internationalization
onto the protocol elements are misguided (and that is what has derailed
much of what has required "urgent attention" (and still requires it)).

-----------
* if a newsgroup were just a human-readable text string, 2047 would be
applicable, case-independence would not apply, but internationalization
would be required (including language tagging).

