ietf-822

Re: Unicode newsgroup name options

2003-02-24 16:01:34

Bruce Lilly <blilly(_at_)erols(_dot_)com> writes:
Russ Allbery wrote:

One or another of these proposed options affects every single component
of this system except for (11).  In the case of (11), none of these
three proposals will affect any existing mail to news gateways.
Existing mail to news gateways may not be able to handle new non-ASCII
newsgroups.

Certainly that affects those gateways!  A would require such a gateway
to effect a transformation which no existing gateway does.

I don't see how.

You mail the mail to news gateway address.  It adds a Newsgroups header
that contains the corresponding newsgroup.  Whether that header is in
UTF-8, punycode, or something else really makes little difference from the
perspective of the gateway.
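To make the point concrete, here is a minimal Python sketch of the gateway step described above. The address-to-group table, the addresses, and the group name `de.rec.bücher` are hypothetical, and the sketch assumes the incoming message does not already carry a Newsgroups header. The encoding choice enters only through the `encode` argument; the gateway logic is the same whether that argument emits raw UTF-8, punycode, or something else.

```python
from typing import Callable

# Hypothetical gateway table: submission address -> newsgroup name.
GATEWAY_TABLE = {
    "books@gateway.example": "de.rec.bücher",
    "c-lang@gateway.example": "comp.lang.c",
}


def gateway(raw_message: str, rcpt: str,
            encode: Callable[[str], str] = lambda name: name) -> str:
    """Prepend a Newsgroups header for the group mapped to `rcpt`.

    `encode` decides how the group name is written on the wire; by
    default it is passed through unchanged. Assumes the incoming
    message has no Newsgroups header of its own.
    """
    group = GATEWAY_TABLE[rcpt.lower()]
    # The rest of the message passes through untouched.
    return f"Newsgroups: {encode(group)}\n{raw_message}"
```

Swapping in a different `encode` function is the only change needed to move between the proposals, which is the sense in which the gateway itself barely cares.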

13 requires (for backwards compatibility) that the article format and
the format used in IMAP be the same, which is not the case for A.

No, it doesn't require that.  The IMAP server could be modified to convert
formats when dealing with non-ASCII groups.  A modification is required
here for all IMAP servers that want to deal with non-ASCII group names,
though.

As IMAP has been described in other mailing list postings, I won't go
into detail, but any scheme where the header field format differs
between "news" and "email" won't work with IMAP, and that rules out A.

I think you should have read my summary more carefully.  The process that
gets the news into the IMAP server would have to do a conversion.

I'm not saying that this is at all pretty for the IMAP server or at all
ideal from an IMAP standpoint; I'm simply saying that it's technically
possible and one can't rule out solution (A) by saying that it's not.

Those are all backwards incompatibilities.  Moreover the issues
affecting 13 incorrectly assume that "news" and "mail" can be
differentiated.

In the way that I phrased them in that summary, they can be.  There's a
converting gateway at (13) so that any articles present in the IMAP system
are mail-compatible.

If the IMAP server talks to the NNTP server in real time, that conversion
would have to be done in real time.  This could potentially be extremely
obnoxious.  It's a significant negative for solution (A).  Note, however,
that it has no impact on existing ASCII groups.
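A rough Python sketch of the conversion the process feeding the IMAP server would have to perform under (A). The summary quoted here does not say which mail-side form (A) uses, so `to_mail_form` is left as a parameter, and the mapping used in the test is hypothetical. Only the Newsgroups header is rewritten; other headers and the body pass through, and folded (multi-line) header fields are not handled in this sketch.

```python
from typing import Callable


def convert_newsgroups(article: str, to_mail_form: Callable[[str], str]) -> str:
    """Rewrite each group in the Newsgroups header with `to_mail_form`.

    Everything else in the article is copied unchanged. Continuation
    lines of folded headers are not recognized by this sketch.
    """
    header, sep, body = article.partition("\n\n")
    lines = []
    for line in header.split("\n"):
        field, colon, value = line.partition(":")
        if colon and field.lower() == "newsgroups":
            groups = [g.strip() for g in value.split(",") if g.strip()]
            line = f"{field}: " + ",".join(to_mail_form(g) for g in groups)
        lines.append(line)
    return "\n".join(lines) + sep + body
```

An ASCII-only group can simply map to itself, which is why existing groups are unaffected; the cost lies in running this on every non-ASCII article, possibly in real time.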

And while *some* UAs might be usable unmodified *some* _will_ require
modification to support UTF-8 I/O;

Right.

IMAP servers (13) may want to recode the newsgroup names from punycode
to UTF-7, but would not need to make any transformations to the
articles themselves.

Why would anybody want to "recode [...] punycode to UTF-7"?
As noted above, there may be issues with IMAP<->NNTP communications.

I honestly don't know; there was some discussion of IMAP using mUTF-7
folder names, and I didn't know if that was an issue that would have to be
taken into account.  You'd have to ask an IMAP implementor.
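For reference, IMAP's modified UTF-7 for mailbox names (RFC 3501, section 5.1.3) is straightforward to produce: printable ASCII stands for itself, "&" becomes "&-", and each run of other characters is encoded as UTF-16BE, then base64 with "," in place of "/" and no "=" padding, wrapped in "&" and "-". The Python sketch below (the function name is hypothetical) covers only that step; a punycode-encoded group name would first have to be decoded back to Unicode before being passed in, and whether IMAP servers would actually expose newsgroups this way is exactly the open question above.

```python
import base64


def encode_mutf7(name: str) -> str:
    """Encode a Unicode mailbox name in IMAP modified UTF-7."""
    out = []
    pending = []  # run of characters that must be base64-encoded

    def flush() -> None:
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
            out.append("&" + b64.replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:
            # Printable ASCII represents itself; "&" is escaped as "&-".
            flush()
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)
```

For example, `encode_mutf7("de.rec.bücher")` yields `de.rec.b&APw-cher`; the "." hierarchy separator is printable ASCII and survives unchanged.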

(C) punycode everywhere
=======================

This proposal mandates modifications to the posting agents (1) and the
news readers (5) in order to properly display the names.

Strictly speaking, that is not mandated; 1 and 5 can still be used by
users.

Which is why I said "in order to properly display the names."

Until the names are displayed properly, we haven't actually achieved our
goals; we've just succeeded in not breaking anything.  The whole point of
this is obviously to show the user real non-ASCII group names.

No modifications are required to (2) or (4), the NNTP servers, although
without modifications the server administrator would have to work with
encoded group names.  It would provide a much better user interface if
the administrative tools implemented punycode encoding and decoding for
easier handling of non-ASCII newsgroup names.
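A minimal Python sketch of the encoding and decoding such administrative tools could perform. It assumes, by analogy with IDNA, that each dot-separated component containing non-ASCII characters is punycode-encoded and marked with an `xn--` prefix; the proposals quoted here do not fix that exact form. It relies on Python's built-in `punycode` codec, which implements RFC 3492 without IDNA's nameprep or case folding. The function names and example groups are hypothetical.

```python
ACE_PREFIX = "xn--"  # assumed marker for encoded components, as in IDNA


def encode_group(name: str) -> str:
    """Punycode-encode each non-ASCII component of a newsgroup name."""
    parts = []
    for comp in name.split("."):
        if comp.isascii():
            parts.append(comp)  # ASCII components are left alone
        else:
            parts.append(ACE_PREFIX + comp.encode("punycode").decode("ascii"))
    return ".".join(parts)


def decode_group(name: str) -> str:
    """Reverse encode_group: decode every prefixed component."""
    parts = []
    for comp in name.split("."):
        if comp.startswith(ACE_PREFIX):
            encoded = comp[len(ACE_PREFIX):].encode("ascii")
            parts.append(encoded.decode("punycode"))
        else:
            parts.append(comp)
    return ".".join(parts)
```

With both functions available, an administrator can work with whichever form is more convenient: `encode_group("de.rec.bücher")` gives `de.rec.xn--bcher-kva`, and `decode_group` turns it back.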

Use of the canonical name may in fact be a benefit, e.g. in the case of
an administrator not familiar with Oriental, Cyrillic, Hebrew, Arabic,
Devanagari, etc. letterforms when dealing with some non-ASCII names.
Therefore the last statement above "It would provide a much better..."
is questionable.

I was making the assumption that one could always work in either the
encoded or decoded version of the name if that change were made.  I think
it's obvious that having the choice is superior to having to use the
encoded name, and much preferable in more mundane situations,
such as European news admins dealing with groups whose names could be
represented by ISO 8859-15.

  | 1   2   3   4   5   6   7   8   9  10  11  12  13  14
 -+------------------------------------------------------
 A| D   C   N   N   D   Y   N   N   N   Y   N   D   Y   D
 B| Y   Y   C   Y   Y   N   N   N   N   N   N   N   C   D
 C| D   C   C   C   D   N   N   N   N   N   N   N   C   D

Based on the comments above, there are a few errors; I've had to
add an I (for incompatible) category, and I've added a fourth row:

  | 1   2   3   4   5   6   7   8   9  10  11  12  13  14
 -+------------------------------------------------------
 A| Y   C   N   N   Y   Y   N   N   N   Y   Y   Y   I   I
 B| Y   Y   C   Y   Y   N   N   N   N   N   N   N   Y   D
 C| C   C   C   C   C   N   N   N   N   N   N   N   C   C
 D| C   C   C   C   C   N   N   N   N   N   N   N   C   C

I believe that your modifications are wrong, as noted above, although some
of them are debatable.  (13) and (14) for (A) are definitely wrong.

(D) is also wrong.  (1) and (5) are a D.  If news readers are not
modified, the correct newsgroup names are not displayed, which means that
proper internationalization has not occurred.  Similarly, this proposal
changes either (13) or (14) to a D, since a client reading through IMAP
would now have to look up somewhere what the *real* group name is, and
that would have to be done either in the client or in the IMAP server.

I agree with your change to (13) for (B); I think my original summary was
wrong.  (13) may actually be an N for (C).

(14) for (C) is, again, D.  The IMAP client would have to understand the
punycode newsgroup name to display it correctly.

The above summary I believe correctly indicates that proposal (B)
requires the most changes to be made to the news system itself.

7-9 are irrelevant (identical for all rows),

They are for these proposals.  I included the mail transport system
because other proposals would have required it to change.

A is completely incompatible with IMAP,

Disagree, as mentioned above.

Any Y or I is an incompatibility and rules out any chance of IESG
approval (barring a transition plan and protocol negotiation with
fallback where feasible).

Don't be absurd.  Y says that the software requires modifications in order
to work with non-ASCII groups.  Nothing about that inherently would rule
out any chance of IESG approval; it doesn't break anything about use of
that software with the existing ASCII groups.

There are no cases with any of these three proposals where existing news
software completely breaks, as existing news software is already eight-bit
clean (as noted as an assumption several times in my summary, an
assumption that has already been discussed here at length).  To avoid
problems with IMAP servers, simply don't expose the new groups via IMAP
until the IMAP server has been modified to deal with them.
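The "don't expose the new groups" approach amounts to a filter on the group list an IMAP server publishes. A Python sketch follows; treating any name that is non-ASCII, or that contains an `xn--` component (an assumed punycode marker), as new is an assumption of this sketch, and the function names and the `imap_server_updated` flag are hypothetical.

```python
from typing import Iterable, List

ACE_PREFIX = "xn--"  # assumed marker for punycode-encoded components


def is_legacy_group(group: str) -> bool:
    """True for names an unmodified IMAP server can be expected to handle."""
    if not group.isascii():
        return False
    return not any(c.startswith(ACE_PREFIX) for c in group.split("."))


def groups_to_expose(groups: Iterable[str], imap_server_updated: bool) -> List[str]:
    """Hide new non-ASCII groups until the IMAP server supports them."""
    if imap_server_updated:
        return list(groups)
    return [g for g in groups if is_legacy_group(g)]
```

Existing ASCII groups are always exposed, so nothing already in use through IMAP changes.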

I am not arguing that's the best approach; personally, I favor (C).
However, it is a possible approach and has some significant advantages
from the perspective of making the most possible work with the least
possible modifications to existing news software.

-- 
Russ Allbery (rra(_at_)stanford(_dot_)edu)             
<http://www.eyrie.org/~eagle/>