ietf-822

Re: Unicode newsgroup name options

2003-02-24 18:16:56

Russ Allbery wrote:
Bruce Lilly <blilly(_at_)erols(_dot_)com> writes:

Russ Allbery wrote:


One or another of these proposed options affects every single component
of this system except for (11).  In the case of (11), none of these
three proposals will affect any existing mail to news gateways.
Existing mail to news gateways may not be able to handle new non-ASCII
newsgroups


Certainly that affects those gateways!  A would require such a gateway
to effect a transformation which no existing gateway does.


I don't see how.

You mail the mail to news gateway address.  It adds a Newsgroups header
that contains the corresponding newsgroup.  Whether that header is in
UTF-8, punycode, or something else really makes little difference from the
perspective of the gateway.

The scenario: User U wishes to crosspost to two newsgroups, one
a conventional newsgroup, and one an unmoderated extended newsgroup.
For whatever reason (e.g. the conventional group is moderated), he
chooses to use the gateway to the conventional newsgroup.  His UA
inserts a Newsgroups field with the two names in punycode (since
it's being mailed) and sends it to the gateway. The gateway is
somehow supposed to magically convert the punycode names to UTF-8,
but no existing gateway does so.  I.e. message format A is not
backwards-compatible with existing gateways. What will happen
is that the gateway (seeing only friendly ASCII in the newsgroups
field) will blithely pass it on to an injection agent.  Now you
might say that it is then the injection agent's responsibility to
perform the magic conversion, but no existing injection agent does
that either.
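To make the scenario concrete, here is roughly the conversion an A-compliant gateway (or injection agent) would have to add. This is a minimal sketch, not part of any proposal: it assumes each non-ASCII name component is carried as raw Punycode (RFC 3492) behind an IDNA-style "xn--" prefix, and the function names are hypothetical. It relies only on Python's built-in "punycode" codec.

```python
def decode_component(label: str) -> str:
    """Turn one encoded newsgroup-name component back into Unicode."""
    # Assumed convention: encoded components carry the IDNA-style "xn--" prefix.
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


def decode_newsgroups(field_body: str) -> str:
    """Rewrite a Newsgroups field body from punycode names to UTF-8 names."""
    groups = [g.strip() for g in field_body.split(",") if g.strip()]
    return ",".join(
        ".".join(decode_component(part) for part in group.split("."))
        for group in groups
    )


print(decode_newsgroups("comp.misc, de.comp.xn--bcher-kva"))
```

Nothing in a field like "comp.misc, de.comp.xn--bcher-kva" looks unusual to a gateway that lacks this step, which is why it would simply be passed through unchanged.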

13 requires (for backwards compatibility) that the article format and
the format used in IMAP be the same, which is not the case for A.


No, it doesn't require that.  The IMAP server could be *modified*
In what way is that backwards compatible with the existing (i.e.
unmodified) installed base of IMAP servers?

As IMAP has been described in other mailing list postings, I won't go
into detail, but any scheme where the header field format differs
between "news" and "email" won't work with IMAP, and that rules out A.


I think you should have read my summary more carefully.  The process that
gets the news into the IMAP server would have to do a conversion.

That "process" is often reading directly from the spool (which under
A is in utf-8) or via NNTP (and an NNTP server has no way of determining
if its client is an IMAP server).  In any event, nothing now in that
process does such a conversion, so that is a backwards compatibility
issue which rates at least a 'Y'.  In the case of LMTP transfer to
the IMAP server, the server has no way to determine whether the message
is "news" or "mail" and has no way to treat them differently. Likewise
for a message moved from folder to folder (on the IMAP server; please
read a description of the IMAP protocol if you don't understand that)
by a client's user.

I'm not saying that this is at all pretty for the IMAP server or at all
ideal from an IMAP standpoint; I'm simply saying that it's technically
possible and one can't rule out solution (A) by saying that it's not.

You'll have to show how it's possible when there's no way for an
IMAP server to differentiate (and therefore treat differently)
"news" and "email".

Those are all backwards incompatibilities.  Moreover the issues
affecting 13 incorrectly assume that "news" and "mail" can be
differentiated.


In the way that I phrased them in that summary, they can be.  There's a
converting gateway at (13) so that any articles present in the IMAP system
are mail-compatible.

What "gateway"?  IMAP servers get news articles from the news spool
directly, via NNTP, or via LMTP.

If the IMAP server talks to the NNTP server in real time, that conversion
would have to be done in real time.

By whom? No existing IMAP server treats NNTP specially.  No existing
NNTP server performs conversions for some clients and not for
others.
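Going the other way, the real-time conversion under discussion would mean rewriting a UTF-8 Newsgroups field into ASCII before an article reaches the IMAP side. A minimal sketch under the same assumption as above (an IDNA-style "xn--" prefix on each encoded component; function names hypothetical), using Python's "punycode" codec:

```python
def encode_component(label: str) -> str:
    """Encode one name component as ASCII, leaving pure-ASCII components alone."""
    if label.isascii():
        return label
    # Assumed convention: IDNA-style "xn--" prefix on encoded components.
    return "xn--" + label.encode("punycode").decode("ascii")


def encode_newsgroups(field_body: str) -> str:
    """Rewrite a UTF-8 Newsgroups field body into an all-ASCII one."""
    groups = [g.strip() for g in field_body.split(",") if g.strip()]
    return ",".join(
        ".".join(encode_component(part) for part in group.split("."))
        for group in groups
    )


print(encode_newsgroups("comp.misc,de.comp.bücher"))
```

Whoever performs this step would have to do it for every article on the way to the IMAP server, which is the component that does not exist today.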

Note, however,
that it has no impact on existing ASCII groups.

Do any of the schemes? Is that relevant in differentiating them?

And while *some* UAs might be usable unmodified *some* _will_ require
modification to support UTF-8 I/O;


Right.


IMAP servers (13) may want to recode the newsgroup names from punycode
to UTF-7, but would not need to make any transformations to the
articles themselves.


Why would anybody want to "recode [...] punycode to UTF-7"?
As noted above, there may be issues with IMAP<->NNTP communications.


I honestly don't know; there was some discussion of IMAP using mUTF-7
folder names, and I didn't know if that was an issue that would have to be
taken into account.  You'd have to ask an IMAP implementor.

I wouldn't call myself an IMAP implementor, but I'm familiar with
the rudiments of the protocol and the messaging model, and I
administer my own IMAP server.  As far as I can tell, there's
no reason to convert a perfectly legal ASCII punycoded name; it
would simply be left as the ASCII-compatible name.  Especially
as in either case there is code to be written, and punycode will
have to be supported for IDN -- I doubt if anybody is fond of new
uses for UTF-7.
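For comparison, this is what IMAP's modified UTF-7 mailbox-name encoding (RFC 3501, section 5.1.3) does with a name. It is a minimal encoder sketch written for illustration (the function name is hypothetical). It also shows that an ASCII punycoded name comes through untouched, since modified UTF-7 leaves printable ASCII other than "&" as-is.

```python
import base64


def encode_mutf7(name: str) -> str:
    """Encode a mailbox name in IMAP modified UTF-7 (sketch)."""
    out = []
    pending = []  # run of characters that must be base64-encoded

    def flush():
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
            # Modified base64 uses "," instead of "/" and no padding.
            out.append("&" + b64.replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:
            flush()
            # "&" is the shift character, so a literal "&" is written as "&-".
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)


print(encode_mutf7("de.comp.xn--bcher-kva"))  # unchanged
print(encode_mutf7("de.comp.bücher"))
```

So the only names that would change are decoded ones; "de.comp.bücher", for example, becomes "de.comp.b&APw-cher".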

(C) punycode everywhere
=======================


This proposal mandates modifications to the posting agents (1) and the
news readers (5) in order to properly display the names.


Strictly speaking, that is not mandated; 1 and 5 can still be used by
users.


Which is why I said "in order to properly display the names."

Until the names are displayed properly, we haven't actually achieved our
goals; we've just succeeded in not breaking anything.  The whole point of
this is obviously to show the user real non-ASCII group names.

Yes, but that boils down to the distinction between categories
(not methods) C and D.  They were defined as:

D means change is very desirable
but not absolutely necessary, and C means change would be convenient but
unmodified software is still fairly usable.

I interpret "usable" to mean that the user can still post,
follow-up, and read articles.  Yes, display of non-ASCII
names, where available, would be "convenient".  Maybe
we're trying to split too fine a hair between C and D;
at least neither is a show-stopper.

No modifications are required to (2) or (4), the NNTP servers, although
without modifications the server administrator would have to work with
encoded group names.  It would provide a much better user interface if
the administrative tools implemented punycode encoding and decoding for
easier handling of non-ASCII newsgroup names.


Use of the canonical name may in fact be a benefit, e.g. in the case of
an administrator not familiar with Oriental, Cyrillic, Hebrew, Arabic,
Devanagari, etc. letterforms when dealing with some non-ASCII names.
Therefore the last statement above "It would provide a much better..."
is questionable.


I was making the assumption that one could always work in either the
encoded or decoded version of the name if that change were made.  I think
that it's obvious that having the choice is clearly superior than having
to use the encoded name and much preferrable in more mundane situations,
such as European news admins dealing with groups whose names could be
represented by ISO 8859-15.

Yes, choice is good; if the hypothetical administrative tools with
codecs can be overridden by their users (i.e. the administrators),
great.
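A minimal sketch of the kind of choice meant here: an administrative listing that shows either the canonical encoded name or the decoded one, at the administrator's option, and falls back to the encoded form if a component will not decode. The "xn--" prefix convention and the function names are assumptions for illustration only.

```python
def both_forms(group: str) -> tuple:
    """Return (canonical ASCII name, decoded display name) for one group."""
    parts = []
    for label in group.split("."):
        if label.lower().startswith("xn--"):
            try:
                parts.append(label[4:].encode("ascii").decode("punycode"))
            except UnicodeError:
                parts.append(label)  # malformed: keep the encoded form
        else:
            parts.append(label)
    return group, ".".join(parts)


def render(groups, decode=True):
    """Yield the decoded names, or the canonical ones if decode is False."""
    for group in groups:
        canonical, display = both_forms(group)
        yield display if decode else canonical


active = ["comp.misc", "de.comp.xn--bcher-kva"]
print(list(render(active)))                # decoded view
print(list(render(active, decode=False)))  # canonical view
```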

 | 1   2   3   4   5   6   7   8   9  10  11  12  13  14
-+------------------------------------------------------
A| D   C   N   N   D   Y   N   N   N   Y   N   D   Y   D
B| Y   Y   C   Y   Y   N   N   N   N   N   N   N   C   D
C| D   C   C   C   D   N   N   N   N   N   N   N   C   D


Based on the comments above, there are a few errors; I've had to
add an I (for incompatible) category, and I've added a fourth row:


 | 1   2   3   4   5   6   7   8   9  10  11  12  13  14
-+------------------------------------------------------
A| Y   C   N   N   Y   Y   N   N   N   Y   Y   Y   I   I
B| Y   Y   C   Y   Y   N   N   N   N   N   N   N   Y   D
C| C   C   C   C   C   N   N   N   N   N   N   N   C   C
D| C   C   C   C   C   N   N   N   N   N   N   N   C   C


I believe that your modifications are wrong, as noted above, although some
of them are debatable.  (13) and (14) for (A) are definitely wrong.

At best they are both 'Y', and you'll need to convince me that
that's feasible.

(D) is also wrong.  (1) and (5) are a D.  If news readers are not
modified, the correct newsgroup names are not displayed, which means that
proper internationalization has not occurred.

See my comments above re "usable" and the distinction.  Anyway,
it's not worth quibbling; either 'C' or 'D' is far preferable
to the 'Y's of A and B.

Similarly, this proposal
changes either (13) or (14) to a D, since a client reading through IMAP
would now have to look up somewhere what the *real* group name is, and
that would have to be done either in the client or in the IMAP server.

Good point. I agree.

I agree with your change to (13) for (B); I think my original summary was
wrong.  (13) may actually be an N for (C).

(14) for (C) is, again, D.  The IMAP client would have to understand the
punycode newsgroup name to display it correctly.

Whatever; as above, at least it certainly isn't a 'Y' or 'I'
as with method A.

A is completely incompatible with IMAP,


Disagree, as mentioned above.

If anybody still wants to pursue A, they should present a
comprehensive plan as to how it can be achieved with existing
(unmodified) IMAP servers -- a plan that is agreeable to the
IMAP experts.  Frankly, though, in light of the lack of 'Y's
in methods C and D, it's probably not worth spending more
time on A or B.

Any Y or I is an incompatibility and rules out any chance of IESG
approval (barring a transition plan and protocol negotiation with
fallback where feasible).


Don't be absurd.  Y says that the software requires modifications in order
to work with non-ASCII groups.  Nothing about that inherently would rule
out any chance of IESG approval; it doesn't break anything about use of
that software with the existing ASCII groups.

It means that new articles present the existing infrastructure
with illegal content, and that is certainly a barrier unless,
as stated, there is a transition plan and protocol negotiation
to cover extended syntax, with fallback for backwards compatibility.
That would have to be worked out in fairly good detail for
either A or B (C and D have no such issues).
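As a rough illustration of what "illegal content" means in practice: header field bodies in RFC 2822 messages are restricted to US-ASCII, so a component that validates headers could flag method-A articles with a check along these lines. This is a hypothetical sketch (the function name and the simplified field parsing are assumptions), not code from any existing news or mail software.

```python
def eight_bit_fields(header_block: bytes) -> list:
    """Return names of header fields whose bodies contain non-ASCII octets."""
    flagged = []
    current = None
    for line in header_block.splitlines():
        if not line:
            break  # a blank line ends the header
        if line[:1] in (b" ", b"\t") and current is not None:
            name = current  # folded continuation of the previous field
        else:
            name = line.split(b":", 1)[0].decode("ascii", "replace")
            current = name
        if any(octet > 0x7F for octet in line) and name not in flagged:
            flagged.append(name)
    return flagged


method_a = b"From: u@example.org\r\nNewsgroups: comp.misc,de.comp.b\xc3\xbccher\r\n\r\n"
method_c = b"From: u@example.org\r\nNewsgroups: comp.misc,de.comp.xn--bcher-kva\r\n\r\n"
print(eight_bit_fields(method_a))
print(eight_bit_fields(method_c))
```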

There are no cases with any of these three proposals where existing news
software completely breaks, as existing news software is already eight-bit
clean (as noted as an assumption several times in my summary, an
assumption that has already been discussed here at length).  To avoid
problems with IMAP servers, simply don't expose the new groups via IMAP
until the IMAP server has been modified to deal with them.

IMAP *is* "existing news software" (actually a protocol, but it's
implemented in software).  And utf-8 in article fields (method A)
breaks IMAP.  "[S]imply don't expose the new groups via IMAP"
doesn't work; if articles (even crossposted ones, though
the extended groups are not present) exist in the news spool,
they're available via IMAP (at least in some implementations
as administered in some places).

I am not arguing that's the best approach; personally, I favor (C).
However, it is a possible approach and has some significant advantages
from the perspective of making the most possible work with the least
possible modifications to existing news software.

How so -- B, C, and D all have more 'N's than A; A has more
incompatibilities ('Y' and 'I') than C or D and at least as
many as B -- how can that be "least possible modifications"?