ietf-822

Re: Unicode newsgroup name options

2003-02-24 21:59:28

Bruce Lilly <blilly(_at_)erols(_dot_)com> writes:
Russ Allbery wrote:

I don't see how.

You mail the mail to news gateway address.  It adds a Newsgroups header
that contains the corresponding newsgroup.  Whether that header is in
UTF-8, punycode, or something else really makes little difference from
the perspective of the gateway.

The scenario: User U wishes to crosspost to two newsgroups, one a
conventional newsgroup, and one an unmoderated extended newsgroup.  For
whatever reason (e.g. the conventional group is moderated),

Moderation is not gatewaying in my document.  I separated them for good
reasons, this being one of them.  I'm talking about mail to news
gatewaying that isn't moderation in the section on which you're
commenting.

he chooses to use the gateway to the conventional newsgroup.  His UA
inserts a Newsgroups field with the two names in punycode (since it's
being mailed) and sends it to the gateway. The gateway is somehow
supposed to magically convert the punycode names to UTF-8, but no
existing gateway does so.

This isn't how most (nearly all) mail to news gateways work.  Mail to news
gateways have addresses corresponding to newsgroups (whatever you want
those addresses to be) and messages sent to those addresses are posted to
those newsgroups.  Some gateways go a step farther and analyze the headers
and support crossposting.

Gateways don't expect the e-mail message to have a Newsgroups header.  In
fact, honoring the Newsgroups header in a mail message when processing the
message at a mail to news gateway can cause a bunch of other problems.
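
As a rough illustration of the address-based model described above, here is
a minimal Python sketch.  The address table, the example addresses, and the
function name gate_to_news are all hypothetical; this is not how any
particular gateway is implemented.  The newsgroup is chosen from the
recipient address alone, and any Newsgroups header already present in the
mail message is discarded rather than honored.

```python
from email import message_from_string
from email.message import Message

# Hypothetical address-to-newsgroup table; a real gateway would use
# whatever addresses its administrator chose.
GATEWAY_MAP = {
    "comp-lang-misc@gateway.example": "comp.lang.misc",
}


def gate_to_news(raw_mail: str, recipient: str) -> Message:
    """Turn a mail message into an article for the recipient's group."""
    msg = message_from_string(raw_mail)
    group = GATEWAY_MAP[recipient.lower()]
    # Do not trust a Newsgroups header that arrived by mail; some user
    # agents put one there with a different meaning.  Deleting a header
    # that is absent is harmless.
    del msg["Newsgroups"]
    msg["Newsgroups"] = group
    return msg
```

A gateway that also supports crossposting would need extra logic on top of
this; the sketch covers only the simple one-address, one-group case.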

So you're analyzing a situation that in practice is exceedingly rare and
isn't the normal or recommended way of managing mail to news gateways,
because it causes mishandling of messages sent, e.g., by a user agent like
trn that doesn't use that meaning of Newsgroups in e-mail messages.

13 requires (for backwards compatibility) that the article format and
the format used in IMAP be the same, which is not the case for A.

No, it doesn't require that.  The IMAP server could be modified
                                                       ========
In what way is that backwards compatible with the existing (i.e.
unmodified) installed base of IMAP servers?

I don't find your personal definition of backwards compatible to be
reasonable or useful.  I'm using the standard definition.

Under proposal (A), if that's what you're asking, existing IMAP servers
will not work with new non-ASCII newsgroups without modification.

That "process" is often reading directly from the spool (which under A
is in utf-8) or via NNTP (and an NNTP server has no way of determining
if its client is an IMAP server).  In any event, nothing now in that
process does such a conversion, so that is a backwards compatibility
issue which rates at least a 'Y'.

Funny, that's why I gave it a Y.

In the case of LMTP transfer to the IMAP server, the server has no way
to determine whether the message is "news" or "mail" and has no way to
treat them differently.

The process doing LMTP does.  You clearly would not be able to feed new
non-ASCII newsgroups to an IMAP server using LMTP until you modified that
process to correctly encode the newsgroup names.

Likewise for a message moved from folder to folder (on the IMAP server;
please read a description of the IMAP protocol if you don't understand
that) by a client's user.

All messages in IMAP folders, including the news ones, have mail syntax.
I think I've said this about four times now.

You'll have to show how it's possible when there's no way for an IMAP
server to differentiate (and therefore treat differently) "news" and
"email".

No, I don't.

There is a point at which the IMAP server obtains the messages from news.
At that point, you do a conversion.  At every point that messages transit
from mail to news, in proposal (A), you have to do a conversion.  That's
the whole point of proposal (A).  Proposal (A) involves establishing a
separate news article format from the mail article format for messages
posted to non-ASCII groups (and those messages only) and then putting
gateways at every point of transition between news and mail.

You can certainly not like that proposal.  It makes gatewaying much more
of a pain for non-ASCII groups.  But it's quite definitely technically
feasible.
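
To make the conversion step concrete, here is a small Python sketch of what
a gateway at a news/mail boundary might do to a newsgroup name under
proposal (A).  It is only a sketch under assumptions: the "xn--" prefix is
borrowed from IDNA and is not something any of the proposals has settled
on, the function names are made up, and no normalization of the Unicode
name is attempted.  Each dot-separated component is converted on its own,
and pure ASCII components pass through untouched.

```python
ACE_PREFIX = "xn--"  # assumption: borrowed from IDNA for illustration


def news_to_mail(name: str) -> str:
    """UTF-8 (news side) newsgroup name -> ASCII-compatible form."""
    parts = []
    for comp in name.split("."):
        if comp.isascii():
            parts.append(comp)
        else:
            encoded = comp.encode("punycode").decode("ascii")
            parts.append(ACE_PREFIX + encoded)
    return ".".join(parts)


def mail_to_news(name: str) -> str:
    """ASCII-compatible form (mail side) -> UTF-8 newsgroup name."""
    parts = []
    for comp in name.split("."):
        if comp.startswith(ACE_PREFIX):
            raw = comp[len(ACE_PREFIX):].encode("ascii")
            parts.append(raw.decode("punycode"))
        else:
            parts.append(comp)
    return ".".join(parts)
```

An existing ASCII name such as comp.lang.misc comes out of both functions
unchanged, which is the sense in which existing groups are unaffected.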

In the way that I phrased them in that summary, they can be.  There's a
converting gateway at (13) so that any articles present in the IMAP
system are mail-compatible.

What "gateway"?  IMAP servers get news articles from the news spool
directly, via NNTP, or via LMTP.

That's a gateway, by the standard definition of a gateway.  A gateway is a
system that talks two different protocols and moves data from one protocol
to another.

Currently, those gateways can be exceptionally easy (as long as they don't
have to deal with existing news messages with 8-bit header content).
Proposal (A) would make them significantly more complex for non-ASCII
groups.

Note, however, that it has no impact on existing ASCII groups.

Do any of the schemes?

No.

Is that relevant in differentiating them?

No.  It's relevant in addressing your claim that this proposal is not
backward compatible.  It would not be backward compatible if it affected
existing non-ASCII groups and the arrangements in place for them.

I wouldn't call myself an IMAP implementor, but I'm familiar with the
rudiments of the protocol and the messaging model, and I administer my
own IMAP server.  As far as I can tell, there's no reason to convert a
perfectly legal ASCII punycoded name; it would simply be left as the
ASCII-compatible name.

The only reason that I could think of is that it may allow the IMAP client
to correctly display the name of the folder without having to understand
punycode; in other words, it might be a useful intermediate measure to
help out clients until they fully understand punycode.

Which is why I said "in order to properly display the names."  Until
the names are displayed properly, we haven't actually achieved our
goals; we've just succeeded in not breaking anything.  The whole point
of this is obviously to show the user real non-ASCII group names.

Yes, but that boils down to the distinction between categories
(not methods) C and D.  They were defined as:

D means change is very desirable but not absolutely necessary, and C
means change would be convenient but unmodified software is still fairly
usable.

I interpret "usable" to mean that the user can still post, follow-up,
and read articles.

I didn't.  And it's my document.  :)

I interpreted "usable" as "the end user sees correctly displayed non-ASCII
newsgroup names," since otherwise we don't have usable non-ASCII group
names.  (I *didn't*, however, require that the news server administrator
see non-ASCII newsgroup names correctly displayed.  This was intentional.
The point of the protocol is the end users, not the server administrators;
some inconvenience for server administrators can be more easily tolerated
if the protocol is working for the end users.)

Another word besides usable may have been better.  "Non-ASCII functional"
perhaps.

 | 1   2   3   4   5   6   7   8   9  10  11  12  13  14
-+------------------------------------------------------
A| D   C   N   N   D   Y   N   N   N   Y   N   D   Y   D
B| Y   Y   C   Y   Y   N   N   N   N   N   N   N   C   D
C| D   C   C   C   D   N   N   N   N   N   N   N   C   D

I believe that your modifications are wrong, as noted above, although
some of them are debatable.  (13) and (14) for (A) are definitely
wrong.

At best they are both 'Y',

(13) is a Y.  I have absolutely no dispute there.  (14) is not; that's the
whole point of making (13) a Y.  If (13) does the conversion, (14) doesn't
have to.  (Although (14) would then see punycode newsgroup names, so it's
still a D.)

(D) is also wrong.  (1) and (5) are a D.  If news readers are not
modified, the correct newsgroup names are not displayed, which means
that proper internationalization has not occurred.

See my comments above re "usable" and the distinction.

Right.  That's just a wording difference.

If anybody still wants to pursue A, they should present a comprehensive
plan as to how it can be achieved with existing (unmodified) IMAP
servers -- a plan that is agreeable to the IMAP experts.

I don't think I'd make that strong of a requirement, but certainly one of
the things that I hoped to accomplish by posting this summary and this
discussion to the IMAP list is to make them aware of this discussion and
the possible impacts on IMAP implementations.

I am not arguing that's the best approach; personally, I favor (C).
However, it is a possible approach and has some significant advantages
from the perspective of making the most possible work with the least
possible modifications to existing news software.

How so -- B, C, and D all have more 'N's than A; A has more
incompatibilities ('Y' and 'I') than C or D and at least as many as B --
how can that be "least possible modifications"?

See the note at the end of the original summary about how some news
clients have been confirmed to just work with UTF-8 newsgroup names
without any modification, including displaying them correctly.  This
property is unique to (A) and not shared by any other proposal that I've
seen.

Mark Crispin doesn't feel that this is likely to be all that useful in
practice due to other GUI issues raised by Unicode support in general, and
he does have a valid point.
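
For anyone who wants to check the 'N' and 'Y' counts argued over above,
here is a short Python sketch that tallies the rows of the table exactly as
quoted earlier in this message.  Only rows A through C appear in that
table, so row D is not covered, and the names TABLE and tally are just
illustrative.

```python
from collections import Counter

# Rows copied from the table quoted earlier in this message.
TABLE = {
    "A": "D C N N D Y N N N Y N D Y D",
    "B": "Y Y C Y Y N N N N N N N C D",
    "C": "D C C C D N N N N N N N C D",
}


def tally(table):
    """Count how often each category letter appears in each row."""
    return {row: Counter(cells.split()) for row, cells in table.items()}


if __name__ == "__main__":
    for row, counts in tally(TABLE).items():
        print(row, dict(sorted(counts.items())))
```

The counts say nothing about how much each cell matters, so they are at
most a starting point for the comparison.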

-- 
Russ Allbery (rra(_at_)stanford(_dot_)edu)             
<http://www.eyrie.org/~eagle/>