ietf-822

Re: UTF-8 over RFC 2047

2003-01-14 22:36:38

Andrew Gierth <andrew(_at_)erlenstar(_dot_)demon(_dot_)co(_dot_)uk> writes:
> "Russ" == Russ Allbery <rra(_at_)stanford(_dot_)edu> writes:

>  Russ> IDNA certainly will be the easiest to implement for the servers.

> I don't think I entirely agree - you seem to be assuming that the server
> can avoid implementing the encoding, but this is not necessarily true
> once you start thinking about administrative issues such as adding or
> removing groups.

No, I was more thinking that the server could just use GNU Libidn, which
was recently released under the LGPL.  :)  The big advantage of using a
standard encoding format is that many software authors won't have to
implement it themselves.
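
As a rough illustration of what "not implementing it yourself" looks like,
here is a minimal Python sketch. It uses Python's built-in "idna" codec
(IDNA 2003) purely as a stand-in for a library such as GNU Libidn; the
helper names and the sample group name are made up for the example, and
nothing here is taken from the draft.

```python
# Sketch only: Python's built-in "idna" codec stands in for a library
# such as GNU Libidn. Helper names and the group name are hypothetical.

def encode_group(name: str) -> str:
    """Encode each dot-separated component of a group name with IDNA."""
    return ".".join(
        part.encode("idna").decode("ascii") for part in name.split(".")
    )


def decode_group(name: str) -> str:
    """Reverse encode_group(): turn xn-- components back into Unicode."""
    return ".".join(
        part.encode("ascii").decode("idna") for part in name.split(".")
    )


encoded = encode_group("de.comp.münchen")
print(encoded)                # ASCII-only form, safe for existing software
print(decode_group(encoded))  # original Unicode name
```

The point is only that the caller never touches the Punycode details; the
library does all of the work.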

> There is also the 'mp3 problem' - certain sequences of US-ASCII
> characters appearing as the encoded form of a non-ASCII name will have
> undesired consequences on the propagation of groups.

Yeah, that's an issue.  That's likely to apply to pretty much any ASCII
encoding due to the way that newsfeed patterns are currently built.
(Although if that's the worst problem that we have, I think we'd get off
pretty easy.)
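
To make the 'mp3 problem' concrete, here is a small Python sketch. It
assumes a feed rule that refuses any group whose name contains "mp3", and
it uses fnmatch as an approximation of wildmat matching. The pattern list
and the encoded group name are invented for the example; the encoded name
is not the real output of any encoder.

```python
from fnmatch import fnmatchcase

# Hypothetical feed rule: refuse any group whose name contains "mp3".
# fnmatch only approximates the wildmat patterns news servers use.
REJECT_PATTERNS = ["*mp3*"]


def is_rejected(group: str) -> bool:
    """Return True if any reject pattern matches the group name."""
    return any(fnmatchcase(group, pattern) for pattern in REJECT_PATTERNS)


# Made-up encoded name: the "mp3" is an accident of the ASCII encoding,
# not part of the real (non-ASCII) group name.
encoded_name = "example.xn--mp3-made-up"

print(is_rejected("alt.binaries.sounds.mp3"))  # the match the admin wanted
print(is_rejected(encoded_name))               # the accidental match
```

Both calls report a match, so the second group would silently fail to
propagate through that feed.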

> Plus, there is the '.' to '-' translation issue which one has to be
> careful of, and the fact that some sites are (bogusly) rejecting '%' in
> local-parts, so there are considerations that apply to this encoding
> that may be awkward in other contexts.

Yeah, although I believe IDN wins on all of those counts (it does use -,
I think, and I'm not sure whether it does so in ways that would cause us
problems).
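
Here is a quick Python sketch of where the hyphen could bite. It assumes
the submission address is formed by replacing '.' with '-' in the group
name, and it uses Python's built-in "idna" codec (IDNA 2003) as the
encoder. The function names and the group name are made up for the
example.

```python
def to_submission_local_part(group: str) -> str:
    """IDNA-encode a group name, then apply the '.' to '-' translation."""
    ascii_name = group.encode("idna").decode("ascii")
    return ascii_name.replace(".", "-")


def naive_reverse(local_part: str) -> str:
    """Undo the translation the obvious (and lossy) way."""
    return local_part.replace("-", ".")


group = "de.comp.münchen"
local = to_submission_local_part(group)

print(local)                 # hyphens from the encoding and from the dots mix
print(naive_reverse(local))  # no longer the IDNA form of the group name
```

Going from group name to address is fine; the trouble only shows up if
something tries to map the address back to a group name, since the
hyphens that IDNA adds are indistinguishable from translated dots.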

> This is one of the reasons why my proposal (which was the basis for what
> ended up in the draft, other than the fact that I did not propose using
> UTF-8 for anything other than newsgroup names, or in message/rfc822)

It lost quite a bit in the translation there; I would have voted for that
over the option chosen.

> did not suggest using the encoding everywhere, but strictly at
> mail<->news borders such as the moderation mechanism. (Since only
> newsgroup names are affected, existing group- or hierarchy-specific
> gateways don't need to do anything special.)

The only worry that I have about this is that it adds a lot of encoding
transitions.  Each time one has to recode somewhere other than at the end
point, it adds another potential point of failure (which was one of the
things that people were complaining about with RFC 2047).  I'm not sure
that saving encoding on the wire is worth having news messages
*necessarily* involve 8-bit characters in the headers, as opposed to that
being optional where user agents are known to support it.

That being said, I could certainly live with this.

> This can be retroactively implemented for many existing injectors simply
> by wrapping the mail-to-moderator script to do the necessary conversions
> to the destination address and the Newsgroups header. This should be
> easier than doing it in all clients.

Clients are going to have to implement an encoding whatever way you look
at it; either they're going to have to implement UTF-8, or they're going
to have to implement something else.  I'm not sure that it's really that
much easier to implement UTF-8 than IDN.  Although UTF-8 does have the
advantage that if you don't implement it at all and just echo raw bytes to
the screen, it's vaguely, distantly possible that the right thing will
happen, at least on some Unix systems.

-- 
Russ Allbery (rra(_at_)stanford(_dot_)edu)             <http://www.eyrie.org/~eagle/>
