ietf-822

Re: RFC 2047 and gatewaying

2003-01-04 07:37:37

dan(_at_)dankohn(_dot_)com (Dan Kohn)  wrote on 03.01.03 in 
<138AA78F80DCE84B8EE424399FFBF9C904F9CB(_at_)exchange(_dot_)ad(_dot_)skymv(_dot_)com>:

Charles, as someone with absolutely no political or religious stakes in
this discussion, I'd like to point out to you that I've found Ned Freed
to be one of the most level-headed, open-minded IETF oldtimers.

Well, I cannot make that description match with his current mails. They  
seem needlessly confrontational and ...

I've
seen him go out of his way to offer constructive criticism, and be far
more open to new ideas than many.

... perfectly free of anything that could be construed as constructive.

This is perhaps Ned's most important statement.  At countless times in
the last 20 years, the IETF and its predecessors have chosen backward
compatibility over the "efficient" solution.  The (IESG-approved) IDNA
drafts epitomize this approach, by promulgating an "ugly-looking"
transfer encoding (punycode) rather than the "elegant" solution of
UTF-8.  The economic calculation was that clients that care about i18n
can implement IDNA, but that the IETF would not break the "social
contract" made with DNS software that was (and now still will be)
compatible with pre-IDNA standards.
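
To make the contrast in the paragraph above concrete, here is a minimal sketch of the two encodings for a single label. The label "bücher" is only an illustration and does not come from the thread; the "idna" codec in Python's standard library implements the IDNA 2003 ToASCII operation, which wraps punycode.

```python
# Illustrative label only; it is an assumption, not taken from the thread.
label = "b\u00fccher"  # "bücher"

utf8_form = label.encode("utf-8")  # the "elegant" form: raw 8-bit octets
ace_form = label.encode("idna")    # the "ugly" form: ASCII-compatible encoding

print(utf8_form)  # b'b\xc3\xbccher'
print(ace_form)   # b'xn--bcher-kva'

# Software that only understands ASCII labels can carry the ACE form unchanged,
# while the UTF-8 form contains octets above 0x7F.
print(ace_form.isascii(), utf8_form.isascii())

# An IDNA-aware client can recover the original label.
print(ace_form.decode("idna") == label)
```

The ASCII form passes through pre-IDNA software untouched, at the cost of every client that wants to display the name having to decode it.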

I don't want to go into that whole debate right now, but I *am* of the  
opinion that the IETF is going exactly the wrong way there, exchanging  
short-term pain for long-term pain. That solution is *never* the right  
one.

I believe you're radically underestimating the IESG interest in that
backward compatibility, specifically in how it relates to mail gateways.

I increasingly have the impression that the people here but not on the  
USEFOR list do not actually *understand* the backward compatibility issues  
involved. For example, they seem to think that RFC 1036 actually describes  
how Usenet looks today. This is not true.

Further, I don't think anyone on the IESG will consider "UTF-8 in
headers" as an essential feature of "best proposed practice".  The IESG
will demand i18n of headers, but it will equally push for the lowest
impact way of achieving this.

Frankly, while it is certainly possible to argue about the relative merits  
of various solutions, I really doubt that any other solution has  
*significantly* lower impact than this one.

Kai Henningsen said:

Very shortly put, there is a long tradition on Usenet of passionately
hating 2047 encoding in at least some quarters. Note the word
"passionately"; it is not exaggerated. Long flamewars have been fought
about this before anyone even thought about USEFOR.

My only personal stance on this is that I consider it a serious,
inexcusable bug that 2821/2822 do not allow naked UTF-8 in headers.
Frankly, "we once made the mistake of specifying 7 bits so this must remain
7 bit forever" is absolutely and inexcusably insane. But that is a
mail problem, not a news problem.

Not only is it a news problem, but it is the news problem that will
likely prevent the usefor draft from being approved.  What I don't get
is that news clients will need to understand 2047 and 2231 syntax
anyway, since such things are bound to leak in (from email if nowhere
else).

Kai's argument is quite similar to the arguments made in IDN for "just
use UTF-8".  A strong consensus was reached that the elegance of UTF-8
was a far lower priority than backward compatibility.  For RFC 2822, the
value of backward compatibility over elegance was even more obvious.  If

Frankly, it was totally nonobvious to me at the time; it was so before,  
and it still is. I strongly believe exchanging short-term pain for long- 
term pain is not a reasonable decision.

you care about wasting bits in RFC 2047 encoding of UTF-8, start

If you think this is about wasting bits, you haven't even started to  
understand the problems.

It is a combination of wasting programmers' time (without any hope of  
this ever getting better), of (as a consequence) introducing additional  
bugs, and of (partly as a consequence) irritating users.

Oh, and nobody please try to tell me that 2047 "just works". I've seen it  
break far too often for that.

2047 *must go away*, not be perpetuated forever. It is an abomination.  
(And really, the same arguments hold for 2231.) And now we get punycode.

The Internet is increasingly feeling like typical MS code - patches upon  
patches upon patches.

SONET overhead.  But if you want to deploy a standard in the IETF, focus
on backward compatibility.

Well, the problem is that it seems you cannot be backwards compatible with  
both current mail standards and current mail usage by current Usenet, as  
the two are *already* incompatible. Or at least you cannot do that and  
still have an even halfway sane method of i18n. Or at least I haven't seen  
any such proposal.

Obviously, abandoning the Usenet side of this means making a standard  
that's irrelevant to the real world, as Usenet will just ignore it the  
same way it actually does much of that today. Backwards compatibility with  
existing Usenet practice is a pretty absolute MUST. And specifically,  
compatibility with the installed server base is much more critical than  
for mail, because of the flood fill nature of Usenet. That limits  
implementation choices a lot.

Now, I personally am not a member of the crowd that starts flaming when  
they see 2047 in headers; I just rant a lot when I have to implement that  
mess (as I do right now).

So from my point of view, the really important i18n considerations are the  
following - maybe someone can come up with that missing ideal solution:

1. Must be able to support non-ASCII newsgroup names.
2. Because that is how the installed base of servers works, newsgroup
   names (while in Usenet) *can* use non-7bit characters.
3. By the same argument, the identity relation on newsgroup names *must*
   work without needing any form of normalization (because no such form is
   deployed).
4. Names that fit in ASCII must still be in ASCII, for obvious reasons.
5. Because of moderated groups, news articles *will* be sent to moderators
   via mail.
6. Because of the installed base, this *will* (currently) happen in most
   cases without changing any header at all; we have a small chance of
   using attachments instead.
7. Moderators, in the vast majority, refuse to do anything complicated
   with these articles before injecting them to news servers. (See the
   relevant flamewar in the USEFOR archives.) Most of them use tools that
   are barely adequate to the job as-is, or at least that's the impression
   I get listening to them.
8. Because of crossposts, non-ASCII names *will* make their way to people
   who are not all that interested in groups with those names themselves.
   This must not make anything break.
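
Point 3 can be illustrated with a small sketch. It assumes that a server decides whether two newsgroup names are the same by comparing their octets, with no normalization step; the group name and the helper `same_group` are hypothetical and exist only for this example.

```python
import unicodedata

# Hypothetical group name, chosen only for illustration ("de.test.café").
name = "de.test.caf\u00e9"

# Two spellings that render identically but differ in their octets.
nfc = unicodedata.normalize("NFC", name).encode("utf-8")  # precomposed e-acute
nfd = unicodedata.normalize("NFD", name).encode("utf-8")  # "e" + combining accent


def same_group(a: bytes, b: bytes) -> bool:
    """Octet-for-octet identity, as assumed for the installed server base."""
    return a == b


print(same_group(nfc, nfc))  # True
print(same_group(nfc, nfd))  # False
```

Under that assumption, any scheme that re-encodes or normalizes a name in transit risks turning one group into two as far as deployed servers are concerned.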

While I dislike 2047, I do think a solution could demand it for any other  
fields; however, I do not see how it could possibly work for newsgroup  
names (Newsgroups: and Followup-To: header fields).
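
As a rough sketch of why this is awkward for newsgroup names, the following uses Python's standard `email.header` module to produce an RFC 2047 encoded-word and compares it with the raw UTF-8 octets. The group name is hypothetical, and the octet comparison stands in for software that is assumed not to decode 2047 at all.

```python
from email.header import Header, decode_header, make_header

# Hypothetical group name, chosen only for illustration ("de.test.grüße").
name = "de.test.gr\u00fc\u00dfe"

# RFC 2047 form: pure ASCII, of the shape =?charset?encoding?text?=
encoded = Header(name, charset="utf-8").encode()

# A 2047-aware reader recovers the original text.
decoded = str(make_header(decode_header(encoded)))

raw_octets = name.encode("utf-8")        # what 8-bit-clean news software carries
wire_octets = encoded.encode("ascii")    # what a 2047-encoding gateway would emit

print(encoded)
print(decoded == name)             # round trip works for software that decodes
print(wire_octets == raw_octets)   # software that compares octets sees two names
```

For a Subject: field the mismatch is only cosmetic when a reader fails to decode it. For Newsgroups: and Followup-To:, the value is an identifier, so the two forms are not interchangeable unless every component on the path decodes them.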

I have a simple question.  What can a UTF-8 subject header communicate
that an RFC 2047 one can't?  Other than inelegance, what's the downside
of 2047, when the upside is a huge increase in backward compatibility?

The downside is exactly lack of backwards compatibility. See above for  
details.

MfG Kai
