dan(_at_)dankohn(_dot_)com (Dan Kohn) wrote on 03.01.03 in
<138AA78F80DCE84B8EE424399FFBF9C904F9CB(_at_)exchange(_dot_)ad(_dot_)skymv(_dot_)com>:
Charles, as someone with absolutely no political or religious stakes in
this discussion, I'd like to point out to you that I've found Ned Freed
to be one of the most level-headed, open-minded IETF oldtimers.
Well, I cannot make that description match with his current mails. They
seem needlessly confrontational and ...
I've
seen him go out of his way to offer constructive criticism, and be far
more open to new ideas than many.
... perfectly free of anything that could be construed as constructive.
This is perhaps Ned's most important statement. At countless times in
the last 20 years, the IETF and its predecessors have chosen backward
compatibility over the "efficient" solution. The (IESG-approved) IDNA
drafts epitomize this approach, by promulgating an "ugly-looking"
transfer encoding (punycode) rather than the "elegant" solution of
UTF-8. The economic calculation was that clients that care about i18n
can implement IDNA, but that the IETF would not break the "social
contract" made with DNS software that was (and now still will be)
compatible with pre-IDNA standards.
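For readers who have not followed the IDN work, here is a minimal sketch of the difference between the two options. It uses Python's built-in `idna` and `punycode` codecs, which implement the IDNA 2003 rules of RFC 3490/3492. The label `bücher` is only an illustration. The UTF-8 form contains 8-bit octets. The punycode form, with its `xn--` prefix, is plain ASCII that pre-IDNA DNS software passes through untouched.

```python
# Sketch: raw UTF-8 form of a non-ASCII DNS label versus its IDNA/punycode
# ("ASCII-compatible") form. Uses only Python's built-in codecs, which
# follow IDNA 2003; the label is a hypothetical example.

label = "bücher"

utf8_form = label.encode("utf-8")      # what a "just use UTF-8" DNS would carry
puny_form = label.encode("punycode")   # bare RFC 3492 transfer encoding
ace_form = label.encode("idna")        # nameprep + punycode + "xn--" prefix

print(utf8_form)   # b'b\xc3\xbccher'
print(puny_form)   # b'bcher-kva'
print(ace_form)    # b'xn--bcher-kva'

# Every byte of the ACE form is 7-bit ASCII, so software written for
# pre-IDNA DNS can store and compare it without modification.
assert all(b < 0x80 for b in ace_form)

# The mapping is reversible for a client that implements IDNA.
assert ace_form.decode("idna") == label
```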
I don't want to go into that whole debate right now, but I *am* of the
opinion that the IETF is going exactly the wrong way there, exchanging
short-term pain for long-term pain. That solution is *never* the right
one.
I believe you're radically underestimating the IESG interest in that
backward compatibility, specifically in how it relates to mail gateways.
I increasingly have the impression that the people here but not on the
USEFOR list do not actually *understand* the backward compatibility issues
involved. For example, they seem to think that RFC 1036 actually describes
how Usenet looks today. This is not true.
Further, I don't think anyone on the IESG will consider "UTF-8 in
headers" as an essential feature of "best proposed practice". The IESG
will demand i18n of headers, but it will equally push for the lowest
impact way of achieving this.
Frankly, while it is certainly possible to argue about the relative merits
of various solutions, I really doubt that any other solution has
*significantly* lower impact than this one.
Kai Henningsen said:
Very shortly put, there is a long tradition on Usenet of passionately
hating 2047 encoding in at least some quarters. Note the word
"passionately"; it is not exaggerated. Long flamewars have been fought
about this before anyone even thought about USEFOR.
My only personal stance on this is that I consider it a serious,
inexcusable bug that 2821/2822 do not allow naked UTF-8 in headers.
Frankly, "we once made the mistake of specifying 7 bits so this must remain
7 bit forever" is absolutely and inexcusably insane. But that is a
mail problem, not a news problem.
Not only is it a news problem, but it is the news problem that will
likely prevent the usefor draft from being approved. What I don't get
is that news clients will need to understand 2047 and 2231 syntax
anyway, since such things are bound to leak in (from email if nowhere
else).
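Here is a minimal sketch of the two syntaxes mentioned above, decoded the way a news client might do it with Python's standard `email.header` module and `urllib.parse.unquote`. The header values are made up for illustration. The helper `decode_rfc2231_value` is a hypothetical simplification: it ignores the language tag and does not handle continuations, so it is not a complete RFC 2231 parser.

```python
# Sketch: decoding the two MIME header extensions that can leak into
# news articles from mail. The header values below are made-up examples.
from email.header import decode_header, make_header
from urllib.parse import unquote

# RFC 2047 encoded-word, form =?charset?encoding?encoded-text?=
# ("b" = base64, "q" = a quoted-printable variant).
subject_raw = "=?utf-8?b?R3LDvMOfZQ==?="
subject = str(make_header(decode_header(subject_raw)))
print(subject)  # Grüße


def decode_rfc2231_value(value: str) -> str:
    """Decode an RFC 2231 extended parameter value: charset'lang'%XX-text.

    Hypothetical minimal helper: ignores the language tag and does not
    handle continuation parameters such as filename*0*, filename*1*.
    """
    charset, _language, encoded = value.split("'", 2)
    return unquote(encoded, encoding=charset or "us-ascii")


# e.g. the value of a filename*= parameter in Content-Disposition
filename = decode_rfc2231_value("utf-8''Gr%C3%BC%C3%9Fe.txt")
print(filename)  # Grüße.txt
```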
Kai's argument is quite similar to the arguments made in IDN for "just
use UTF-8". A strong consensus was reached that the elegance of UTF-8
was a far lower priority than backward compatibility. For RFC 2822, the
value of backward compatibility over elegance was even more obvious. If
Frankly, it was totally nonobvious to me at the time; it was so before,
and it still is. I strongly believe exchanging short-term pain for long-
term pain is not a reasonable decision.
you care about wasting bits in RFC 2047 encoding of UTF-8, start
If you think this is about wasting bits, you haven't even started to
understand the problems.
It is a combination of wasting programmers' time (without any hope of
this ever getting better), of (as a consequence) introducing additional
bugs, and of (partly as a consequence) irritating users.
Oh, and nobody please try to tell me that 2047 "just works". I've seen it
break far too often for that.
2047 *must go away*, not be perpetuated forever. It is an abomination.
(And really, the same arguments hold for 2231.) And now we get punycode.
The Internet is increasingly feeling like typical MS code - patches upon
patches upon patches.
SONET overhead. But if you want to deploy a standard in the IETF, focus
on backward compatibility.
Well, the problem is that it seems you cannot be backwards compatible with
both current mail standards and current mail usage by current Usenet, as
the two are *already* incompatible. Or at least not do that and actually
have an even halfway sane method of i18n. Or at least I haven't seen any
such proposal.
Obviously, abandoning the Usenet side of this means making a standard
that's irrelevant to the real world, as Usenet will just ignore it the
same way it actually does much of that today. Backwards compatibility to
existing Usenet practice is a pretty absolute MUST. And specifically,
compatibility with the installed server base is much more critical than
for mail, because of the flood fill nature of Usenet. That limits
implementation choices a lot.
Now, I personally am not a member of the crowd that starts flaming when
they see 2047 in headers; I just rant a lot when I have to implement that
mess (as I do right now).
So from my point of view, the really important i18n considerations are the
following - maybe someone can come up with that missing ideal solution:
1. Must be able to support non-ASCII newsgroup names.
2. Because that is how the installed base of servers works, newsgroup
names (while in Usenet) *can* use non-7bit characters.
3. By the same argument, the identity relation on newsgroup names *must*
work without needing any form of normalization (because no such form is
deployed).
4. Names that fit in ASCII must still be in ASCII, for obvious reasons.
5. Because of moderated groups, news articles *will* be sent to moderators
via mail.
6. Because of the installed base, this *will* (currently) happen in most
cases without changing any header at all; we have a small chance of
using attachments instead.
7. Moderators, in the vast majority, refuse to do anything complicated
with these articles before injecting them into news servers. (See the
relevant flamewar in the USEFOR archives.) Most of them use tools that
are barely adequate to the job as-is, or at least that's the impression
I get listening to them.
8. Because of crossposts, non-ASCII names *will* make their way to people
who are not all that interested in groups with those names themselves.
This must not make anything break.
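Point 3 can be illustrated with a short sketch using Python's standard `unicodedata` module. The group name `de.alt.bücher` is hypothetical. The same visible name can be written in precomposed (NFC) or decomposed (NFD) form. A server that compares octets, as the installed base does, treats these as two different groups.

```python
# Sketch: why newsgroup-name identity cannot rely on normalization.
# Servers compare names byte for byte; the group name here is hypothetical.
import unicodedata

name_nfc = unicodedata.normalize("NFC", "de.alt.bücher")  # ü as U+00FC
name_nfd = unicodedata.normalize("NFD", "de.alt.bücher")  # u + U+0308

# Both render as "de.alt.bücher", but the octets on the wire differ ...
print(name_nfc.encode("utf-8"))  # b'de.alt.b\xc3\xbccher'
print(name_nfd.encode("utf-8"))  # b'de.alt.bu\xcc\x88cher'
print(name_nfc.encode("utf-8") == name_nfd.encode("utf-8"))  # False

# ... so a server without normalization sees two different groups.
# Only a comparison that normalizes first treats them as one name.
print(unicodedata.normalize("NFC", name_nfd) == name_nfc)  # True
```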
While I dislike 2047, I do think a solution could demand it for all other
fields; however, I do not see how it could possibly work for newsgroup
names (Newsgroups: and Followup-To: header fields).
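One way to see the difficulty with 2047 in `Newsgroups:` and `Followup-To:` is that an encoded-word does not have a unique spelling. The sketch below is in Python. The group name is hypothetical, and the small `q_encode` helper is a simplified illustration rather than a full RFC 2047 encoder. It produces three different header strings that all decode to the same name. Any software that compares the raw field byte for byte, as existing servers do, would therefore see three different groups.

```python
# Sketch: one (hypothetical) newsgroup name, several valid RFC 2047
# spellings, so byte-for-byte comparison of the raw field breaks.
import base64
from email.header import decode_header, make_header

name = "de.alt.bücher"
raw = name.encode("utf-8")

# Bytes this minimal Q encoder leaves as-is; everything else becomes =XX.
Q_SAFE = set(b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")


def q_encode(data: bytes) -> str:
    """Hypothetical minimal "Q" encoder: letters/digits kept, rest as =XX."""
    return "".join(chr(b) if b in Q_SAFE else f"={b:02X}" for b in data)


b64 = base64.b64encode(raw).decode("ascii")
spellings = [
    f"=?utf-8?B?{b64}?=",            # base64 encoding
    f"=?UTF-8?b?{b64}?=",            # charset and encoding letter are case-insensitive
    f"=?utf-8?Q?{q_encode(raw)}?=",  # "Q" (quoted-printable-like) encoding
]

for s in spellings:
    decoded = str(make_header(decode_header(s)))
    print(s, "->", decoded, decoded == name)

# Three different strings on the wire, one name once decoded.
print(len(set(spellings)))  # 3
```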
I have a simple question. What can a UTF-8 subject header communicate
that an RFC 2047 one can't? Other than inelegance, what's the downside
of 2047, when the upside is a huge increase in backward compatibility?
The downside is exactly lack of backwards compatibility. See above for
details.
Regards, Kai