Re: UTF-8 over RFC 2047 (Re: Call for Usefor to recharter)


In <1030116080856(_dot_)ZM26696(_at_)candle(_dot_)brasslantern(_dot_)com> "Bart 
Schaefer" <schaefer(_at_)brasslantern(_dot_)com> writes:

Advocates of UTF-8 in usenet headers want to take the same chance that
1341 senders took -- that their content might not get through undamaged
everywhere, particularly when gatewayed -- rather than adopt a workaround
of the kind 1342/1522/2047 represents.

This doesn't seem unreasonable on the face of it, but the difference is
that RFC 1341 had no effect on the mechanics of message transmission;
it introduced no changes in any existing headers, and no syntactically
incompatible changes in the extended headers.  The content might become
broken, but the process could not.

Changing existing headers to UTF-8 content *can* break the process; adding
new headers in UTF-8 *is not* syntactically compatible; and it *is not*
possible to predict where that breakage will occur or how the unexpected
syntax will be handled by software that conforms to older standards. Can
anyone refute any of those statements without resorting to probabilistic
arguments?  Using anything but 7-bit in headers is a calculated risk.


Actually, we have a means to predict the breakages rather well. There is
already a considerable number of ("illegal") 8-bit headers being sent
through email. So we just need to see what is currently happening to them.

1. I doubt that any current software actually dumps core when that stuff
arrives.
2. Some transports may reject it with an error response. That seems fine.
3. Many transports just pass it through as-is (it is then someone else's
problem). I believe this is the most common case.
4. Some transports drop some offending characters (sendmail is the best
known example).
5. Some transports may truncate to 7 bits (I don't know of any examples).
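
To make the differences between those cases concrete, here is a rough
sketch in Python of what each one does to the bytes of a header. The
function names and the sample Subject line are invented for illustration;
nothing here is taken from sendmail or any other real transport, and real
software may well behave differently in detail.

```python
# Hypothetical models of the transport behaviours listed above.
# None of these names come from real MTA code.

def has_8bit(header: bytes) -> bool:
    """True if any octet in the header has its high bit set."""
    return any(b > 0x7F for b in header)

def reject(header: bytes) -> bytes:
    """Case 2: refuse the header with an error if it is not 7-bit."""
    if has_8bit(header):
        raise ValueError("8-bit data in header")
    return header

def pass_through(header: bytes) -> bytes:
    """Case 3: forward the octets untouched."""
    return header

def drop_8bit(header: bytes) -> bytes:
    """Case 4: silently discard every octet above 0x7F."""
    return bytes(b for b in header if b <= 0x7F)

def truncate_to_7bit(header: bytes) -> bytes:
    """Case 5: clear the high bit of every octet."""
    return bytes(b & 0x7F for b in header)

# Sample header: "caf" followed by e-acute, sent as raw UTF-8 (C3 A9).
raw = "Subject: caf\u00e9".encode("utf-8")

print(pass_through(raw))      # b'Subject: caf\xc3\xa9'
print(drop_8bit(raw))         # b'Subject: caf'
print(truncate_to_7bit(raw))  # b'Subject: cafC)'
```

Note that only case 3 preserves the text; case 4 loses it and case 5
replaces it with unrelated ASCII characters, but in none of them does the
header stop being a syntactically ordinary line.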

Are there any other possibilities? Do we have a feel for their relative
preponderance?

Gateways have essentially the same possibilities.

For user agents, we can again observe what they actually do:

1. I doubt that any current UA actually dumps core when that stuff
arrives.
2. Some UAs may report an error, but display as best they are able.
3. Some UAs may report an error, and refuse to display. Do we know of any
examples?
4. Some UAs may discard the message silently. That would be bad, but do we
know of any?
5. Some UAs may display the human-readable texts (phrases, comments, etc)
as gobbledegook, but process other stuff (email addresses, etc) as
intended. I believe this is the most common situation.
6. Some UAs may display those headers as gobbledegook, and also mess up
stuff such as email addresses. That would be bad. Do we know of examples?
7. Some UAs may, by careful configuration or just by good luck, manage to
display (some of) it correctly.
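
Cases 5 and 7 can be illustrated with a small Python sketch. It assumes a
UA that simply decodes the raw header octets in whatever charset it happens
to be configured for; the helper names, the sample From line and the
example.org address are all invented for the purpose.

```python
# Hypothetical UA behaviour: decode raw header octets with a fixed charset.

def display(header: bytes, charset: str) -> str:
    """Render the header as the UA would show it in the given charset."""
    return header.decode(charset, errors="replace")

def address(header: bytes) -> str:
    """Pull out the angle-bracketed address, which is pure ASCII here."""
    start = header.rindex(b"<") + 1
    end = header.index(b">", start)
    return header[start:end].decode("ascii")

# A From line whose phrase contains e-acute sent as raw UTF-8.
raw = "From: Andr\u00e9 <andre@example.org>".encode("utf-8")

print(display(raw, "latin-1"))  # case 5: phrase shown as gobbledegook
print(display(raw, "utf-8"))    # case 7: phrase shown correctly
print(address(raw))             # address is usable in both cases
```

The point of the sketch is that the machine-processed part (the address)
is unaffected by how the phrase is decoded, so long as the 8-bit octets
are confined to the human-readable parts.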

Are there any other possibilities? Do we have a feel for their relative
preponderance?

As I said, there is enough of this stuff around that the above questions
should be answerable.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk      Snail: 5 
Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5