Re: RFC 2047 and gatewaying


Claus writes:

> It's easy for a user agent to translate the changed data back into
> transmission format.


To the extent that the user sticks to fields recognized by the MUA, such
as Subject, it's certainly possible to do that translation.

The Subject line ends up being created by somebody, encoded as per RFC
2047, decoded for editing in a followup, encoded again, decoded again
for editing in someone else's followup, etc.

The second encoding often has very different bytes from the first. If
the first encoding uses KOI8-R, for example, then it's hardly a surprise
for the second encoding to use UTF-8. But that's a failure when the two
messages are handled by popular message processors that match Subject
lines according to the 822/1036 format: those processors compare the
encoded header text, so two encodings of the same Subject don't match.
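
To make the mismatch concrete, here is a minimal Python sketch using the
standard email.header module. The Subject text and the helper names
(decoded, naive_match) are made up for illustration, and naive_match is
only an assumption about how a processor that ignores RFC 2047 might
compare Subject lines.

```python
from email.header import Header, decode_header, make_header

# Hypothetical Subject text, chosen only for illustration.
subject = "Привет"

# The same text, encoded per RFC 2047 with two different charsets.
first = Header(subject, charset="koi8-r").encode()
second = Header(subject, charset="utf-8").encode()


def decoded(raw):
    """Decode an RFC 2047 header value back to text."""
    return str(make_header(decode_header(raw)))


def naive_match(a, b):
    """Compare header values as raw text, with no RFC 2047 decoding."""
    return a == b


print(first)
print(second)
print(naive_match(first, second))         # False
print(decoded(first) == decoded(second))  # True
```

The two header values carry the same Subject, but only software that
decodes before comparing will see that.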

You could argue that implementations should check for unchanged Subject
lines and use the original encoding if possible (which also means that
implementations have to be structured to keep the original encoding
around instead of hiding RFC 2047 in the message-I/O module). But most
implementations don't do this (even if they're structured that way).
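
The "check for unchanged Subject lines" approach could look roughly like
the following Python sketch. The function names and the UTF-8 default
are assumptions for illustration, not a description of any existing
MUA. Note that the caller has to pass in the original encoded header,
which is exactly the structural requirement mentioned above.

```python
from email.header import Header, decode_header, make_header


def decode_subject(raw):
    """Decode an RFC 2047 Subject value to text for editing."""
    return str(make_header(decode_header(raw)))


def encode_followup_subject(original_raw, edited_text, default_charset="utf-8"):
    """Return the Subject value to send in a followup.

    If the user left the decoded text unchanged, reuse the original
    encoded bytes so that raw-text matching still works. Otherwise
    encode the edited text with the (assumed) default charset.
    """
    if decode_subject(original_raw) == edited_text:
        return original_raw
    return Header(edited_text, charset=default_charset).encode()


# Hypothetical original Subject, encoded with KOI8-R by the first sender.
original = Header("Привет", charset="koi8-r").encode()

unchanged = encode_followup_subject(original, "Привет")
changed = encode_followup_subject(original, "Привет!")

print(unchanged == original)  # True
print(changed == original)    # False
```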

Of course, Subject is only part of the RFC 2047 mess. I noticed that a
recent message to this list has

   In-Reply-To: <001601c2bc0a$2844d650$0200000a(_at_)DAVE>
      ("David Barr"'s message of "Tue, 14 Jan 2003 14:18:47 -0600")

   David Barr <barr(_at_)visi(_dot_)com> writes:

in the header and body. Does all the reading and writing software work
correctly with an RFC 2047 name instead of David Barr? What happens when
names are copied to and from address books? Where is the line drawn
between RFC 2047 in header fields and UTF-8 in an LDAP database? Exactly
which data structures are supposed to use the RFC 2047 encoding?

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago