That's all fine and good. Unfortunately, what I thought Keith was
talking about was which one causes more problems in the *boundary
cases*, such as:
A) Which approach, 1 or 2, causes more brain damage if sent to an MUA
that has *NOT* been upgraded yet?
B) Which approach, 1 or 2, deals with weirdness such as trying to do
a UTF-8 style reply to an unflagged ISO8859-23 message?
This is close. The problem with using an extra header is that for
a variety of reasons, different parts of the message header are
generated by different entities. Replies are a good example -
consider a message thread which has had a number of people reply
to it, and has a long CC list. Each person's name in that CC
list may have been copied from a From header supplied by a
different user agent. Some of those names may be in 2047 format,
others in raw 8859/*, and others in UTF-8. A single extra header
field declaring one charset, particularly one which is not supported
by everyone's user agent, cannot describe all of those cases.
An effective strategy for displaying headers might be:
- if it's in 2047 format, decode and display per 2047
- if a phrase, *text, comment, or quoted-string is a valid UTF-8
  string, display as UTF-8
  (the first byte of each UTF-8 character has the length of the
  character encoded in it, and each subsequent byte within that
  character has its two high bits set to 10, so it's fairly
  unlikely that something that looks like a valid UTF-8 string is
  actually a string from some other charset)
- otherwise, display in the recipient's native or default character set
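
The three steps above could be sketched roughly as follows. This is
only an illustration in Python, not a description of any existing
user agent: the function name display_text, the ENCODED_WORD pattern
and the local_charset parameter are made up for the example, and it
assumes the header has already been split into individual phrases,
comments, and quoted-strings.

```python
import re
from email.errors import HeaderParseError
from email.header import decode_header

# Rough pattern for an RFC 2047 encoded-word: =?charset?B-or-Q?text?=
ENCODED_WORD = re.compile(rb"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")

def display_text(raw: bytes, local_charset: str = "iso-8859-1") -> str:
    """Decode one header token for display (hypothetical helper)."""
    # Step 1: 2047 format. Encoded-words are pure ASCII, so a failure
    # here means the token was not clean 2047 and we fall through.
    if ENCODED_WORD.search(raw):
        try:
            pieces = decode_header(raw.decode("ascii"))
            return "".join(
                p.decode(cs or "ascii", errors="replace")
                if isinstance(p, bytes) else p
                for p, cs in pieces
            )
        except (UnicodeDecodeError, LookupError, HeaderParseError):
            pass
    # Step 2: valid UTF-8. The strict decoder rejects any byte
    # sequence that breaks the lead-byte/continuation-byte structure.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # Step 3: fall back to the recipient's native or default charset
    # (an assumption supplied by the caller).
    return raw.decode(local_charset, errors="replace")
```

Note that raw 8859-1 text such as b"J\xf6rg" fails step 2 because
0xF6 is not followed by continuation bytes, which is the property
the heuristic relies on.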
You will (almost certainly) still need a way to negotiate in
SMTP whether the next MTA can deal with 8bit UTF-8 headers,
and to downgrade to 2047 format if this is not possible.
But an extra header field doesn't help you there - by the time
you scan the message header looking for the extra field, you can
as easily scan the message header looking for 8bit characters.
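
As a rough sketch of that last point, assuming the sending MTA has
somehow learned whether the next hop accepts 8bit headers (the
next_hop_accepts_8bit flag below is hypothetical, as are the function
names), the check and the downgrade might look like this in Python.
It only covers an unstructured text value that is already valid
UTF-8; structured fields such as address lists would need to be
downgraded token by token.

```python
from email.header import Header

def has_8bit(data: bytes) -> bool:
    """True if any byte has the high bit set."""
    return any(b >= 0x80 for b in data)

def downgrade_text(value: bytes) -> bytes:
    """Re-encode a UTF-8 text value as RFC 2047 encoded-words."""
    encoded = Header(value.decode("utf-8"), charset="utf-8").encode()
    return encoded.encode("ascii")

def prepare_value(value: bytes, next_hop_accepts_8bit: bool) -> bytes:
    # No extra header field is consulted: scanning for 8bit bytes
    # is enough to decide whether a downgrade is needed.
    if next_hop_accepts_8bit or not has_8bit(value):
        return value
    return downgrade_text(value)
```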
Keith