Re: Interpretation of RFC 2047


In <200210171843(_dot_)g9HIha004948(_at_)astro(_dot_)cs(_dot_)utk(_dot_)edu> 
Keith Moore <moore(_at_)cs(_dot_)utk(_dot_)edu> writes:

And that's what worries me (and worries others on the Usefor list). It
seems that every time someone invents a new email (or news) extension,
lots of gateways, and user agents also, suddenly become out-of-date.

True, but I don't know what to do about it.  Usenet and email are 
fundamentally different beasts serving different communities and they 
will inevitably diverge even if we tried to force them not to do so.
In a way it's too bad that they use such similar message formats -
it invites confusion for both humans and protocol implementations.


Indeed, but this isn't just a Usenet problem (I merely used that example
because I have an immediate concern with it).

Suppose that someone has a bright idea for a new email header. For example

    Alternative-Address: =?ISO-8859-1?Q?Claus_F=E4rber?= <claus(_at_)faerber(_dot_)muc(_dot_)de>

with the meaning "if the To: address is undeliverable, then send the email
to the alternative". So he writes an internet-draft, it is discussed, and
eventually becomes a standards-track RFC. Naturally, it wants to use RFC
2047 in that 'phrase', and expects user agents to decode and display
"Färber". The problems it faces in getting recognized by user agents are
exactly the same as with my Mail-Copies-To header.

well, 2047 doesn't say so explicitly but it would be somewhat
reasonable for an implementation to treat any unrecognized field
as a user-defined field for the purpose of display.  that's one
reason that the rules for recognizing encoded-words are looser
than the rules for generating encoded-words.


Yes, but not loose enough. A reasonable approach would be:

If you are configured to understand this header, then parse it and deal
with encoded-words accordingly.

If you are not configured to understand this header (in which case there
is no semantic action you can possibly take, other than to display it),
then decode and display anything that looks remotely like an encoded word.

This might have the odd property that a non-upgraded agent might display
something that it would reject as erroneous and fail to display after it
had been upgraded, but I could live with that. I suspect that many real
world agents already do something like that, but it would be much nicer if
RFC 2047 had made that behaviour explicitly allowable, for the benefit of
future headers such as Alternative-Address or Mail-Copies-To.

It applies to unstructured fields also, since you need to know whether or
not you are allowed to decode "(=?ISO-8859-1?Q?Claus_F=E4rber?=)".
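
To make the two-case approach above concrete, here is a minimal sketch in
Python. It is not taken from RFC 2047 or from any real agent: the names
display_value, LOOSE_ENCODED_WORD and the parsers table are hypothetical, the
address is a placeholder, and the pattern is deliberately looser than the RFC
grammar (it ignores the whitespace rules, the 75-character limit, and the
dropping of whitespace between adjacent encoded-words).

```python
import base64
import quopri
import re

# Deliberately loose: anything shaped like =?charset?B-or-Q?payload?=,
# regardless of what surrounds it.
LOOSE_ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def _decode_match(match):
    """Decode one candidate encoded-word; leave it untouched on any failure."""
    charset, encoding, payload = match.group(1), match.group(2), match.group(3)
    try:
        if encoding.upper() == "B":
            raw = base64.b64decode(payload)
        else:
            # header=True makes "_" decode to a space, as the Q encoding requires.
            raw = quopri.decodestring(payload.encode("ascii"), header=True)
        return raw.decode(charset)
    except (ValueError, LookupError):
        # Bad base64, undecodable bytes, or an unknown charset: show it as-is.
        return match.group(0)


def display_value(name, value, parsers=None):
    """Return the text an agent might display for the header `name`.

    `parsers` is a hypothetical table mapping lower-cased header names the
    agent is configured to understand to functions that parse them strictly.
    """
    parser = (parsers or {}).get(name.lower())
    if parser is not None:
        # Case 1: a header we understand, so apply its own rules.
        return parser(value)
    # Case 2: an unknown header, so decode anything that looks remotely
    # like an encoded-word.
    return LOOSE_ENCODED_WORD.sub(_decode_match, value)


if __name__ == "__main__":
    print(display_value(
        "Alternative-Address",
        "=?ISO-8859-1?Q?Claus_F=E4rber?= <claus@example.invalid>",
    ))
```

Note that the fallback decodes "(=?ISO-8859-1?Q?Claus_F=E4rber?=)" whether or
not the field is structured, which is exactly the latitude being asked for.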

strictly speaking, there's no parsing to an unstructured field.  you 
just treat it as *text.


Actually, there is some parsing required, because an encoded word in an
unstructured header must have LWS (i.e. CFWS) on either side of it, whereas
it can also have '(' and ')' immediately next to it in a structured header.
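
The difference can be sketched with two patterns, again in Python. This is a
simplification under stated assumptions, not the RFC 2047 grammar: the names
IN_TEXT and IN_COMMENT are hypothetical, and folding, length limits and
nested comments are ignored.

```python
import re

# Simplified shape of an encoded-word: =?charset?B-or-Q?payload?=
ENCODED_WORD = r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?="

# Unstructured field (*text): the encoded-word must be bounded by white
# space, or by the start or end of the field body.
IN_TEXT = re.compile(r"(?<!\S)" + ENCODED_WORD + r"(?!\S)")

# Comment in a structured field: it may additionally touch "(" on the left
# and ")" on the right.
IN_COMMENT = re.compile(r"(?<![^\s(])" + ENCODED_WORD + r"(?![^\s)])")

if __name__ == "__main__":
    sample = "(=?ISO-8859-1?Q?Claus_F=E4rber?=)"
    print(bool(IN_TEXT.search(sample)))     # not decodable as *text
    print(bool(IN_COMMENT.search(sample)))  # decodable as a comment
```

So the same characters are or are not an encoded-word depending on which kind
of field they appear in, which an agent cannot know for a header it has never
heard of.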

however at this point I'd be reluctant to change 2047 - partially because
stability of the message format is perhaps more important and partially 
because we've been over all this before and not found a way to significantly
improve on the current situation.


I think when the time comes to revise 2047, you need to take account of
what real implementations actually do, and try to move in their direction.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5