Re: Interpretation of RFC 2047

Interpretation B is intended.

OK.

   Mail-Copies-To: =?ISO-8859-1?Q?Claus_F=E4rber?=
      <claus(_at_)faerber(_dot_)muc(_dot_)de>

only if that portion of the field is defined as 'text'


No, it is defined as 'phrase' (it is part of an 'address-list' in fact).


okay, if it's defined as 'phrase' then 2047 should apply.  OTOH it's
a bit unreasonable to expect a mail user agent to understand the syntax
of a usenet message header field.

 so it would
be left to a different spec to define how 2047 applies to mail-copies-to.


I think that means we have to say explicitly that 'phrases' in that
header can be encoded as per RFC 2047, rather than the vaguer hand-waving
in the present Usefor draft. No problem with doing that.


being explicit seems like a good idea.

no, it presupposes that when gateways take things that aren't email
messages (this includes usenet messages) and send them via email,
they make them compliant with email message standards at that time.
gateways have to track standards on both sides.  that's life.


And that's what worries me (and worries others on the Usefor list). It
seems that every time someone invents a new email (or news) extension,
lots of gateways, and user agents also, suddenly become out-of-date.


True, but I don't know what to do about it.  Usenet and email are 
fundamentally different beasts serving different communities and they 
will inevitably diverge even if we tried to force them not to do so.
In a way it's too bad that they use such similar message formats -
it invites confusion for both humans and protocol implementations.

Whilst people who maintain gateways might conceivably "track the
standards", there is no way that the installed base of user agents can
possibly do so.


certainly they can, just as any other software can get updated.
it just takes a long time.

Here is another example from Usefor:

   Organization: =?ISO-8859-1?Q?Claus_F=E4rber Fabrik?=

Q: Is that one RFC 2047-compliant?

I'd say probably so.  though I don't know if it's formally defined anywhere,
organization is quite naturally 'text'.


It's formally defined as 'unstructured' (and 'unstructured' is essentially
as in RFC 2822). But again it seems that you are saying that gateways and
MUAs would have to be "tracking the standards" in order to legitimize it.


well, 2047 doesn't say so explicitly but it would be somewhat
reasonable for an implementation to treat any unrecognized field
as a user-defined field for the purpose of display.  that's one
reason that the rules for recognizing encoded-words are looser
than the rules for generating encoded-words.

One would expect section 6 to require the recognition of anything that
was allowed to appear under section 5, but that seems not to be the case
because there is no mention of "extension message header fields".

In 6.1 I find:

   A mail reader must parse the message and body part headers according
   to the rules in RFC 822 to correctly recognize 'encoded-word's.

this is intended to apply to structured fields - the point is that
for structured fields you actually have to parse them to distinguish
between places where encoded words are valid (e.g. a word before a 
phrase) and places where they are not valid (e.g. a word in a local-part
or a domain).
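
A minimal sketch of that point, using Python's standard email package: the
field is first split into its phrase and its address, and only the phrase is
passed to the RFC 2047 decoder. The function name decode_mailbox and the
example.invalid address are illustrative assumptions, not anything from the
thread or from a spec.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr


def decode_mailbox(value):
    """Split a single mailbox, then decode encoded-words in the phrase only."""
    phrase, addr_spec = parseaddr(value)
    # Only the display name is eligible for RFC 2047 decoding; the
    # addr-spec (local-part and domain) is returned untouched.
    display = str(make_header(decode_header(phrase)))
    return display, addr_spec


name, addr = decode_mailbox(
    "=?ISO-8859-1?Q?Claus_F=E4rber?= <claus@example.invalid>"
)
print(name)
print(addr)
```

This handles one mailbox only; a real 'address-list' would need a full
structured parse before any decoding is attempted.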


It applies to unstructured fields also, since you need to know whether or
not you are allowed to decode "(=?ISO-8859-1?Q?Claus_F=E4rber?=)".


strictly speaking, there's no parsing to an unstructured field.  you 
just treat it as *text.
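
As a rough illustration of "just treat it as *text", the sketch below hands
the whole field value to Python's standard decoder without tokenizing it
first. The helper name decode_unstructured is an assumption for this example,
and Python's recognition rules are its own, so they may not match a strict
reading of RFC 2047 in every case.

```python
from email.header import decode_header, make_header


def decode_unstructured(value):
    """Decode every encoded-word found in an unstructured field value."""
    # No tokenizing into phrases, comments or quoted-strings: the whole
    # value is handed to the decoder as plain text.
    return str(make_header(decode_header(value)))


print(decode_unstructured("=?ISO-8859-1?Q?Claus_F=E4rber?="))
```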

Would it not be a sensible rule to say that you should decode any occurrence
of =?<charset>?[BQ]?...?= (subject to the 76 character limit) in any
header provided:
    (a) it was immediately preceded by '(' or by CFWS
    (b) it was immediately followed by ')' or by CFWS
    (c) it was not contained within a quoted-string
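
The proposed rule could be sketched roughly as follows. This is only an
illustration of conditions (a) to (c), not text from RFC 2047: CFWS is
simplified to plain whitespace, the start and end of the value are assumed to
count as acceptable boundaries, and the 75-character cap is RFC 2047's limit
on a single encoded-word. The names ENCODED_WORD, MAX_LEN and
eligible_encoded_words are made up for the example.

```python
import re

# Shape taken from the proposed rule: =?<charset>?[BQ]?...?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")
MAX_LEN = 75  # RFC 2047 limit on the length of a single encoded-word


def eligible_encoded_words(value):
    """Return encoded-word-shaped substrings that pass conditions (a)-(c)."""
    found = []
    for m in ENCODED_WORD.finditer(value):
        if len(m.group(0)) > MAX_LEN:
            continue
        before = value[m.start() - 1] if m.start() > 0 else " "
        after = value[m.end()] if m.end() < len(value) else " "
        # (a) preceded by '(' or whitespace (start of value assumed OK)
        if not (before == "(" or before.isspace()):
            continue
        # (b) followed by ')' or whitespace (end of value assumed OK)
        if not (after == ")" or after.isspace()):
            continue
        # (c) not inside a quoted-string: an odd number of unescaped
        # double quotes before the match means a quoted-string is open
        if len(re.findall(r'(?<!\\)"', value[: m.start()])) % 2 == 1:
            continue
        found.append(m.group(0))
    return found


print(eligible_encoded_words("(=?ISO-8859-1?Q?Claus_F=E4rber?=)"))
```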


offhand I don't remember the reasoning for the current language.
I think part of the problem was that you can't expect either ( and ) 
or quotes to be meaningful except in a structured field.  e.g. There's
nothing at all wrong with

Subject: this"subject"has"five"quotes"! :)

also Q-encoded encoded-words are allowed to contain '(', ')' 
or '"' when they're used in unstructured fields. so it's tricky
to allow them to be used as delimiters.
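
To make the objection concrete, here is a small sketch of what a naive
quoted-string tracker does with the Subject above. The function
inside_quotes_at_end is hypothetical; it simply toggles on every unescaped
double quote, which is roughly what rule (c) would need in a field that has
no quoted-string syntax at all.

```python
def inside_quotes_at_end(text):
    """Report whether a naive tracker thinks a quoted-string is still open."""
    inside = False
    escaped = False
    for ch in text:
        if escaped:
            escaped = False      # character after a backslash is literal
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            inside = not inside  # toggle on each unescaped double quote
    return inside


subject = 'this"subject"has"five"quotes"! :)'
print(subject.count('"'), inside_quotes_at_end(subject))
```

With an odd number of quotes, anything appended to that Subject would be
treated as sitting inside a quoted-string, so an encoded-word following it
would wrongly be left undecoded.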

however at this point I'd be reluctant to change 2047 - partially because
stability of the message format is perhaps more important and partially 
because we've been over all this before and not found a way to significantly
improve on the current situation.

3. Is it possible to go further and introduce header-fields with
explicit 'encoded-word's in them, for example:

yes.  I don't exactly like doing that (there's a lot of overlap between
'token' and 'encoded-word', for instance) but it's a widespread practice.


You say "widespread". Can you give examples of actual extensions which
have done so?


I wasn't saying that including explicit encoded-words was widespread,
but rather we have a lot of protocols defined using productions of the 
form

a = b / c

where 'b' is a subset of 'c' and/or vice versa.

in this specific case encoded-word is a subset of both atom and token.

Keith