Re: Interpretation of RFC 2047


In <200210141709(_dot_)g9EH9c015818(_at_)astro(_dot_)cs(_dot_)utk(_dot_)edu> 
Keith Moore <moore(_at_)cs(_dot_)utk(_dot_)edu> writes:

1. Where can encoded-words legitimately appear (RFC 2047 section 5).
-------------------------------------------------------------------

Interpretation B:

It means you can use an encoded-word in
     any Subject                        )
     any Comments                       ) for which the field body is
     any extension message header field ) defined as '*text'
     any MIME body part field           )
     any X-header

Interpretation B is intended.

OK.
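As a rough illustration only, Interpretation B of section 5(1) can be written as a lookup. The function name `encoded_words_allowed` and the registry `TEXT_EXTENSION_FIELDS` are hypothetical. The registry stands for "extension and MIME body-part fields that some spec defines as '*text'", seeded here with Content-Description (defined as `*text` in RFC 2045). The sketch covers only the '*text' case, not comments or phrases in structured fields.

```python
# Hypothetical registry of extension / MIME body-part fields whose body is
# defined as '*text'. Whoever uses this must keep it up to date as new specs appear.
TEXT_EXTENSION_FIELDS = {"content-description"}


def encoded_words_allowed(field_name: str, text_fields=TEXT_EXTENSION_FIELDS) -> bool:
    """Interpretation B of RFC 2047 section 5(1), expressed as a simple lookup."""
    name = field_name.lower()
    # Subject, Comments and any X- header are always eligible under Interpretation B.
    if name in ("subject", "comments") or name.startswith("x-"):
        return True
    # Otherwise it depends on whether some spec defines the body as '*text'.
    return name in text_fields


print(encoded_words_allowed("Subject"))         # True
print(encoded_words_allowed("Mail-Copies-To"))  # False: its body is an address-list
```

The need for an external registry is the point Keith makes later about gateways: the answer depends on which other specs the implementation knows about.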

   Mail-Copies-To: =?ISO-8859-1?Q?Claus_F=E4rber?= <claus(_at_)faerber(_dot_)muc(_dot_)de>

only if that portion of the field is defined as 'text'


No, it is defined as 'phrase' (it is part of an 'address-list' in fact).

 so it would
be left to a different spec to define how 2047 applies to mail-copies-to.


I think that means we have to say explicitly that 'phrases' in that
header can be encoded as per RFC 2047, rather than the vaguer hand-waving
in the present Usefor draft. No problem with doing that.
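For comparison, this is what a sender that treats the display name as an RFC 2047 'phrase' produces, using Python's standard `email.utils.formataddr`. The address `claus@example.org` is a placeholder, not the one in the example header. Only the phrase becomes an encoded-word. The addr-spec stays plain ASCII.

```python
from email.utils import formataddr

# Placeholder address; only the display name (a 'phrase') is non-ASCII.
value = formataddr(("Claus Färber", "claus@example.org"), charset="iso-8859-1")
print("Mail-Copies-To: " + value)
# Mail-Copies-To: =?iso-8859-1?q?Claus_F=E4rber?= <claus@example.org>
```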

Q: Is an email message containing that header-field (or should I say the
user agent which permitted it to be sent as an email) RFC
2047-compliant?

not sure if you mean before or after 2047 encoding.   2047 doesn't
forbid use in other structured fields, but neither does it require
use in structured fields that aren't listed in 2047.


I meant after RFC 2047 encoding.

   OTOH, both those views of Interpretation B seem to presuppose that the
   user agent was familiar with the syntax of Usefor.

no, it presupposes that when gateways take things that aren't email
messages (this includes Usenet messages) and send them via email,
they make them compliant with email message standards at that time.
gateways have to track standards on both sides.  that's life.


And that's what worries me (and worries others on the Usefor list). It
seems that every time someone invents a new email (or news) extension,
lots of gateways, and user agents also, suddenly become out-of-date.
Whilst people who maintain gateways might conceivably "track the
standards", there is no way that the installed base of user agents can
possibly do so.

Here is another example from Usefor:

   Organization: =?ISO-8859-1?Q?Claus_F=E4rber_Fabrik?=

Q: Is that one RFC 2047-compliant?

I'd say probably so.  though I don't know if it's formally defined anywhere,
organization is quite naturally 'text'.


It's formally defined as 'unstructured' (and 'unstructured' is essentially
as in RFC 2822). But again it seems that you are saying that gateways and
MUAs would have to be "tracking the standards" in order to legitimize it.

Anyway, as regards user agents that generate (or permit to be generated)
such newly-defined applications of RFC 2047 there is not any real
problem, since they would presumably not be generating such headers unless
they had already been upgraded to do so (and the issue of users inserting
'unknown' headers manually is a minor one).

The real problem is with agents receiving such headers:

2. Where are encoded-words required to be recognized (RFC 2047 section 6).
--------------------------------------------------------------------------

One would expect section 6 to require the recognition of anything that
was allowed to appear under section 5, but that seems not to be the case
because there is no mention of "extension message header fields".

In 6.1 I find:

   A mail reader must parse the message and body part headers according
   to the rules in RFC 822 to correctly recognize 'encoded-word's.

this is intended to apply to structured fields - the point is that
for structured fields you actually have to parse them to distinguish
between places where encoded words are valid (e.g. a word within a
phrase) and places where they are not valid (e.g. a word in a local-part
or a domain).
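A minimal sketch of that "parse first, then decode" approach for an address-list field, using Python's standard `email.utils.getaddresses` and `email.header.decode_header`. The helper name `decode_display_names` and the sample addresses are hypothetical. Only display names (phrases) are decoded. The local-part and domain are passed through untouched.

```python
from email.header import decode_header, make_header
from email.utils import getaddresses


def decode_display_names(field_body: str) -> list[tuple[str, str]]:
    """Parse an address-list, then decode encoded-words only in display names.

    The addr-spec (local-part@domain) is returned as-is, because
    encoded-words are not valid there.
    """
    pairs = []
    for name, addr in getaddresses([field_body]):
        if name:
            # decode_header yields (bytes, charset) chunks; make_header joins them.
            name = str(make_header(decode_header(name)))
        pairs.append((name, addr))
    return pairs


body = "=?ISO-8859-1?Q?Claus_F=E4rber?= <claus@example.org>, bob@example.org"
print(decode_display_names(body))
# [('Claus Färber', 'claus@example.org'), ('', 'bob@example.org')]
```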


It applies to unstructured fields also, since you need to know whether or
not you are allowed to decode "(=?ISO-8859-1?Q?Claus_F=E4rber?=)".

Would it not be a sensible rule to say that you should decode any occurrence
of =?<charset>?[BQ]?...?= (subject to the 76-character limit) in any
header provided:
    (a) it was immediately preceded by '(' or by CFWS
    (b) it was immediately followed by ')' or by CFWS
    (c) it was not contained within a quoted-string

That is actually sufficient to prevent it happening inside a domain
(except obs-domain), a local-part or a msgid. In fact, a large number of
existing user agents would already accept that, and more. Can anyone think
of any existing header that would actually break?

Anyway, I offer it as a thought for RFC 2047bis, when the time comes.
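To make the proposed rule concrete, here is a rough sketch of a scanner that applies conditions (a) to (c) to any field body without knowing the field's syntax. `find_encoded_words` is a hypothetical name, and several simplifications are assumed. Comments are recognised only by their '(' / ')' delimiters. The start and end of the field body count as boundaries. Candidates are located but not decoded. It uses RFC 2047's 75-character cap on a single encoded-word (76 is the cap on a line containing one).

```python
import re

# Shape of an RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
# charset excludes SPACE and especials; encoded-text excludes '?' and SPACE.
ENCODED_WORD = re.compile(r'=\?[^?\s()<>@,;:\\"/\[\].=]+\?[BbQq]\?[^?\s]+\?=')
MAX_WORD_LEN = 75       # RFC 2047: an encoded-word is at most 75 characters
BOUNDARY = "() \t\r\n"  # '(' / ')' delimit comments; whitespace stands in for FWS


def find_encoded_words(value: str) -> list[str]:
    """Return the encoded-words in a field body that satisfy rules (a)-(c)."""
    found = []
    in_quotes = False
    i = 0
    while i < len(value):
        ch = value[i]
        if in_quotes:
            if ch == "\\":  # quoted-pair: skip the escaped character
                i += 2
                continue
            if ch == '"':
                in_quotes = False
            i += 1
            continue
        if ch == '"':  # (c) never look inside a quoted-string
            in_quotes = True
            i += 1
            continue
        m = ENCODED_WORD.match(value, i)
        if m:
            before = value[i - 1] if i > 0 else " "
            after = value[m.end()] if m.end() < len(value) else " "
            # (a) and (b): bounded by a comment delimiter or whitespace
            if before in BOUNDARY and after in BOUNDARY and len(m.group()) <= MAX_WORD_LEN:
                found.append(m.group())
            i = m.end()
            continue
        i += 1
    return found


print(find_encoded_words("(=?ISO-8859-1?Q?Claus_F=E4rber?=)"))
# ['=?ISO-8859-1?Q?Claus_F=E4rber?=']
print(find_encoded_words('"=?ISO-8859-1?Q?Claus_F=E4rber?=" <claus@example.org>'))
# []
```

A candidate glued to a local-part, as in `claus=?ISO-8859-1?Q?x?=@example.org`, fails condition (a) and is skipped. This is the kind of case the rule is meant to exclude.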

3. Is it possible to go further and introduce header-fields with
explicit 'encoded-word's in them, for example:

yes.  I don't exactly like doing that (there's a lot of overlap between
'token' and 'encoded-word', for instance) but it's a widespread practice.


You say "widespread". Can you give examples of actual extensions which
have done so?

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5