Re: RFC 2047 and gatewaying


Andrew Gierth wrote:

Or do you expect things like this:

----Original message----
Date: 11 Jan 2003 20:23:00 +0200
From: list-ietf-rfc822(_at_)faerber(_dot_)muc(_dot_)de 
(=?ISO-8859-1?Q?Claus_F=E4rber?=)
To: ietf-822(_at_)imc(_dot_)org
Subject: Re: RFC 2047 and gatewaying



to appear in encoded form as above?


[Hmmm, RFC 2822 strongly discourages comments in address fields]

Short answer:
I expect that the comment would be placed in the body which would
be quoted-printable encoded and in the iso-8859-1 charset, i.e. the
line with the encoded word in your example would be
>From: list-ietf-rfc822(_at_)faerber(_dot_)muc(_dot_)de (Claus_F=E4rber)
in the body text, which would display as expected in a MIME-compliant
user agent.  It would not affect the message transmission in any way.
Some user agents will do that automatically if the 8-bit raw iso-8859-1
text is entered; hopefully that is what Mozilla will do with this message
(so what appears above should *display* as what would appear verbatim in
the message body, while what appears below is what you would see if you
received a message with that attribution as described above, assuming
that Mozilla does the right thing)
>From: list-ietf-rfc822(_at_)faerber(_dot_)muc(_dot_)de (Claus_Färber)
That is not quite full decoding since it merely involves stripping the
RFC 2047 lead-in, glue, and trailer from the encoded-word text.
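
The difference between stripping the RFC 2047 wrapper and fully decoding the encoded-word can be sketched with Python's standard library. This is only an illustration under my own assumptions (the variable names are made up for the example); it shows that the Q encoding's "_" becomes a space only under full RFC 2047 decoding, while plain quoted-printable decoding leaves it as an underscore.

```python
from email.header import decode_header
import quopri

encoded_word = "=?ISO-8859-1?Q?Claus_F=E4rber?="

# Full RFC 2047 decoding: the Q encoding maps "_" to a space.
raw, charset = decode_header(encoded_word)[0]
full = raw.decode(charset)

# Stripping only the lead-in and trailer leaves text that a
# quoted-printable decoder accepts, but "_" stays an underscore.
stripped = encoded_word[len("=?ISO-8859-1?Q?"):-len("?=")]
partial = quopri.decodestring(stripped.encode("ascii")).decode("iso-8859-1")

print(full)     # text as a 2047-aware header display would show it
print(partial)  # text as a MIME reader would show the body line
```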

Longer answer:
That depends on a number of factors.  Decoding is only possible if
1a. the body has a suitable domain as specified in a
    Content-Transfer-Encoding field (8bit in this case, which further
    presumes that an 8bit transfer path is available)
or
1b. the body is suitably encoded and tagged via a Content-Transfer-Encoding
    field (e.g. using quoted-printable encoding, which is similar to the Q
    encoding used in this case; in general it may be necessary to decode
    and re-encode to use this method)
and either
2a. the body charset is the same as is used in the header field(s)
or
2b. it is possible and practical to translate the charset in the encoding
    used in the header field(s) to the charset of the body.
and
3.  A Content-Type header field is provided with an appropriate charset
    parameter (except in the unlikely case that the default us-ascii
    charset applies to the entire body (including the decoded encoded-word)).

Some of the necessary condition(s) might not be satisfied in a given instance.
In the particular case given, the most generally compatible method would be
to use quoted-printable transfer encoding in the body with an iso-8859-1
charset (and corresponding header fields).  That is compatible with all
known mail and news transmission paths and gateways. It is possible, however,
that the recipient(s) might not be using a MIME-compliant reader, in which
case the quoted-printable encoding will be visible.  If that is a concern,
one could of course leave the RFC 2047 encoded-word verbatim.  No special
consideration then needs to be given to body encoding (at least w.r.t. that
copied content), but the encoded-word probably will not be decoded for
display (and it should not be decoded under those conditions).
Under specific circumstances other alternatives from the above
list might be possible; for example if it is known that all possible mail
exchangers for all recipients support the 8BITMIME ESMTP extension and that
that extension is also supported by the sender's ESMTP client and any
SMTP relays which might be involved, 1a could be used provided that conditions
2 and 3 were also satisfied and that 8bit transfer were not prohibited by
the message itself (due to MIME requirements; see below).  In practice even
experts in mail protocols would find it difficult to be absolutely certain
of meeting those conditions except for trivial cases (e.g. somebody sending
over a LAN, with a single MX supporting ESMTP and 8BITMIME, and no mailing
list expanders); network and server outages or congestion (which may result
in use of an unusual MX alternative), mailing list expansion, gateways, etc.
all complicate matters.  A user who is ignorant of the mail protocols would
be unable to be certain of the necessary conditions.  A user agent could
not ascertain whether those conditions were satisfied either, also because of
the possibility of list expansion, etc.  The only safe courses of action
for user agent software in this example are
a. (trivial case) elide the comment. It is after all only a comment, not a
   display name
b. use quoted-printable encoding as described above
c. leave it as RFC 2047 encoded-words (the lazy programmer's approach)
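
Option b can be sketched with Python's email package. This is a minimal illustration, not a description of what any particular user agent does; the addresses are placeholders and the attribution text is the fully decoded form of the comment. The body is tagged iso-8859-1 and transfer-encoded as quoted-printable, so everything on the wire stays 7-bit.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.org"        # placeholder address
msg["To"] = "recipient@example.org"       # placeholder address
msg["Subject"] = "Re: RFC 2047 and gatewaying"

# The decoded attribution goes in the body, never in a header field.
attribution = "From: list@example.org (Claus F\u00e4rber)\n"
msg.set_content(attribution, charset="iso-8859-1", cte="quoted-printable")

wire = msg.as_string()
print(wire)
```

A MIME-compliant reader reverses the transfer encoding and displays the original text; a non-MIME reader shows the "=E4" sequence as-is.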

It is important to note that the decoded (or not) content will be placed in
body text, not in message or MIME-part header fields, so with an appropriate
Content-Transfer-Encoding field in an appropriate context with a suitable
transmission path, that is acceptable.  That is unlike the case for message
and MIME-part header fields, which can never have unencoded non-ASCII content
(nor can they contain ASCII NUL, a lone ASCII CR, or a lone ASCII LF).

It is also important to note that there are MIME restrictions on where the
various transfer encodings may be used. For example if the body text in
question were part of a composite MIME type, quoted-printable transfer
encoding would have to be applied at the innermost level (RFC 2045 section
6.4). See also RFC 2046 section 5.2.1.  If the message is part of a sufficiently
lengthy encapsulated message that it might be split into pieces, note that
RFC 2046 section 5.2.2 prohibits any transfer encoding other than 7bit for
the encapsulating message/partial type, and that in turn prohibits use of
8bit or binary transfer encoding in the encapsulated message.
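
As a rough illustration of those restrictions, the following sketch walks a parsed message and reports composite parts (multipart/* and message/*) that carry a transfer encoding other than 7bit, 8bit, or binary, with message/partial limited to 7bit. The function name and the sample message are hypothetical, and the check is deliberately simplified: it does not, for instance, verify that the message encapsulated in message/partial is itself 7bit.

```python
from email import message_from_string

ALLOWED_ON_COMPOSITE = {"7bit", "8bit", "binary"}

def composite_cte_violations(msg):
    """Return (content type, cte) pairs for composite parts whose
    Content-Transfer-Encoding is not permitted at that level."""
    bad = []
    for part in msg.walk():
        if part.get_content_maintype() not in ("multipart", "message"):
            continue  # leaf parts may use quoted-printable or base64
        cte = str(part.get("Content-Transfer-Encoding", "7bit")).strip().lower()
        allowed = ALLOWED_ON_COMPOSITE
        if part.get_content_type() == "message/partial":
            allowed = {"7bit"}
        if cte not in allowed:
            bad.append((part.get_content_type(), cte))
    return bad

# Hypothetical sample: quoted-printable wrongly declared on the multipart itself.
sample = message_from_string(
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="b"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "--b\n"
    "Content-Type: text/plain; charset=iso-8859-1\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Claus F=E4rber\n"
    "--b--\n"
)
print(composite_cte_violations(sample))
```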

While the individual may not strictly need to edit them, they
nevertheless become part of an entity that the user is editing.  Of
course, they are not then part of a header field, but if, say, I need
to reply to someone who sent a message which has been sent on to me by
a third party in this way, I have no real option but to copy+paste from
the message body back into the headers.


Since the only things that can be encoded by RFC 2047 (display names and
comments) are transparent to the transmission protocols, one *need* not
copy those parts *to* header fields at all; in the case of the display name
phrase of a mailbox specification, the address w/o the display name would
suffice, and in the case of a named group, any display name could be used.
And of course comments could be replaced by linear whitespace.
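
For example, a reply tool could keep only the address when copying from body text back into a header field. A small sketch using Python's email.utils.parseaddr follows; the helper name and the example.org address are made up for illustration.

```python
from email.utils import parseaddr

def bare_address(mailbox):
    """Return only the addr-spec, dropping any display name or comment."""
    _name, addr = parseaddr(mailbox)
    return addr

# A trailing comment holding an encoded-word is simply discarded.
with_comment = "list@example.org (=?ISO-8859-1?Q?Claus_F=E4rber?=)"
# A display-name phrase is discarded the same way.
with_phrase = "=?ISO-8859-1?Q?Claus_F=E4rber?= <list@example.org>"

print(bare_address(with_comment))
print(bare_address(with_phrase))
```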

Expecting RFC2047 encoded-words never to be decoded except for display
is not realistic.


They should never be decoded in headers (or for SMTP "envelope" transactions)
for transmission.  RFC 2047 does not specifically apply to body text (other
than MIME-part header fields and DSN and MDN fields).

[note: Andrew, the In-Reply-To header field of your message did not contain
a message-id as required by RFC 2822 section 3.6.4]