[Nmh-workers] RFC 2047 vs RFC 2231 encoding for MIME parameters

2016-09-28 10:40:52
So, some backstory and explanation.

For representing 8-bit characters in email headers, the encoding used
is defined in RFC 2047.  You've probably seen that at some point; it
looks like:

        =?UTF-8?Q?Hi!_=F0=9F=92=A9?=

Those can be used in only a few places: in "text" in a Subject or Comments
header, in a MIME body part field where the field body is defined as "*text"
(such as Content-Description ... and really, that's the only one), inside
a "comment", or as a replacement for a "word" in a "phrase" (for example,
the display name that goes with an email address).
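
To make the example above concrete, here is a minimal sketch of decoding
that encoded-word.  It uses Python's standard email package purely for
illustration; it is not how nmh does it, and the variable names are mine.

```python
from email.header import decode_header, make_header

# The RFC 2047 encoded-word from the example above.
encoded = "=?UTF-8?Q?Hi!_=F0=9F=92=A9?="

# decode_header() splits the field into (data, charset) chunks; in the
# "Q" encoding an underscore stands for a space and =XX is a hex byte.
chunks = decode_header(encoded)

# make_header() reassembles the chunks into one Unicode string.
decoded = str(make_header(chunks))
print(decoded)
```

Note that the charset is carried inside the encoded-word itself, so the
decoder needs no outside knowledge of the message's character set.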

Specifically, RFC 2047 says:

+ An 'encoded-word' MUST NOT be used in parameter of a MIME
  Content-Type or Content-Disposition field, or in any structured
  field body except within a 'comment' or 'phrase'.

For MIME parameters, an alternate encoding defined by RFC 2231 is used
instead.  That looks like:

        name*=utf-8''Hi!%F0%9F%92%A9

(There's more if you have a long parameter value, which gets split into
numbered continuations, but you get the idea.)
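
For comparison, a rough sketch of the simple (single-segment) RFC 2231
form shown above.  The helper names are hypothetical, it ignores the
continuation syntax entirely, and the fallback to us-ascii when no charset
is given is my assumption, so treat it as an illustration only.

```python
from urllib.parse import quote, unquote_to_bytes

def rfc2231_decode(raw):
    """Decode a value of the form charset'language'percent-encoded-bytes."""
    charset, _language, encoded = raw.split("'", 2)
    # Assumption: fall back to us-ascii when the charset is left empty.
    return unquote_to_bytes(encoded).decode(charset or "us-ascii")

def rfc2231_encode(text, charset="utf-8"):
    """Encode text as charset''percent-encoded-bytes (empty language)."""
    # Leave the punctuation RFC 2231 permits in attribute-char unescaped.
    safe = "!#$&+-.^_`{|}~"
    return charset + "''" + quote(text.encode(charset), safe=safe)

value = rfc2231_decode("utf-8''Hi!%F0%9F%92%A9")
round_trip = rfc2231_encode(value)
```

The point is mostly that the two schemes share nothing: percent escapes
instead of "=XX", quote-delimited charset and language instead of the
"=?...?=" framing.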

So, incompatible encodings.  Fine.  Nmh has supported RFC 2047 for
_decode_ for a long time; for 1.6 we added RFC 2047 encoding, plus support
for RFC 2231 for both encoding and decoding.

However ... nothing is ever simple.  Specifically, there was a patch
contributed (but later reverted) that enabled RFC 2047 decoding for
some MIME parameters.

The exact issue is that some MUAs use RFC 2047 encoding for a filename
that contains 8-bit characters when creating a Content-Disposition field.
This was a problem with older versions of Outlook (like pre-2007) and
with Lotus/IBM Notes (which I was surprised to discover was still a
thing).  Most troublesome, though, RFC 2047 encoding is ALSO used when you
attach a file whose name contains 8-bit characters using the web interface
for Gmail.  If you Google "rfc 2047 vs rfc2231" you can get an idea of
what happened: Chrome and Thunderbird support it for decode, and Google
uses that as justification for keeping it ... and Chrome and Thunderbird
don't want to disable that support, because Gmail still uses it.  Argh.
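
To illustrate what "supporting it for decode" would mean, here is a
hedged sketch of a lenient filename handler: if a filename parameter value
looks like an RFC 2047 encoded-word, decode it anyway; otherwise leave it
alone.  The function name and the regular expression are made up for this
example and are not taken from nmh or from the reverted patch.

```python
import re
from email.header import decode_header, make_header

# Loose pattern for an RFC 2047 encoded-word: =?charset?B-or-Q?data?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")

def lenient_filename(value):
    """Decode a filename parameter that wrongly uses RFC 2047 encoding."""
    if ENCODED_WORD.search(value):
        # Not allowed by RFC 2047 in a parameter, but some MUAs send it.
        return str(make_header(decode_header(value)))
    return value

fixed = lenient_filename("=?UTF-8?Q?Hi!=F0=9F=92=A9.txt?=")
untouched = lenient_filename("report.txt")
```

The obvious downside is that a file legitimately named with a literal
"=?...?=" string would get mangled, which is part of why this feels wrong
to do by default.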

I am torn as to what to do here.  It feels somehow wrong to support this
for decode natively, but I'm not completely convinced of that.  We have
a number of email programs that get this wrong, including a very popular
one.  This might be something perfect for mhfixmsg to deal with.  What
do others think?

--Ken
