Tonight, somebody posted a patch for the Exmh MUA, described as follows:
> This patch for mime.tcl attempts to separate the header decoding from
> the header display, so that Mime_SavePiece can use the decode routines
> to handle stuff like:
> Content-Disposition: attachment; filename="=?iso-8859-1?Q?JVM=5FTask.c?="
> It is not perfect yet because the character set specification is
> silently ignored.
I voted against the patch on the grounds that RFC2231 specifies a different
syntax, and said a 2231-style patch would be encouraged. But it still
left me with a queasy feeling on two points:
1) Are any MUAs "in the wild" actually using the 2047-style encoding
in parameters rather than the 2231 syntax? If so, which ones, and who wants
to send the authors a note? ;)
2) RFC2183 assumes that a "filename" parameter is us-ascii, since RFC2045
didn't define an extension. RFC2231 added the syntax, making it legal to
specify a charset/language for a filename. However, I admit being totally
befuddled regarding the *semantics* of such a specification. On many
filesystems, the charset/lang don't matter, as long as the octets are
decoded to the originally intended binary stream. However, it's not
obvious what to do if, for instance, your system is UTF-8 or one of the
systems that have multiple IBM-nnn codepage locales, and the header shows up
as iso-2022-jp. Should the MUA attempt conversion, or leave it as an octet
stream?
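For concreteness, here is a rough sketch of the two syntaxes side by side,
using Python's standard email package rather than anything from Exmh. The
2231-style header line and its 'en' language tag are made up for
illustration; only the 2047-style value comes from the patch description.

```python
import email
from email.header import decode_header

# 2047-style encoded-word, taken from the patch description.
word = "=?iso-8859-1?Q?JVM=5FTask.c?="
raw, charset = decode_header(word)[0]      # (bytes, charset name)
name_2047 = raw.decode(charset)

# 2231-style parameter: charset'language'percent-encoded-octets.
# This header line is hypothetical, written for this example.
hdr = "Content-Disposition: attachment; filename*=iso-8859-1'en'JVM%5FTask.c\n\n"
msg = email.message_from_string(hdr)

# get_param exposes the (charset, language, value) triple;
# get_filename collapses it into a single string.
param = msg.get_param("filename", header="content-disposition")
name_2231 = msg.get_filename()

print(name_2047, name_2231, param)
```

Both forms carry the same octets plus a charset label; only the 2231 form
has a slot for the language.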
Converting works better at the user-visible level - if the sender wanted
kanji in the filename, then a case can be made that converting to a
representation that displays the kanji (although at a different codepoint)
would benefit the user. On the other hand, it will totally hose any
automated 'manifest checkers' that go looking for a specific filename.
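The tradeoff can be sketched in a few lines of Python. The filename and the
"manifest" set below are invented for illustration, and the receiving system
is assumed to be UTF-8:

```python
# Hypothetical kanji filename ("nihongo" followed by .txt), as the sender typed it.
name = "\u65e5\u672c\u8a9e.txt"

# Octets as they would arrive under an iso-2022-jp charset label.
sent = name.encode("iso-2022-jp")

# Option A: leave the filename as the received octet stream.
kept = sent

# Option B: convert to the local charset (assumed UTF-8 here).
converted = sent.decode("iso-2022-jp").encode("utf-8")

# A hypothetical manifest checker that looks for the exact octets sent.
manifest = {sent}

print(kept in manifest)        # the untouched octets still match
print(converted in manifest)   # the converted name no longer matches
print(converted.decode("utf-8") == name)  # but it displays as intended locally
```

The converted name is what a UTF-8 user would want to see, yet it is a
different byte string from the one the sender's tools would look for.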
I'm sure there are other issues I've missed here...
--
Valdis Kletnieks
Computer Systems Senior Engineer
Virginia Tech