Here is a summary of the RFC 1342 bugs of which I'm aware, along with
my suggestions for fixes. Comments are invited. Also, if there are
other problems that need to be mentioned, whether in the specification
as written in RFC 1342, in its implementation, or in its use, now is the
time to bring them up.
-Keith
----------------------------------------------------------------------
1. Title
Problem: title doesn't include the word "MIME"
Suggested fix: rename document to:
"MIME: Message Header Extensions for Non-ASCII Text"
(Maybe the docs should say "MIME: Part 1" and "MIME: Part 2"?)
2. Set of allowable charsets:
Problem: RFC 1342 doesn't appear to allow IANA registered charsets
Problem: RFC 1342 doesn't appear to allow "extension" charsets
Suggested fix: allow any of the charsets defined in RFC1341bis for
use with the text/plain content-type, or any charset registered
with IANA for use with the text/plain content-type, or any extension
charset name beginning with "X-".
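A rough sketch of the check a mail reader or composer might apply under this suggested fix. The charset set below is a made-up placeholder, not the actual RFC1341bis or IANA list, and the function name is hypothetical. Names are compared case-independently, in line with item 7.

```python
# Hypothetical placeholder: a real implementation would use the charsets that
# RFC1341bis defines, or IANA registers, for use with text/plain.
KNOWN_TEXT_PLAIN_CHARSETS = {"us-ascii", "iso-8859-1", "iso-8859-2", "iso-2022-jp"}


def charset_allowed(name):
    """Return True if `name` is acceptable under the suggested fix for item 2."""
    lowered = name.lower()  # charset names treated as case-independent
    if lowered.startswith("x-"):
        return True  # private "extension" charset
    return lowered in KNOWN_TEXT_PLAIN_CHARSETS


print(charset_allowed("ISO-8859-1"), charset_allowed("X-Foo"), charset_allowed("foo"))
```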
3. "Delete the following SPACE" rule doesn't work right.
Problem 1: RFC 1342 is supposed to allow long text to be represented
by multiple encoded-words, without having to split encoded-words on SPACE
boundaries. As currently written, a single SPACE or NEWLINE (not TAB)
following an encoded-word is ignored for display purposes, with the
intent of allowing multiple encoded-words, separated by newlines, to
represent a long header. However, RFC 822 requires a SPACE or TAB
character to follow the newline to continue the header, so a fold is
never a single character: deleting one SPACE or NEWLINE leaves some of
the folding whitespace (and never removes a TAB), which is then displayed
between the two encoded-words.
Problem 2: As a consequence of the "delete the following SPACE"
rule, a header like this:
    From: =?ISO-8859-1?Q?Keith=20Moore?= <moore@cs.utk.edu>
is displayed with the SPACE before "<" removed:
    From: Keith Moore<moore@cs.utk.edu>
Suggested fix: Change the rule to: Any linear-white-space which
separates a pair of encoded-words is ignored for display purposes.
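A minimal sketch of the proposed display rule, assuming a simplified encoded-word pattern and that the header body is otherwise left alone (the function names are hypothetical). Whitespace, including a CRLF fold, is dropped only when it sits between two encoded-words; all other whitespace is kept.

```python
import base64
import binascii
import re

# Simplified encoded-word syntax: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def decode_word(match):
    charset, encoding, text = match.groups()
    if encoding.upper() == "B":
        raw = base64.b64decode(text)
    else:
        # header=True makes "_" decode to SPACE, as the Q encoding requires
        raw = binascii.a2b_qp(text, header=True)
    return raw.decode(charset)


def display(header_body):
    """Render a header body: whitespace separating two encoded-words is
    ignored; whitespace anywhere else is displayed as-is."""
    out = []
    pos = 0
    prev_was_word = False
    for m in ENCODED_WORD.finditer(header_body):
        gap = header_body[pos:m.start()]
        # drop the gap only if it is pure whitespace between two encoded-words
        if not (prev_was_word and gap.strip() == ""):
            out.append(gap)
        out.append(decode_word(m))
        pos = m.end()
        prev_was_word = True
    out.append(header_body[pos:])
    return "".join(out)


print(display("=?ISO-8859-1?Q?Keith=20Moore?= <moore@cs.utk.edu>"))
```

With this rule the SPACE before "<" survives, and a fold between two encoded-words (for example `"=?ISO-8859-1?Q?Keith=20?=\r\n =?ISO-8859-1?Q?Moore?="`) displays as "Keith Moore" regardless of whether the continuation line starts with SPACE or TAB.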
4. Multi-byte character sets:
Problem: RFC 1342 does not consider multi-byte character sets or
character sets with code-switching sequences (e.g. ISO-2022-JP).
Suggested fixes:
   1. An encoded-word must encode an integral number of characters.
   2. If a charset uses code-switching sequences to switch between "ASCII
      mode" and other modes, each encoded-word implicitly begins in "ASCII
      mode", and if necessary, must contain appropriate sequences such that
      the charset interpreter is again in "ASCII mode" at the end of the
      encoded-word.
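A sketch of how a composer could satisfy both fixes, assuming Python's `iso-2022-jp` codec and a hypothetical `max_chars` chunk size. Splitting the Unicode string before encoding keeps each encoded-word to whole characters, and encoding each chunk in a separate call starts it in ASCII mode; the codec flushes its state at the end of each call, so a chunk that switched to JIS X 0208 ends with ESC ( B.

```python
import base64


def encode_words(text, charset="iso-2022-jp", max_chars=4):
    """Split `text` into B-encoded words of at most `max_chars` characters.

    Each chunk is encoded independently, so it begins in ASCII mode and the
    codec appends the switch back to ASCII (ESC ( B) at the end if needed.
    `max_chars` is an illustrative parameter, not from any specification.
    """
    words = []
    for i in range(0, len(text), max_chars):
        chunk = text[i:i + max_chars]  # slicing a str never splits a code point
        raw = chunk.encode(charset)    # stateful encoding, reset at end of call
        b64 = base64.b64encode(raw).decode("ascii")
        words.append(f"=?{charset.upper()}?B?{b64}?=")
    return words


for word in encode_words("日本語のテキスト"):
    print(word)
```

Because each word decodes on its own to ASCII mode at both ends, a reader can decode the words independently and concatenate the results.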
5. Conformance section:
Problem: RFC 1342 currently states:
    A mail composing program claiming compliance with this specification
    MUST ensure that any string of printable ASCII characters in a
    message header that begins with "=?" and ends with "?=" be a valid
    encoded-word.
There are many places in a header where such strings are legal but an
encoded-word is not. For example, in an address:
    To: =?foo?=@some.where
We should not require the mail composer to quote "=?foo?=" (doing so
might even change the meaning of the address), and we don't want it
treated as an encoded-word either.
Suggested fix:
Change the above "compliance" paragraph to read:
    A mail composing program claiming compliance with this specification
    MUST ensure that any string of printable ASCII characters in a "text"
    entity within a header, or any "atom" within a "phrase", that begins
    with "=?" and ends with "?=" be a valid encoded-word.
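A sketch of the check the revised wording asks of a composer, using a simplified encoded-word pattern (not the full RFC 1342 grammar) and a hypothetical function name. The composer would run it only on "text" and on atoms within a phrase, so the local-part "=?foo?=" in "=?foo?=@some.where" is never tested.

```python
import re

# Simplified encoded-word syntax: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")


def conforms(token):
    """For a "text" or phrase-atom token: if it looks like an encoded-word
    (begins with "=?" and ends with "?="), it must actually be one."""
    if token.startswith("=?") and token.endswith("?="):
        return ENCODED_WORD.fullmatch(token) is not None
    return True  # ordinary text needs no further check


print(conforms("=?ISO-8859-1?Q?Keith=20Moore?="), conforms("=?foo?="), conforms("hello"))
```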
6. Header folding:
Problem: RFC 1342 contains the sentence:
"Message header lines that contain one or more encoded-words should be
no more than 76 characters long."
Someone has suggested that this might be misconstrued to restrict
the length of an entire header field.
Suggested fix: Change to "Each line of a message header field that
contains an encoded-word should be no more than 76 characters long."
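A sketch of folding under the clarified wording: the 76-character limit applies to each line, while the field as a whole may be longer. It assumes every encoded-word is short enough to fit on a line by itself; the function name is hypothetical.

```python
def fold(field_name, words, limit=76):
    """Greedily pack encoded-words onto lines of at most `limit` characters,
    starting each continuation line with a SPACE as RFC 822 requires."""
    lines = []
    current = field_name + ":"
    for word in words:
        if len(current) + 1 + len(word) <= limit:
            current += " " + word
        else:
            lines.append(current)
            current = " " + word  # continuation line: CRLF followed by SPACE
    lines.append(current)
    return "\r\n".join(lines)


words = ["=?ISO-8859-1?Q?" + "a" * 20 + "?="] * 5
print(fold("Subject", words))
```

Here each line stays within 76 characters even though the whole field is well over 76.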
7. Case of encoding and charset names:
Problem: RFC 1342 doesn't say whether the encoding ("B" or "Q") must be
spelled in upper case.
Suggested fix: Add a statement to the effect that encoding names
and charset names are case-independent.
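A sketch of a decoder that treats encoding and charset names as case-independent, using a simplified encoded-word pattern and a hypothetical function name. The encoding letter is upper-cased before comparison; Python's codec lookup already ignores case in charset names.

```python
import base64
import binascii
import re

# Simplified encoded-word syntax: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([^?\s]+)\?([^?\s]*)\?=")


def decode_word(word):
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        raise ValueError("not an encoded-word")
    charset, encoding, text = m.groups()
    encoding = encoding.upper()  # "q" and "Q" name the same encoding
    if encoding == "B":
        raw = base64.b64decode(text)
    elif encoding == "Q":
        raw = binascii.a2b_qp(text, header=True)  # "_" decodes to SPACE
    else:
        raise ValueError(f"unknown encoding {encoding!r}")
    # codec lookup is case-insensitive: "iso-8859-1" works like "ISO-8859-1"
    return raw.decode(charset)


print(decode_word("=?iso-8859-1?q?Keith=20Moore?="))
```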
----------------------------------------------------------------------