RFC 1342 bugs & suggested fixes

Here is a summary of the RFC 1342 bugs of which I'm aware, along with
my suggested fixes.  Comments are invited.  Also, if there are other
problems that need to be mentioned, whether with the specification as
written in RFC 1342, its implementation, or its use, now is the time to
bring them up.

-Keith

----------------------------------------------------------------------
1.  Title

    Problem: title doesn't include the word "MIME"

    Suggested fix: rename document to:

    "MIME: Message Header Extensions for Non-ASCII Text"

    (Maybe the documents should be titled "MIME: Part 1" and "MIME: Part 2"?)

2.  Set of allowable charsets:

    Problem: RFC 1342 doesn't appear to allow IANA-registered charsets.
    Problem: RFC 1342 doesn't appear to allow "extension" charsets.

    Suggested fix: allow any of the charsets defined in RFC1341bis for 
    use with the text/plain content-type, or any charset registered
    with IANA for use with the text/plain content-type, or any extension
    charset name beginning with "X-".
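    A minimal sketch of the proposed rule, for illustration only.  The two
    charset sets below are illustrative subsets and an assumption; a real
    implementation would take the full lists from RFC 1341bis and from the
    IANA registry.  The function name charset_allowed is hypothetical.

```python
# Illustrative subsets only (an assumption): a real implementation would load
# the full lists from RFC 1341bis and from the IANA charset registry.
RFC1341BIS_CHARSETS = {"us-ascii"} | {f"iso-8859-{n}" for n in range(1, 10)}
IANA_TEXT_PLAIN_CHARSETS = {"iso-2022-jp"}


def charset_allowed(name: str) -> bool:
    """Return True if `name` may appear in an encoded-word under the proposed
    rule: an RFC 1341bis charset, an IANA-registered text/plain charset, or an
    extension charset whose name begins with "X-"."""
    lowered = name.lower()  # names compared case-independently (see item 7)
    return (
        lowered in RFC1341BIS_CHARSETS
        or lowered in IANA_TEXT_PLAIN_CHARSETS
        or (lowered.startswith("x-") and len(lowered) > 2)
    )
```

    For example, "ISO-8859-1", "iso-2022-jp" and "X-Private" are accepted,
    while an unregistered name such as "foo" is rejected.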

3.  "Delete the following SPACE" rule doesn't work right.

    Problem 1:  RFC 1342 is supposed to allow long text to be represented
    by multiple encoded-words, without having to split encoded-words on SPACE
    boundaries.  As currently written, a single SPACE or NEWLINE (not TAB)
    following an encoded-word is ignored for display purposes, which is
    meant to allow a long header to be represented by multiple encoded-words
    separated by newlines.  However, RFC 822 requires a SPACE or TAB
    character to follow the newline in order to continue the header, so only
    the newline is ignored and the SPACE or TAB after it is still displayed.

    Problem 2:  As a consequence of the "delete the following SPACE"
    rule, a header like this:

    From: =?ISO-8859-1?Q?Keith=20Moore?= <moore@cs.utk.edu>

    is displayed:

    From: Keith Moore<moore@cs.utk.edu>

    Suggested fix:  Change the rule to:  Any linear-white-space which
    separates a pair of encoded-words is ignored for display purposes.
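    The following sketch shows a display-side decoder that applies the
    suggested rule: whitespace between two adjacent encoded-words is
    dropped, and all other text, including the space before the "<" in
    Problem 2, is kept.  The function names and the simplified encoded-word
    regular expression are assumptions, and any whitespace-only gap is
    treated as linear-white-space, which is looser than RFC 822.

```python
import base64
import binascii
import re

# Simplified encoded-word syntax (an assumption): =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=")


def decode_word(charset: str, encoding: str, text: str) -> str:
    """Decode the encoded-text of a single encoded-word."""
    if encoding.upper() == "B":
        data = base64.b64decode(text)
    else:
        # Q encoding: "_" stands for SPACE and "=XX" is a hex-escaped octet.
        data = binascii.a2b_qp(text.encode("ascii"), header=True)
    return data.decode(charset)


def decode_for_display(value: str) -> str:
    """Decode every encoded-word in `value`, ignoring whitespace that
    separates two adjacent encoded-words (the suggested rule)."""
    pieces = []
    pos = 0
    seen_word = False
    for m in ENCODED_WORD.finditer(value):
        gap = value[pos:m.start()]
        # Keep the gap unless it sits between two encoded-words and is all whitespace.
        if not seen_word or gap.strip():
            pieces.append(gap)
        pieces.append(decode_word(*m.groups()))
        pos = m.end()
        seen_word = True
    pieces.append(value[pos:])  # text after the last encoded-word is kept as-is
    return "".join(pieces)
```

    With this rule the From: header in Problem 2 keeps its space before the
    address, and a long text folded across lines as in Problem 1 (for
    example "=?ISO-8859-1?Q?Keith_?=" CRLF SPACE "=?ISO-8859-1?Q?Moore?=")
    is rejoined as "Keith Moore" with no stray whitespace.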

4.  Multi-byte character sets:

    Problem: RFC 1342 did not consider multi-byte character sets, or
    character sets that use switching sequences (e.g. ISO-2022-JP).

    Suggested fixes:  

    1.  An encoded-word must encode an integral number of characters. 

    2.  If a charset uses code-switching sequences to switch between "ASCII
    mode" and other modes, each encoded-word implicitly begins in "ASCII
    mode", and must, if necessary, contain the switching sequences needed
    to leave the charset interpreter in "ASCII mode" at the end of the
    encoded-word.
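    A sketch of an encoder that follows both suggested rules, assuming
    Python's iso2022_jp codec and an arbitrary byte budget per encoded-word
    (max_bytes is an assumption, not taken from the RFC).  Each encoded-word
    holds a whole number of characters, and each chunk is encoded on its
    own, so it starts in ASCII mode and Python's encoder appends the escape
    back to ASCII mode (ESC ( B) at its end.  The helper names are
    hypothetical.

```python
import base64


def make_word(chunk: str, charset: str) -> str:
    # str.encode() flushes a stateful codec such as ISO-2022-JP, so the bytes
    # end with the switch back to ASCII mode when the chunk left it.
    raw = chunk.encode(charset)
    return f"=?{charset.upper()}?B?{base64.b64encode(raw).decode('ascii')}?="


def encode_words(text: str, charset: str = "iso-2022-jp", max_bytes: int = 30) -> list[str]:
    """Split `text` into B-encoded encoded-words, each containing an integral
    number of characters and each encoded independently."""
    words = []
    chunk = ""
    for ch in text:  # iterate over whole characters, never over bytes
        candidate = chunk + ch
        if chunk and len(candidate.encode(charset)) > max_bytes:
            words.append(make_word(chunk, charset))
            chunk = ch
        else:
            chunk = candidate
    if chunk:
        words.append(make_word(chunk, charset))
    return words
```

    Because no character is split and every word returns to ASCII mode, each
    encoded-word can be decoded independently of its neighbours.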

5.  Conformance section:

    RFC 1342 currently states:

    A mail composing program claiming compliance with this specification
    MUST ensure that any string of printable ASCII characters in a
    message header that begins with "=?" and ends with "?=" be a valid
    encoded-word.

    There are many places in a header where such strings are legal, but where
    an encoded-word isn't.  For example, in an address:

    To: =?foo?=@some.where

    We should not require the mail composer to quote the "=?foo?=" (since
    quoting might even change the meaning of the address), nor do we want
    it treated as an encoded-word.

    Suggested fix:

    Change the above "compliance" paragraph to read:

    A mail composing program claiming compliance with this specification
    MUST ensure that any string of printable ASCII characters in a "text"
    entity within a header, or any "atom" within a "phrase", that begins
    with "=?" and ends with "?=" be a valid encoded-word.
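    A sketch of how a display program might honor the narrower rule for a
    single mailbox: encoded-words are decoded in the display-name phrase but
    left alone in the addr-spec.  It handles only the forms
    "phrase <addr-spec>" and a bare addr-spec, an assumption made for
    brevity; display_mailbox is a hypothetical name, and decoding uses
    Python's email.header module.

```python
from email.header import decode_header, make_header


def display_mailbox(value: str) -> str:
    """Render a mailbox for display, decoding encoded-words only in the
    display-name phrase.  Assumes "phrase <addr-spec>" or a bare addr-spec;
    quoted strings, comments and groups are not handled."""
    if "<" in value:
        phrase, _, rest = value.partition("<")
        # Atoms in the phrase may be encoded-words, so decode them.
        decoded = str(make_header(decode_header(phrase.strip())))
        return f"{decoded} <{rest}" if decoded else f"<{rest}"
    # Bare addr-spec: a "=?...?=" local-part is just an atom, never decoded.
    return value
```

    Under this rule "=?US-ASCII?Q?foo?=@some.where" is displayed unchanged,
    even though its local-part happens to look like a valid encoded-word.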

6.  Header folding:

    Problem: RFC 1342 contains the sentence:  

    "Message header lines that contain one or more encoded-words should be
    no more than 76 characters long."

    Someone has suggested that this might be misconstrued to restrict
    the length of an entire header field.

    Suggested fix:  Change to "Each line of a message header field that
    contains an encoded-word should be no more than 76 characters long."
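    A sketch of the revised wording: a header field is folded between
    encoded-words so that each line, not the whole field, stays within 76
    characters.  fold_words and lines_within_limit are hypothetical helpers;
    greedy folding is one possible strategy, and a single encoded-word longer
    than the limit would still produce an over-long line.

```python
def fold_words(name: str, words: list[str], limit: int = 76) -> str:
    """Greedily place encoded-words on lines of at most `limit` characters."""
    lines = [f"{name}:"]
    for w in words:
        if len(lines[-1]) + 1 + len(w) <= limit:
            lines[-1] += " " + w
        else:
            lines.append(" " + w)  # RFC 822 continuation lines start with SPACE
    return "\r\n".join(lines)


def lines_within_limit(field: str, limit: int = 76) -> bool:
    """Check the limit per line of the field, not for the field as a whole."""
    return all(len(line) <= limit for line in field.split("\r\n"))
```

    Three 57-character encoded-words give a Subject field of 186 characters
    folded onto three lines, each of which satisfies the proposed wording.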


7.  Case of encoding and charset names:

    Problem: RFC 1342 doesn't say whether the encoding and charset names
    must be spelled in upper case.

    Suggested fix:  Add a statement to the effect that encoding names
    and charset names are case-independent.
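    A sketch of an encoded-word parser that treats the charset and encoding
    names as case-independent, as suggested.  The regular expression and
    parse_encoded_word are assumptions for illustration; the encoded-text
    itself is left untouched, since base64 is case-sensitive.

```python
import re

# Simplified encoded-word syntax (an assumption): =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([^?\s]+)\?([^?\s]*)\?=")


def parse_encoded_word(word: str) -> tuple[str, str, str]:
    """Split an encoded-word into (charset, encoding, encoded-text),
    normalizing the case of the charset and encoding names only."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        raise ValueError(f"not an encoded-word: {word!r}")
    charset, encoding, text = m.groups()
    # Proposed rule: names are case-independent, so normalize before comparing.
    charset, encoding = charset.lower(), encoding.upper()
    if encoding not in ("Q", "B"):
        raise ValueError(f"unknown encoding: {encoding!r}")
    return charset, encoding, text
```

    Under this rule "=?iso-8859-1?q?Keith?=" and "=?ISO-8859-1?Q?Keith?=" are
    the same encoded-word.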

----------------------------------------------------------------------