Re: Interpretation of RFC 2047


On Oct 14,  1:01pm, Charles Lindsey wrote:
}
}    An 'encoded-word' may replace a 'text' token (as defined by RFC 822)
}    in any Subject or Comments header field, any extension message header
}    field, or any MIME body part field for which the field body is
}    defined as '*text'. An 'encoded-word' may also appear in any
}    user-defined ("X-") message or body part header field.
} 
} That is ambiguous, depending on how you interpret the commas in the first
} sentence:

If you use interpretation "A" then extension and body part fields would
have to be replaced in their entirety by a _single_ encoded-word:

    "_An_ 'encoded-word' may replace any extension message header field"

As this is a practical impossibility, it seems pretty obvious that the
intended interpretation is "B", and in fact every implementation I've ever
seen appears to have chosen "B" ("may replace a 'text' _token_").

} [...] if I had wanted
} Interpretation B, I would have written "... any extension message header
} field or any MIME body part field, for which the field body is defined
} as '*text'".

The intent is NOT to require that the field body is defined as '*text'.
The field body may have ANY definition that includes a 'text' token as
part of that definition.  However, ONLY the 'text' tokens may be replaced
by 'encoded-word'.  Other tokens in the field body MUST NOT be encoded.
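
A minimal sketch of that rule from the generating side, assuming Python's
standard email package. The helper name build_mailbox and the address
claus@example.org are placeholders for illustration, not taken from the
message under discussion. Only the display name (the 'phrase') is encoded;
the addr-spec is left exactly as given.

```python
from email.utils import formataddr


def build_mailbox(display_name: str, addr_spec: str,
                  charset: str = "iso-8859-1") -> str:
    """Return 'phrase <addr-spec>' with only the phrase RFC 2047-encoded.

    build_mailbox is a hypothetical helper; formataddr() does the work.
    It encodes the display name when it contains non-ASCII characters
    and never touches the address itself.
    """
    return formataddr((display_name, addr_spec), charset=charset)


# Placeholder address, assumed for the example.
field_body = build_mailbox("Claus F\u00e4rber", "claus@example.org")
print(field_body)
```

The address part comes out untouched, which is what keeps the field
parseable by software that knows nothing about 2047.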

}    Mail-Copies-To: =?ISO-8859-1?Q?Claus_F=E4rber?= <claus(_at_)faerber(_dot_)muc(_dot_)de>
} 
} Q: Is an email message containing that header-field (or should I say the
} user agent which permitted it to be sent as an email) RFC 2047-compliant?

Yes.

} A: Under Interpretation A, Yes. Because it is an extension-field which
}    satisfies the requirements of Rule 5(1).
}    
}    Under Interpretation B, No. Because the field body is not defined as
}    '*text'.

Wrong.  Under Interpretation B, yes, because the phrase is a text token
and may be encoded, but the address is not a text token and must not be
(and is not).
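
The same point from the receiving side, sketched with Python's standard
email package (claus@example.org again stands in for the real address,
which is an assumption of the example). The field is first parsed by its
structured syntax, and only the phrase is then run through the 2047
decoder.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr

# Field body shaped like the one quoted above; the address is a placeholder.
raw = "=?ISO-8859-1?Q?Claus_F=E4rber?= <claus@example.org>"

# Step 1: split by the structured syntax (phrase + addr-spec).
phrase, addr_spec = parseaddr(raw)

# Step 2: decode encoded-words in the phrase only; the addr-spec is used as is.
decoded_phrase = str(make_header(decode_header(phrase)))

print(decoded_phrase)
print(addr_spec)
```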
 
}    However, even with Interpretation B, it might get by under Rule 5(3)
}    because, under the Usefor syntax, it is within a 'phrase'.

That is the intent of 2047.
 
}    OTOH, both those views of Interpretation B seem to presuppose that the
}    user agent was familiar with the syntax of Usefor.

If the user agent is not familiar with the syntax of Usefor, it has no
business assigning any semantics at all to the Mail-Copies-To field, so
whether or not encoded words appear there is irrelevant.

}    Organization: =?ISO-8859-1?Q?Claus_F=E4rber Fabrik?=
} 
} Q: Is that one RFC 2047-compliant?
} 
} A: Yes, under both Interpretations A and B (though under B one might
}    wonder how the user agent was supposed to know that it was
}    unstructured).

The user agent is not supposed to know.  The user agent is not supposed
to look for or apply encodings/decodings inside header fields for which
it does not know the semantics.  All fields for which the semantics are
unknown are to be treated as unstructured upon receipt.
 
} In 6.1 I find:
} 
}    A mail reader must parse the message and body part headers according
}    to the rules in RFC 822 to correctly recognize 'encoded-word's.
} 
} Again, I see two interpretations:
} 
} Interpretation C:
} 
} The wording "rules in RFC 822" means that only the headers explicitly
} defined in RFC 822 are required to be examined for the presence of
} 'encoded-word's.

As you noted, that's not a reasonable interpretation, because body part
headers are not described in RFC 822.

} Interpretation D:
} 
} The wording "rules in RFC 822" includes the rules for 'extension-field'
} and 'user-defined-field'.

This is the correct interpretation.

} Hence "must parse the message" means that the rules in the document
} defining the extension are to be applied.

Exactly.

} OTOH, Interpretation D seems to require that all user agents be
} magically aware of all new extension headers as soon as their defining
} documents are published.

No.  It requires only that user agents DO NOT ATTEMPT TO PARSE header
fields for which they do not know the semantics.

If you know what a header means, you may parse it and act on it.  If you
don't know, you should ignore it completely (pass it through unchanged,
hide it from the user, whatever).
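
One way to picture that policy, as a rough sketch in Python. The set
KNOWN_UNSTRUCTURED and the function render_field are made up for the
example, and which fields a given agent "knows" is an assumption here;
the decoding itself uses the standard email.header module.

```python
from email.header import decode_header, make_header

# Hypothetical: fields this particular agent understands to be plain text.
KNOWN_UNSTRUCTURED = {"subject", "comments", "organization"}


def render_field(name: str, body: str) -> str:
    """Decode encoded-words only in fields whose semantics are known."""
    if name.lower() in KNOWN_UNSTRUCTURED:
        return str(make_header(decode_header(body)))
    # Unknown field: do not attempt to parse it, pass it through unchanged.
    return body


encoded = "=?ISO-8859-1?Q?F=E4rber?="
print(render_field("Subject", encoded))
print(render_field("Mail-Copies-To", encoded))
```

An agent that later learns the Usefor syntax would handle Mail-Copies-To
with its own structured parser rather than adding it to the plain-text set.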
 
} Thus the best interpretation I can place on section 6 is that a
} compliant mail reader MUST recognize and decode 'encoded-word's that
} occur in the headers explicitly defined in RFC 2822, and that it
} MAY/SHOULD/MUST/SOMETHING-ELSE recognize all 'encoded-word's produced by
} a compliant agent (as in section 5).

The intent of 2047 is that an application that chooses NOT to implement
2047 can still parse the resulting tokens according to their definition
in the appropriate document.  Thus most of the restrictions in 2047 are
on applications that GENERATE 2047 header fields, to assure that they
remain interoperable with non-2047-compliant applications.

Whether an application that receives a 2047 encoded-word chooses to decode
it is a quality of implementation issue.

I'm not going to attempt to address the document-authoring questions.

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com

Zsh: http://www.zsh.org | PHPerl Project: http://phperl.sourceforge.net