ietf-822

RE: rfc2231 implementations?

2006-08-11 03:58:27

On Fri August 11 2006 04:52, Yuri Inglikov wrote:

Does anybody have insight into why RFC2047 explicitly prohibits encoding
parameter values and quoted strings? I am looking through the archives but
cannot seem to find the relevant discussion. At some point the use of encoded
words inside quoted strings was removed from the initial proposal, and there
should have been a good reason, which I don't immediately see...

RFC 2047 specifically addresses non-protocol human-readable text (see RFCs
1958 and 2277 for discussions of protocol vs. text).  E.g. in
   From: Yuri Inglikov <Yuri(_dot_)Inglikov(_at_)microsoft(_dot_)com>
the display name "Yuri Inglikov" exists purely for human presentation; it
plays no role in message-related protocols.  The protocols do use
"<Yuri(_dot_)Inglikov(_at_)microsoft(_dot_)com>", and RFC 2047 encoding is
neither necessary nor permitted there.
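
As a rough illustration of that split, here is a minimal Python sketch using
the standard library's email package.  The field body, the name and the
example.com address are made up for the illustration, and split_from is a
hypothetical helper, not part of any library.  Only the phrase (display name)
is RFC 2047 decoded; the addr-spec is passed through untouched.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr

# Hypothetical From field body; the name and address are invented.
FROM_BODY = "=?ISO-8859-1?Q?J=FCrgen?= Example <jurgen@example.com>"


def split_from(field_body):
    """Return (decoded display name, untouched addr-spec)."""
    phrase, addr_spec = parseaddr(field_body)
    # Only the phrase is human-readable text, so only it is RFC 2047 decoded.
    display_name = str(make_header(decode_header(phrase)))
    return display_name, addr_spec


display_name, addr_spec = split_from(FROM_BODY)
print(display_name)
print(addr_spec)
```

The address part never goes through decode_header, which mirrors the point
that encoded-words have no place in the protocol element.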

Likewise,
   Subject: RE: rfc2231 implementations?
plays no role in message-related protocols; it merely carries "only
human-readable content" (RFC 2822) for presentation to the recipient.

Conversely, parameter values such as in
   Content-Type: text/plain;
     charset="us-ascii"
are usually composed of protocol keywords (as in the charset parameter in the
example above from your message) or other non-text (in the RFC 1958/2277
sense of "text") content.  RFC 2231 does provide for language-tagging in case
a parameter is used to convey some human-readable text value.
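
For what a language-tagged parameter looks like in practice, a small sketch
with Python's standard email package may help.  The Content-Type field below
follows the charset'language'value form of RFC 2231; the title parameter and
its value are only an illustration, and the variable names are my own.

```python
from email import message_from_string
from email.utils import collapse_rfc2231_value

# Illustrative message header with an RFC 2231 extended parameter value.
RAW = (
    "Content-Type: application/x-stuff;\n"
    " title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A\n"
    "\n"
)

msg = message_from_string(RAW)

# For an RFC 2231 encoded parameter, get_param returns a 3-tuple
# (charset, language, value) rather than a plain string.
title = msg.get_param("title")
charset, language, _ = title

# collapse_rfc2231_value turns the tuple into a single text string.
text = collapse_rfc2231_value(title)

print(charset, language)
print(text)
```

The language tag travels with the parameter value itself, so nothing has to
be guessed from context.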

RFC 2047 encoding uses the '=' character for encoded octets, which might prove
troublesome for some implementations that parse parameters, where '=' already
separates the parameter name from its value.  Indeed, some implementations
that illegally try to use RFC 2047 encoding in parameters are notorious for
botching the very same cruft that they generate.  RFC 2231 encoding uses the
'%' character for encoded octets (N.B. the 2231 text contains a hint that '='
was at one time intended for 2231 also; there is an erratum at
http://www.rfc-editor.org/cgi-bin/errata.pl ).
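
To make the '=' vs. '%' difference concrete, here is a hedged Python sketch.
q_encoded_word is a simplified, hypothetical Q encoder written just for this
illustration (it over-encodes anything that is not an ASCII letter or digit);
encode_rfc2231 is the standard library function.  The parameter name and file
name are invented.

```python
from email.utils import encode_rfc2231


def q_encoded_word(text, charset="utf-8"):
    """Simplified RFC 2047 'Q' encoder, for illustration only."""
    out = []
    for byte in text.encode(charset):
        if byte < 128 and chr(byte).isalnum():
            out.append(chr(byte))      # ASCII letters and digits stay literal
        elif byte == 0x20:
            out.append("_")            # space is written as underscore
        else:
            out.append("=%02X" % byte)  # everything else becomes =XX
    return "=?%s?Q?%s?=" % (charset, "".join(out))


word = q_encoded_word("naïve")
extended = encode_rfc2231("naïve.txt", "utf-8", "en")

# Not permitted by RFC 2047; shown only to count the '=' characters.
bad_param = "name=" + word
# RFC 2231 extended parameter: a single '=' separates name and value.
good_param = "name*=" + extended

print(bad_param, bad_param.count("="))
print(good_param, good_param.count("="))
```

A parser that treats every '=' as a name/value separator has several
candidates to choose from in the first form and exactly one in the second.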

Ultimately I am curious why, instead of extending RFC2047 to parameters, we 
ended up with RFC2231?

Actually, 2231 updates 2047, specifically regarding specification of language
of encoded text (I suspect that it is not a coincidence that 2231 and 2277
appeared at about the same time).  Clearly, language is a characteristic of
human-readable text, but is not a characteristic of protocol keywords (which
are, by definition, language-independent tags).
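
The language extension that 2231 adds to encoded-words puts the tag after the
charset, separated by '*'.  A minimal sketch of pulling the pieces apart,
assuming that layout; the regular expression and split_encoded_word are my own
illustration rather than a complete RFC 2047 parser, and the sample words are
made up.

```python
import re

# charset, optional "*language", encoding letter, payload
ENCODED_WORD = re.compile(
    r"=\?([^?*]+)(?:\*([^?]+))?\?([BbQq])\?([^?]*)\?="
)


def split_encoded_word(word):
    """Return (charset, language or None, encoding, payload)."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError("not an encoded-word: %r" % word)
    charset, language, encoding, payload = match.groups()
    return charset, language, encoding.upper(), payload


tagged = split_encoded_word("=?US-ASCII*EN?Q?Hello_world?=")
untagged = split_encoded_word("=?ISO-8859-1?Q?J=FCrgen?=")

print(tagged)
print(untagged)
```

The payload is left undecoded here; the point is only where the language tag
sits relative to the charset.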

   + An 'encoded-word' MUST NOT appear within a 'quoted-string'.

Quoted-strings can appear in protocol content, e.g.
  <"foo:bar"@[123.45.67.89]>
RFC 822/2822 quoting mechanisms (qpair as well as quoted-string) suffice for
all legal protocol content, but do not provide for some human-readable text
(specifically any cases involving 8-bit code points, as those are illegal in
message fields and therefore must be encoded rather than raw).

   + An 'encoded-word' MUST NOT be used in parameter of a MIME
     Content-Type or Content-Disposition field, or in any structured
     field body except within a 'comment' or 'phrase'.

Comments and phrases (e.g. in the Keywords field as well as in display names)
are human-readable text content (as are all unstructured fields).  Structured
field content other than comments, phrases, and whitespace consists of
protocol elements (at least in all of the base message fields and MIME
fields).  See above regarding '=' vs. parameter/value parsing.
