ietf-822

Re: Interpretation of RFC 2047

2002-12-13 10:25:52

Charles Lindsey wrote:

Would it not be a sensible rule to say that you should decode any occurrence
of =?<charset>?[BQ]?...?= (subject to the 76 character limit) in any
header provided:
    (a) it was immediately preceded by '(' or by CFWS
    (b) it was immediately followed by ')' or by CFWS
    (c) it was not contained within a quoted-string

(d) it was not part of a MIME parameter (RFC 2047 expressly forbids 2047
    encoding in MIME parameters; RFC 2231 provides a mechanism for parameters
    and also extends 2047 to include language tags)

... and more (see below)

Actually, some parsing is required, because an encoded word in an
unstructured header must have LWS (i.e. CFWS) on either side of it, whereas
it can also have '(' and ')' immediately next to it in a structured header.

That's not accurate: first, LWS and CFWS are different. "(a) =?se2?q?x?="
(quotes for legibility only) is legal whereas " (a)=?se2?q?x?=" is not;
both have CFWS immediately before what looks like an encoded-word, but
only the former has LWS immediately before an encoded-word. And there
are many issues with parentheses; ")=?se2?q?x?=(" in a structured
header which contains no other parentheses does not contain an
encoded-word.
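
The LWS-versus-CFWS distinction above can be illustrated with a minimal
sketch. It assumes a simplified candidate pattern (RFC 2047's token grammar
is stricter), and the function name is hypothetical. It checks only the
character immediately before each candidate; it does not establish that the
candidate really is an encoded-word, which still depends on the header
syntax.

```python
import re

# Simplified candidate pattern: =?charset?encoding?text?= with no whitespace
# or '?' inside the parts. This is an assumption; RFC 2047 is stricter.
CANDIDATE = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")

def candidates_with_lws_check(value):
    """Return (candidate, preceded_by_lws) pairs for a header field value."""
    results = []
    for m in CANDIDATE.finditer(value):
        start = m.start()
        # The start of the value is treated as a boundary (an assumption);
        # otherwise the preceding character must be SP or HTAB.
        preceded_by_lws = start == 0 or value[start - 1] in " \t"
        results.append((m.group(0), preceded_by_lws))
    return results

print(candidates_with_lws_check("(a) =?se2?q?x?="))
print(candidates_with_lws_check(" (a)=?se2?q?x?="))
```

Both inputs have a comment, hence CFWS, before the candidate, but only the
first has a space directly in front of it, so only the first passes the check.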

Other areas that immediately come to mind are:
1. RFC 2557 Content-Location, which permits URIs, which in turn (RFC 2396)
   permit parentheses.  That's in a structured field, but a URI, not a
   comment. [there are issues with 2557 and CFWS vs. the URIs, and these
   have been discussed on the MHTML list]
2. RFC 2533 "filters" have more nested parentheses than a technical paper
   at a LISP convention.  They're not comments and they appear in
   structured MIME extension headers (RFC 3297 Content-Alternative,
   RFC 2912 Content-Features).
3. URIs can also appear in other MIME extension headers; IIRC one of the
   RFCs provides for a URI in a parameter.
4. URIs also appear in headers which are not MIME extension headers, e.g.
   many of the List- headers.

I'm not certain, but I don't believe that the filter syntax permits anything
resembling 2047 encoding.  URIs probably do, but again, I haven't checked
thoroughly.

Misinterpreting something as encoded when it is in fact not an encoded-word
can have consequences.  Even if the text is changed only for display and not
in the protocol, there could be problems, e.g. with cut-and-paste of URIs.

Strictly speaking, one can only decode if one knows the relevant header
syntax.  Display is a relatively minor issue, subject to the above
caveat.  But transformations by gateways may result in fouling up content
beyond all recognition unless the header syntax is known.  Ideally,
gateways shouldn't decode encoded-words -- if they're left in encoded
form there is no chance that they'll be garbled, which is the likely
outcome unless strict syntax of headers is known and applied rigorously.
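
As a rough sketch of that conservative policy, the following decodes only
fields whose syntax is assumed to be known and unstructured, and passes
everything else through untouched. The allowlist contents and the function
name are illustrative assumptions. Python's decode_header may also be more
lenient than RFC 2047 about surrounding whitespace, so this is not a strict
implementation.

```python
from email.header import decode_header, make_header

# Hypothetical allowlist: fields assumed here to be unstructured text.
KNOWN_UNSTRUCTURED = {"subject", "comments"}

def display_value(field_name, raw_value):
    """Decode encoded-words only when the field's syntax is known."""
    if field_name.lower() not in KNOWN_UNSTRUCTURED:
        # Unknown or structured syntax: leave the value exactly as received.
        return raw_value
    return str(make_header(decode_header(raw_value)))

print(display_value("Subject", "=?iso-8859-1?q?caf=E9?="))
print(display_value("Content-Location", "http://example.com/(=?se2?q?x?=)"))
```

A gateway following the advice above would skip even the first branch and
forward every field in its encoded form.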


