Re: Problems with RFC 2047


Jacob Palme wrote:

> A message I sent recently contained the following subject:
>
> Subject: Tid fär nästa möte med CMC-forskargruppen
>
> It was sent as follows:
>
> Subject: Tid =?iso-8859-1?Q?f=E4r?= =?iso-8859-1?Q?_?=
>  =?iso-8859-1?Q?n=E4sta?= =?iso-8859-1?Q?_?=
>  =?iso-8859-1?Q?m=F6te?= med CMC-forskargruppen


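Encoding it as

Subject: Tid =?ISO-8859-1?q?f=E4r_n=E4sta_m=F6te?= med CMC-forskargruppen

would be simpler. That form uses one encoded-word for the three words
that need encoding and carries the spaces between them inside it as "_".
Whatever software encoded your text did a pretty bad job (the result is
legal, but awful).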
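A minimal Python sketch of the compact form follows. The helper `q_encode_word` is hypothetical. It turns a run of words into a single "Q" encoded-word, writing spaces as "_" and any octet that is not plain printable ASCII (or is one of "=", "?", "_") as "=XX". It assumes the result goes into unstructured text such as Subject, and it does not enforce the 75-character limit on an encoded-word.

```python
def q_encode_word(text: str, charset: str = "ISO-8859-1") -> str:
    """Encode `text` as one RFC 2047 'Q' encoded-word (sketch for Subject)."""
    out = []
    for byte in text.encode(charset):
        if byte == 0x20:
            out.append("_")              # a space is written as "_"
        elif 0x21 <= byte <= 0x7E and chr(byte) not in "=?_":
            out.append(chr(byte))        # printable ASCII passes through
        else:
            out.append(f"={byte:02X}")   # everything else becomes =XX
    return f"=?{charset}?q?{''.join(out)}?="


subject = "Tid " + q_encode_word("fär nästa möte") + " med CMC-forskargruppen"
print("Subject: " + subject)
```

Only the words containing non-ASCII letters go through the encoder. The rest of the subject stays as ordinary text.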

> And when I received the same message, the subject looked as follows:
>
> Subject: Tid färnästamöte med CMC-forskargruppen


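That looks like a decoding problem at the receiving end. Line folding,
and any whitespace that separates two adjacent encoded-words, is
supposed to be ignored for display, and in the message as sent the
line breaks fall only between encoded-words. The spaces in the subject,
on the other hand, were themselves encoded (as "_" in the
=?iso-8859-1?Q?_?= words), so a correct decoder displays them. The
decoder that produced the result above dropped them.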
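A minimal Python sketch of decoding by that rule follows. The names `decode_unstructured` and `decode_q_word` are hypothetical. The sketch handles only "Q" encoding and a simplified encoded-word pattern, and it ignores an RFC 2231 language suffix if one is present. It first unfolds the field, then drops whitespace that sits between two adjacent encoded-words, then decodes each word.

```python
import re

# A 'Q' encoded-word; an optional RFC 2231 "*lang" suffix is matched and ignored.
ENCODED_WORD = re.compile(r"=\?([^?*]+)(?:\*[^?]*)?\?[Qq]\?([^?]*)\?=")


def decode_q_word(m) -> str:
    """Decode one matched encoded-word: "_" is a space, "=XX" is an octet."""
    raw = m.group(2).replace("_", " ").encode("ascii")
    raw = re.sub(rb"=([0-9A-Fa-f]{2})", lambda h: bytes([int(h.group(1), 16)]), raw)
    return raw.decode(m.group(1))


def decode_unstructured(value: str) -> str:
    """Decode an unstructured header value such as Subject (sketch)."""
    # 1. Unfold: a CRLF followed by a space or tab is removed.
    value = re.sub(r"\r\n(?=[ \t])", "", value)
    # 2. Whitespace between two adjacent encoded-words is not displayed.
    value = re.sub(r"(\?=)[ \t]+(?==\?)", r"\1", value)
    # 3. Decode each encoded-word; spaces encoded inside them ("_") survive.
    return ENCODED_WORD.sub(decode_q_word, value)


sent = ("Tid =?iso-8859-1?Q?f=E4r?= =?iso-8859-1?Q?_?=\r\n"
        " =?iso-8859-1?Q?n=E4sta?= =?iso-8859-1?Q?_?=\r\n"
        " =?iso-8859-1?Q?m=F6te?= med CMC-forskargruppen")
print(decode_unstructured(sent))  # Tid fär nästa möte med CMC-forskargruppen
```

The unencoded " med" after the last encoded-word keeps its space, because it is not between two encoded-words.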

> The cause of the problems seems to be that RFC 2047 makes the subject much longer,


If done well, the encoded form is only slightly longer (in characters)
than the original is in octets. The best case depends on the length of
the charset tag (and of the language tag, if one is used; see RFC 2231).
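The arithmetic can be sketched in Python with a hypothetical helper, `encoded_word_length`. It assumes "Q" encoding of the whole text as one encoded-word. The fixed overhead is 7 characters ("=?", "?Q?", "?=") plus the charset tag (and "*" plus the language tag, if any), and each octet that must be written as "=XX" costs 2 extra characters.

```python
def encoded_word_length(text: str, charset: str = "ISO-8859-1", lang: str = "") -> int:
    """Characters needed to write `text` as one 'Q' encoded-word."""
    body = 0
    for b in text.encode(charset):
        # Space ("_") and printable ASCII other than = ? _ cost one character;
        # every other octet is written as =XX and costs three.
        if b == 0x20 or (0x21 <= b <= 0x7E and chr(b) not in "=?_"):
            body += 1
        else:
            body += 3
    tag = charset + (f"*{lang}" if lang else "")  # optional RFC 2231 language suffix
    return len("=?") + len(tag) + len("?Q?") + body + len("?=")


subject = "Tid fär nästa möte med CMC-forskargruppen"
print(len(subject.encode("ISO-8859-1")))         # 41
print(encoded_word_length(subject))              # 64
print(encoded_word_length(subject, lang="sv"))   # 67
```

For this subject, 41 octets with 3 non-ASCII letters become 64 characters, or 67 with a two-letter language tag. Adding the 9 characters of "Subject: " gives 76.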

> and the sending mailer is of the
> opinion that no header line should be longer than 78 characters,


Actually, any header line containing any encoded-word is required
to be no longer than 76 octets (not counting the CRLF [RFC 2047
section 2]). The more compact version above is just shy of that
limit.  It's still possible in this case to fit everything on one
line, even with a language tag:

Subject: =?ISO-8859-1*sv?q?Tid_f=E4r_n=E4sta_m=F6te_med_CMC-forskargruppen?=

which is exactly 76 octets.
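A small Python check of the two RFC 2047 section 2 limits follows. A line containing an encoded-word may be at most 76 characters, and an encoded-word itself at most 75. The helper `rfc2047_length_problems` is hypothetical. It assumes the field is given with its CRLF line breaks and uses a simplified encoded-word pattern.

```python
import re

# Simplified pattern for any RFC 2047 encoded-word (Q or B encoding).
ENCODED_WORD = re.compile(r"=\?[^?]+\?[QqBb]\?[^?]*\?=")


def rfc2047_length_problems(field: str) -> list[str]:
    """List violations of RFC 2047's length rules in a (possibly folded) field."""
    problems = []
    for n, line in enumerate(field.split("\r\n"), start=1):
        words = ENCODED_WORD.findall(line)
        # A line that contains an encoded-word may not exceed 76 characters.
        if words and len(line) > 76:
            problems.append(f"line {n}: {len(line)} characters, limit 76")
        # An encoded-word may not exceed 75 characters.
        problems.extend(
            f"line {n}: encoded-word of {len(w)} characters, limit 75"
            for w in words if len(w) > 75
        )
    return problems


one_line = "Subject: =?ISO-8859-1*sv?q?Tid_f=E4r_n=E4sta_m=F6te_med_CMC-forskargruppen?="
print(len(one_line), rfc2047_length_problems(one_line))  # 76 []
```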

> and thus splits the subject over multiple lines.
> And the sending mailer had to encode each line with its
> own =?iso-8859-1?Q?, making the problem worse.


Neither is necessary, as shown above.

> This, then, will be a problem with any encoding scheme
> which increases the number of bytes in a header. UTF-8 can
> also increase the number of bytes, and thus cause the same
> problem.


It should not be a problem in any case: any header field, Subject
included, may be folded onto continuation lines, and user agents need
to deal with that appropriately (e.g. by unfolding it for display).
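Folding and unfolding are purely mechanical and independent of RFC 2047. The Python sketch below uses hypothetical helpers `fold` and `unfold`. Folding inserts a CRLF before a space, and unfolding removes every CRLF that is immediately followed by a space or tab, which is the rule RFC 5322 section 2.2.3 states. The sketch assumes a 78-character target and leaves a single word that is longer than that intact.

```python
import re


def fold(field: str, limit: int = 78) -> str:
    """Fold a header field at spaces so lines stay within `limit` where possible."""
    lines, current = [], ""
    for word in field.split(" "):
        candidate = f"{current} {word}" if current else word
        if current and len(candidate) > limit:
            lines.append(current)
            current = " " + word  # continuation lines begin with whitespace
        else:
            current = candidate
    lines.append(current)
    return "\r\n".join(lines)


def unfold(field: str) -> str:
    """Remove each CRLF that is immediately followed by a space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", field)


s = "Subject: " + " ".join(["CMC-forskargruppen"] * 6)
folded = fold(s)
print(folded.split("\r\n"))
print(unfold(folded) == s)  # True
```

Because a fold only turns a space into CRLF plus that space, unfolding restores the field exactly.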

> The problem would have been solved if RFC 2047 included a
> facility to split a header into multiple lines which are not
> to be received as multiple lines, like the "=" at the end of
> a line in Quoted-Printable.


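Actually, that is already part of RFC 2047, via a somewhat different
mechanism: the text can be split into several adjacent encoded-words
on separate lines, and the whitespace (including the line break)
between adjacent encoded-words is not displayed. See the examples and
discussion in RFC 2047 section 8.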