ietf-822

Re: Problems with RFC 2047

2003-03-09 13:58:10

Jacob Palme wrote:
> A message I sent recently contained the following subject:
>
> Subject: Tid fär nästa möte med CMC-forskargruppen
>
> It was sent as follows:
>
> Subject: Tid =?iso-8859-1?Q?f=E4r?= =?iso-8859-1?Q?_?=
>  =?iso-8859-1?Q?n=E4sta?= =?iso-8859-1?Q?_?=
>  =?iso-8859-1?Q?m=F6te?= med CMC-forskargruppen

Subject: Tid =?ISO-8859-1?q?f=E4r_n=E4sta_m=F6te?= med CMC-forskargruppen

would be simpler. Whatever encoded your text did a pretty bad job
(it's legal, but awful).

> And when I received the same message, the subject looked as follows:
>
> Subject: Tid fär nästa möte med CMC-forskargruppen

That looks like a decoding problem; line folding and whitespace
between adjacent encoded-words are supposed to be ignored for
display, and here the line breaks fall only between encoded-words.
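
For illustration only, here is a rough sketch of what a conforming
decoder should produce for the folded header above. It uses Python's
standard email.header module, which is my choice and not something
either mailer in this thread is known to use; the variable names are
made up for the example.

```python
from email.header import decode_header, make_header

# Field body of the Subject as it was sent: three lines, with the
# line breaks falling between adjacent encoded-words.
folded = (
    "Tid =?iso-8859-1?Q?f=E4r?= =?iso-8859-1?Q?_?=\r\n"
    " =?iso-8859-1?Q?n=E4sta?= =?iso-8859-1?Q?_?=\r\n"
    " =?iso-8859-1?Q?m=F6te?= med CMC-forskargruppen"
)

# The simpler single encoded-word form suggested above.
compact = "Tid =?ISO-8859-1?q?f=E4r_n=E4sta_m=F6te?= med CMC-forskargruppen"

# decode_header() drops whitespace that sits between two encoded-words;
# make_header() reassembles the pieces into one displayable string.
decoded = str(make_header(decode_header(folded)))
decoded_compact = str(make_header(decode_header(compact)))

print(decoded)
print(decoded == decoded_compact)
```

Both forms should decode to the same single-line subject; a reader that
shows anything else is mishandling the whitespace between encoded-words.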

> The cause of the problems seems to be that RFC 2047 makes the subject much longer,

If done well, it's slightly longer (in characters) than the number
of octets in the original.  Best case depends on how long the
charset tag is (and language tag, if used (see RFC 2231)).

> and the sending mailer is of the opinion that no header line should be
> longer than 78 characters,

Actually, any header line containing any encoded-word is required
to be no longer than 76 octets (not counting the CRLF [RFC 2047
section 2]). The more compact version above is just shy of that
limit.  It's still possible in this case to fit everything on one
line, even with a language tag:

Subject: =?ISO-8859-1*sv?q?Tid_f=E4r_n=E4sta_m=F6te_med_CMC-forskargruppen?=

which is exactly 76 octets.
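
The arithmetic is easy to check mechanically. The following is only a
sketch in Python; the constant and function names are mine, and the
language tag is accounted for as "*" plus any two-letter tag appended
to the charset name rather than spelled out.

```python
MAX_LINE = 76  # RFC 2047 section 2: limit for a line containing an encoded-word
MAX_WORD = 75  # RFC 2047 section 2: limit for a single encoded-word


def octets(line: str) -> int:
    """Length of an all-ASCII header line in octets, CRLF not included."""
    return len(line.encode("ascii"))


original = "Tid f\u00e4r n\u00e4sta m\u00f6te med CMC-forskargruppen"
compact = "Subject: Tid =?ISO-8859-1?q?f=E4r_n=E4sta_m=F6te?= med CMC-forskargruppen"
one_word = "Subject: =?ISO-8859-1?q?Tid_f=E4r_n=E4sta_m=F6te_med_CMC-forskargruppen?="
TAG_SUFFIX = 3  # "*" plus a two-letter language tag after the charset name

original_octets = len(original.encode("iso-8859-1"))
compact_octets = octets(compact)
tagged_octets = octets(one_word) + TAG_SUFFIX
tagged_word_octets = octets(one_word[len("Subject: "):]) + TAG_SUFFIX

print(original_octets)     # 41 octets of ISO-8859-1 text
print(compact_octets)      # 73, under the 76-octet line limit
print(tagged_octets)       # 76, exactly at the limit
print(tagged_word_octets)  # 67, under the 75-character encoded-word limit
```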

> and thus splits the subject on multiple lines.
> And the sending mailer had to encode each line with its
> own =?iso-8859-1?Q?, making the problem worse.

Neither is necessary, as shown above.

> This, then, will be a problem with any encoding scheme which increases
> the number of bytes in a header. UTF-8 can also increase the number of
> bytes, and thus cause the same problem.

It should not be a problem in any case; any header field may be
continued, Subject included, and user agents need to be able
to deal with that appropriately (e.g. by unfolding for display).
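
As a sketch of what "unfolding for display" amounts to for an ordinary
(unencoded) header field: remove each line break that is immediately
followed by a space or tab, keeping the whitespace itself. The function
name and the sample subject below are made up for the example.

```python
import re


def unfold(raw: str) -> str:
    """Remove every line break that is directly followed by a space or tab."""
    return re.sub(r"\r?\n(?=[ \t])", "", raw)


folded_field = "Subject: a rather long subject\r\n that continues on a second line"
unfolded = unfold(folded_field)
print(unfolded)
```

Encoded-words need one further step, since whitespace separating two
adjacent encoded-words is dropped entirely when they are decoded.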

> The problem would have been solved if RFC 2047 included a facility to
> split a header into multiple lines which are not to be received as
> multiple lines, like the "=" at the end of a line in Quoted-Printable.

Actually that is already part of RFC 2047 (via a somewhat different
mechanism); see the examples and discussion in RFC 2047 section 8.
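
A minimal sketch of that mechanism, for the curious: long text is split
into several encoded-words, one per line, and the folding whitespace
between them disappears on decoding. This is not any particular
mailer's algorithm; encode_words, fold_subject and the SAFE set are
hypothetical names, and the SAFE set is deliberately conservative.

```python
from email.header import decode_header, make_header

# Characters emitted literally in a "Q" encoded-word in this sketch.
SAFE = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!*+-/")


def q_encode_char(ch: str, charset: str) -> str:
    """Q-encode one character; it is never split across encoded-words."""
    if ch == " ":
        return "_"
    if ch in SAFE:
        return ch
    return "".join("=%02X" % b for b in ch.encode(charset))


def encode_words(text: str, charset: str, max_word: int) -> list:
    """Split text into encoded-words of at most max_word characters each."""
    prefix, suffix = "=?%s?Q?" % charset, "?="
    budget = max_word - len(prefix) - len(suffix)
    words, current = [], ""
    for ch in text:
        piece = q_encode_char(ch, charset)
        if len(current) + len(piece) > budget:
            words.append(prefix + current + suffix)
            current = ""
        current += piece
    if current:
        words.append(prefix + current + suffix)
    return words


def fold_subject(text: str, charset: str = "iso-8859-1") -> str:
    """One encoded-word per line, every line at most 76 characters."""
    name = "Subject: "
    words = encode_words(text, charset, max_word=76 - len(name))
    return name + "\r\n ".join(words)


subject = " / ".join(["Tid f\u00e4r n\u00e4sta m\u00f6te med CMC-forskargruppen"] * 3)
folded = fold_subject(subject)
print(folded)

# Round trip: the line breaks between encoded-words vanish on decoding.
value = folded[len("Subject: "):]
print(str(make_header(decode_header(value))) == subject)
```

Splitting on character rather than octet boundaries is what lets the
same sketch work for a multi-octet charset such as UTF-8.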

