ietf-822

Re: Getting RFC 2047 encoding right

2003-12-12 03:43:13

Keith Moore writes:
I don't think I understand what you are saying. Would you really use "latin_1" as a charset name? Given that it's nonstandard, how could that be conservative?

I'd never do that. But code that simply copies the received subject would.

I assume the 0x80 is a non-break-space? Other than using a nonstandard charset, what is it that makes =?latin_1?q?=80?= a monstrosity?

Neither latin_1 nor 0x80 is defined, yet I've seen both in real life. 0x80 is not allocated in the ISO 8859 character sets; it's a Microsoft extension (windows-1252) for the euro sign. (Non-break-space is 0xA0.)
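
To make the 0x80 point concrete, here is a minimal sketch in C of how a decoder might notice such bytes in a Q-encoded payload. The function names are hypothetical, and it assumes the caller has already isolated the text between "?q?" and "?=". It only looks for "=XX" escapes in the range 0x80-0x9F, which the ISO 8859 sets do not use for printable characters.

```c
#include <ctype.h>
#include <stdlib.h>

/* 0x80-0x9F holds no printable character in the ISO 8859 sets;
   windows-1252 puts the euro sign at 0x80. */
static int is_c1_byte(unsigned long b)
{
    return b >= 0x80 && b <= 0x9F;
}

/* Hypothetical helper: returns 1 if the Q-encoded payload contains an
   "=XX" escape whose value falls in 0x80-0x9F, otherwise 0. */
int q_payload_has_c1_byte(const char *payload)
{
    const char *p;

    for (p = payload; *p != '\0'; p++) {
        if (p[0] == '=' && isxdigit((unsigned char)p[1])
                        && isxdigit((unsigned char)p[2])) {
            char hex[3];

            hex[0] = p[1];
            hex[1] = p[2];
            hex[2] = '\0';
            if (is_c1_byte(strtoul(hex, NULL, 16)))
                return 1;
            p += 2;     /* skip the two hex digits just examined */
        }
    }
    return 0;
}
```

With this, the payload "=80" is flagged and "=A0" (non-break-space) is not. Whether a flagged byte should be treated as windows-1252 or rejected is a policy question the sketch does not answer.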

And how does this relate to the problem of not changing encoded-words from the subject message?

The MUA has a choice. Either it can be conservative in what it generates and liberal in what it accepts, or it can blindly generate whatever it accepts, or it can make a smart judgment.

The first is easiest to program. The second is only slightly harder and is what Charles said is the "obvious way", to which I took exception. The third is a great deal of work.

why would the decoder need to care?

See below.

to me this seems fairly simple:
...
if (strcmp (savemsg->subject, newmsg->subject) == 0)
        newmsg->subject = prepend_Re (origmsg->subject);
else
        newmsg->subject = encode_2047 (newmsg->subject);

what am I missing?

If origmsg->subject is "=?latin_1?q?=80?=" and the user doesn't change the subject, newmsg->subject is "Re: =?latin_1?q?=80?=". If the decoder knows whether the string could have been generated by a reasonably conservative generator, that case can be avoided. A very tricky decision.
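
One small piece of that judgment can be sketched in C: before copying the received subject verbatim, check whether every encoded-word in it names a charset the MUA would itself be willing to generate. The function names and the four-entry allowlist below are assumptions for illustration only; a real MUA would presumably consult the IANA registry and also inspect the payload bytes.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allowlist, all lowercase. */
static const char *const known_charsets[] = {
    "us-ascii", "iso-8859-1", "iso-8859-15", "utf-8"
};

/* Case-insensitive match of the n-byte name at s against the allowlist. */
static int charset_known(const char *s, size_t n)
{
    size_t i, j;

    for (i = 0; i < sizeof known_charsets / sizeof known_charsets[0]; i++) {
        const char *k = known_charsets[i];

        if (strlen(k) != n)
            continue;
        for (j = 0; j < n; j++)
            if (tolower((unsigned char)s[j]) != k[j])
                break;
        if (j == n)
            return 1;
    }
    return 0;
}

/* Returns 0 if the subject contains something shaped like an encoded-word
   ("=?charset?q?" or "=?charset?b?") with a charset not on the allowlist,
   otherwise 1. Plain text with no encoded-words yields 1. */
int subject_looks_conservative(const char *subject)
{
    const char *p = subject;

    while ((p = strstr(p, "=?")) != NULL) {
        const char *cs = p + 2;
        const char *q = strchr(cs, '?');

        if (q == NULL)
            break;      /* no further '?', so no encoded-word here */
        if (strchr("qQbB", q[1]) != NULL && q[1] != '\0' && q[2] == '?') {
            if (!charset_known(cs, (size_t)(q - cs)))
                return 0;
        }
        p = q;          /* resume the search after the charset name */
    }
    return 1;
}
```

With this check, "=?latin_1?q?=80?=" is rejected, so the MUA would know not to emit "Re: =?latin_1?q?=80?=" unchanged, while "=?ISO-8859-1?Q?caf=E9?=" passes. What to generate in the rejected case is the hard part, and the sketch leaves it open.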

(And if I've injured the English language again, I apologize. I suppose I'm too old to learn new languages without forgetting bits of the ones I know, or mixing them up. Sad.)

--Arnt