Re: Getting RFC 2047 encoding right



On Dec 12, 2003, at 5:42 AM, Arnt Gulbrandsen wrote:

Keith Moore writes:

I don't think I understand what you are saying.  Would you really use
"latin_1" as a charset name?  Given that it's nonstandard, how could
that be conservative?

I'd never do that. But code that simply copies the received subject would.

I see. Well, I don't think you're expected to fix that. Furthermore, if your code doesn't know what "latin_1" is, I don't see how you could translate it into anything better anyway. Leaving it as-is at least allows for the possibility that it's valid, and your software just hasn't learned about it yet.

The MUA has a choice. Either it can be conservative in what it generates
and liberal in what it accepts, or it can blindly generate whatever it
accepts, or it can make a smart judgment.

To me, being conservative in what it generates means not decoding and reencoding things it doesn't understand from a message being replied to; it means keeping things as they are.

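if (strcmp (savemsg->subject, newmsg->subject) == 0)
    /* subject not changed by the user: keep the original, prepend "Re: " */
    newmsg->subject = prepend_Re (origmsg->subject);
else
    /* subject edited: encode the new text */
    newmsg->subject = encode_2047 (newmsg->subject);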

What am I missing?
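
For concreteness, here is a minimal sketch of that logic as compilable C. The function and parameter names (reply_subject, orig_raw, shown, edited) are made up for illustration, and encode_2047 here is a hypothetical stand-in: it assumes the edited text is UTF-8, uses "Q" encoding, and ignores the 75-character limit on encoded-words, so it is not a complete RFC 2047 encoder. It also does not check whether the original subject already starts with "Re: ".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal encoder: assumes UTF-8 input, "Q" encoding, a single
 * encoded-word. Ignores the 75-character limit, so short subjects only.
 * Returns a malloc'd string (caller frees), or NULL on allocation failure. */
static char *encode_2047(const char *text)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = strlen(text), i, o;
    int needs_encoding = 0;
    char *out;

    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)text[i];
        if (c < 0x20 || c >= 0x7f)
            needs_encoding = 1;
    }
    /* worst case: "=?UTF-8?Q?" (10) + 3 per input byte + "?=" (2) + NUL */
    out = malloc(10 + 3 * n + 2 + 1);
    if (out == NULL)
        return NULL;
    if (!needs_encoding) {          /* plain printable ASCII: send as-is */
        memcpy(out, text, n + 1);
        return out;
    }
    memcpy(out, "=?UTF-8?Q?", 10);
    o = 10;
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)text[i];
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9')) {
            out[o++] = (char)c;
        } else if (c == ' ') {
            out[o++] = '_';         /* Q encoding writes space as '_' */
        } else {
            out[o++] = '=';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 15];
        }
    }
    memcpy(out + o, "?=", 3);       /* copies the terminating NUL too */
    return out;
}

/* orig_raw: Subject of the original message, exactly as received
 * shown:    subject text the user saw when the reply was opened
 * edited:   subject text when the user sends the reply
 * Returns a malloc'd string (caller frees), or NULL on allocation failure. */
char *reply_subject(const char *orig_raw, const char *shown,
                    const char *edited)
{
    assert(orig_raw != NULL && shown != NULL && edited != NULL);

    if (strcmp(shown, edited) == 0) {
        /* Untouched: keep the received bytes, only prepend "Re: ". */
        size_t n = strlen(orig_raw);
        char *out = malloc(4 + n + 1);
        if (out == NULL)
            return NULL;
        memcpy(out, "Re: ", 4);
        memcpy(out + 4, orig_raw, n + 1);
        return out;
    }
    /* Edited: the text is now ours, so we generate the encoding. */
    return encode_2047(edited);
}
```

With this sketch, an untouched reply to "=?latin_1?q?=80?=" comes out as "Re: =?latin_1?q?=80?=", which is exactly the case under discussion.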


If origmsg->subject is "=?latin_1?q?=80?=" and the user doesn't change
the subject, newmsg->subject is "Re: =?latin_1?q?=80?=". If the decoder
knows whether the string could have been generated by a reasonably
conservative generator, that case can be avoided. A very tricky
decision.

If you don't know what latin_1 is, then you can't make any sense of the 0x80 anyway. You might as well keep it in the reply.

There are almost certainly still some unregistered charsets in use in some communities, and some new charsets are being added from time to time. You can't really expect your software to be aware of all of them. It makes sense to make your software tolerant of charsets it doesn't understand yet.
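
One way to picture that tolerance is the rough C sketch below: look at the charset label of an encoded-word and only attempt to decode it if the label is one the software knows; otherwise pass the word through untouched. The names (classify_encoded_word, known_charsets, the CS_* values) and the three-entry charset table are assumptions for illustration, not taken from any real MUA. The syntax check is deliberately loose and does not handle a language suffix on the charset.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum charset_action {
    CS_DECODE,            /* charset known: safe to decode and re-encode */
    CS_PASS_THROUGH,      /* charset unknown: copy the word verbatim     */
    CS_NOT_ENCODED_WORD   /* not of the form =?charset?encoding?text?=   */
};

/* Illustrative table only; a real MUA would know many more. */
static const char *const known_charsets[] = {
    "us-ascii", "iso-8859-1", "utf-8"
};

/* Case-insensitive compare of the n bytes at a against the string b. */
static int label_equals(const char *a, size_t n, const char *b)
{
    size_t i;

    if (strlen(b) != n)
        return 0;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

enum charset_action classify_encoded_word(const char *w)
{
    const char *cs, *q, *text, *end;
    size_t len, i;

    assert(w != NULL);
    len = strlen(w);
    if (len < 2 || strncmp(w, "=?", 2) != 0)
        return CS_NOT_ENCODED_WORD;

    cs = w + 2;
    q = strchr(cs, '?');              /* '?' that ends the charset label */
    if (q == NULL || q == cs)
        return CS_NOT_ENCODED_WORD;

    /* Encoding is a single letter, Q or B in either case, then '?'. */
    if (q[1] == '\0' || strchr("QqBb", q[1]) == NULL || q[2] != '?')
        return CS_NOT_ENCODED_WORD;

    text = q + 3;
    end = w + len - 2;                /* where the closing "?=" should be */
    if (text >= end || end[0] != '?' || end[1] != '=')
        return CS_NOT_ENCODED_WORD;

    for (i = 0; i < sizeof known_charsets / sizeof known_charsets[0]; i++) {
        if (label_equals(cs, (size_t)(q - cs), known_charsets[i]))
            return CS_DECODE;
    }
    return CS_PASS_THROUGH;           /* valid shape, unfamiliar charset */
}
```

Under these assumptions "=?latin_1?q?=80?=" is classified as CS_PASS_THROUGH, so a reply would carry it unchanged instead of guessing at what the 0x80 means.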