ietf-822

Re: Getting RFC 2047 encoding right

2003-12-12 05:12:27


On Dec 12, 2003, at 5:42 AM, Arnt Gulbrandsen wrote:

> Keith Moore writes:
> > I don't think I understand what you are saying.  Would you really use
> > "latin_1" as a charset name?  Given that it's nonstandard, how could
> > that be conservative?

> I'd never do that. But code that simply copies the received subject would.

I see. Well, I don't think you're expected to fix that. Furthermore, if your code doesn't know what "latin_1" is, I don't see how you could translate it into anything better anyway. Leaving it as-is at least allows for the possibility that it's valid, and your software just hasn't learned about it yet.

> The MUA has a choice. Either it can be conservative in what it generates
> and liberal in what it accepts, or it can blindly generate whatever it
> accepts, or it can make a smart judgment.

To me, being conservative in what it generates means not decoding and re-encoding things it doesn't understand in the message being replied to; it means keeping them as they are.

> > if (strcmp (savemsg->subject, newmsg->subject) == 0)
> >     newmsg->subject = prepend_Re (origmsg->subject);
> > else
> >     newmsg->subject = encode_2047 (newmsg->subject);
> >
> > what am I missing?
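A minimal, compilable version of the quoted logic might look like the following. It assumes the MUA keeps three strings: the Subject exactly as received (still encoded), the decoded text it offered the user, and what the user left in the field. The name `reply_subject` is hypothetical, and this `encode_2047` is a hypothetical, deliberately simplistic stand-in: it emits a single UTF-8 "Q" encoded-word, does not fold lines or honor the 75-character limit, leaves plain ASCII alone, and makes no attempt to avoid "Re: Re:".

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Plain ASCII that cannot be mistaken for an encoded-word is left alone. */
static int needs_encoding(const char *s)
{
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        if (*p >= 0x80)
            return 1;
    return strstr(s, "=?") != NULL;
}

/* Hypothetical stand-in for encode_2047(): one UTF-8 "Q" encoded-word,
 * no line folding, no 75-character limit. */
static char *encode_2047(const char *text)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len = strlen(text);
    char *out;

    if (!needs_encoding(text)) {
        out = malloc(len + 1);
        if (out != NULL)
            memcpy(out, text, len + 1);
        return out;
    }
    /* Worst case every byte becomes "=XX"; sizeof counts the NUL too. */
    out = malloc(3 * len + sizeof "=?UTF-8?Q??=");
    if (out == NULL)
        return NULL;
    char *p = out + sprintf(out, "=?UTF-8?Q?");
    for (const unsigned char *s = (const unsigned char *)text; *s; s++) {
        if (*s == ' ') {
            *p++ = '_';                 /* Q encoding writes space as '_' */
        } else if (*s > ' ' && *s < 0x7F && *s != '=' && *s != '?' && *s != '_') {
            *p++ = (char)*s;            /* printable ASCII passes through */
        } else {
            *p++ = '=';
            *p++ = hex[*s >> 4];
            *p++ = hex[*s & 0x0F];
        }
    }
    strcpy(p, "?=");
    return out;
}

/* orig_raw: Subject exactly as received (still encoded).
 * offered:  decoded text the MUA put in the Subject field.
 * edited:   what the user left there when sending. */
static char *reply_subject(const char *orig_raw, const char *offered,
                           const char *edited)
{
    if (strcmp(offered, edited) == 0) {
        /* Untouched: reuse the received encoded-words verbatim instead of
         * decoding and re-encoding them. */
        char *out = malloc(strlen(orig_raw) + sizeof "Re: ");
        if (out != NULL)
            sprintf(out, "Re: %s", orig_raw);
        return out;
    }
    return encode_2047(edited);
}
```

Called with the received Subject "=?latin_1?q?=80?=" and an untouched field, `reply_subject` returns "Re: =?latin_1?q?=80?=", which is exactly the case Arnt raises next; an edited field such as "café ok" (in UTF-8) comes back as "=?UTF-8?Q?caf=C3=A9_ok?=".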

> If origmsg->subject is "=?latin_1?q?=80?=" and the user doesn't change
> the subject, newmsg->subject is "Re: =?latin_1?q?=80?=". If the decoder
> knows whether the string could have been generated by a reasonably
> conservative generator, that case can be avoided. A very tricky
> decision.
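Arnt's "could a reasonably conservative generator have produced this?" test could be approximated roughly as follows. This is only a sketch under stated assumptions: `looks_conservative` is a hypothetical name, the charset list is a hypothetical, deliberately tiny subset (a real MUA would consult the full IANA registry, aliases included; its aliases for ISO-8859-1 include "latin1" but not "latin_1"), and RFC 2231 language suffixes are not handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, deliberately short list of preferred MIME charset names. */
static const char *const known_charsets[] = {
    "US-ASCII", "ISO-8859-1", "ISO-8859-15", "UTF-8", "windows-1252",
};

/* ASCII case-insensitive comparison of the n bytes at a with string b. */
static int ieq(const char *a, size_t n, const char *b)
{
    if (strlen(b) != n)
        return 0;
    for (size_t i = 0; i < n; i++) {
        char x = a[i], y = b[i];
        if (x >= 'a' && x <= 'z') x = (char)(x - 'a' + 'A');
        if (y >= 'a' && y <= 'z') y = (char)(y - 'a' + 'A');
        if (x != y)
            return 0;
    }
    return 1;
}

/* 1 if word is one well-formed "=?charset?B|Q?text?=" encoded-word whose
 * charset is on the list above; 0 for anything else. */
static int looks_conservative(const char *word)
{
    size_t len = strlen(word);
    if (len < 8 || strncmp(word, "=?", 2) != 0 || strcmp(word + len - 2, "?=") != 0)
        return 0;

    const char *cs = word + 2;
    const char *q = strchr(cs, '?');          /* end of the charset token */
    if (q == NULL || q == cs)
        return 0;
    if (q[1] == '\0' || strchr("BbQq", q[1]) == NULL || q[2] != '?')
        return 0;                             /* only B and Q encodings exist */

    const char *text = q + 3, *end = word + len - 2;
    if (end <= text)
        return 0;                             /* encoded-text must not be empty */
    for (const char *p = text; p < end; p++)
        if (*p == '?' || *p == ' ')
            return 0;

    for (size_t i = 0; i < sizeof known_charsets / sizeof known_charsets[0]; i++)
        if (ieq(cs, (size_t)(q - cs), known_charsets[i]))
            return 1;
    return 0;                                 /* e.g. "latin_1" is not on this list */
}
```

With this check "=?latin_1?q?=80?=" is flagged while "=?ISO-8859-1?Q?=80?=" passes, even though 0x80 is not a printable character in ISO-8859-1; a stricter heuristic could also look at the decoded bytes, which is part of why the decision is tricky.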

If you don't know what latin_1 is, then you can't make any sense of the 0x80 anyway. You might as well keep it in the reply.
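To make the point concrete: Q-decoding the encoded-text "=80" yields one byte, 0x80, and nothing more. What that byte means depends entirely on the charset label; in windows-1252 it is the euro sign, while in ISO-8859-1 it falls in the C1 control range. A minimal decoder sketch (the names `q_decode` and `hexval` are hypothetical) shows that the decoder alone cannot recover any text:

```c
#include <stddef.h>

static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;   /* accepted liberally */
    return -1;
}

/* Decode RFC 2047 "Q" encoded-text (the part between the third '?' and
 * the closing "?=") into raw bytes. Returns the byte count, or -1 on a
 * malformed "=XX" escape or if out is too small. */
static long q_decode(const char *in, unsigned char *out, size_t cap)
{
    size_t n = 0;
    while (*in) {
        int byte;
        if (*in == '_') {                 /* '_' stands for a space */
            byte = 0x20;
            in += 1;
        } else if (*in == '=') {
            int hi = hexval((unsigned char)in[1]);
            int lo = hi < 0 ? -1 : hexval((unsigned char)in[2]);
            if (lo < 0)
                return -1;
            byte = hi * 16 + lo;
            in += 3;
        } else {
            byte = (unsigned char)*in;
            in += 1;
        }
        if (n == cap)
            return -1;
        out[n++] = (unsigned char)byte;
    }
    return (long)n;
}
```

RFC 2047 says upper-case hex digits should be used; the sketch also accepts lower case, in the "liberal in what it accepts" spirit. Either way the result is just bytes, and only the charset name turns them into characters.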

There are almost certainly still some unregistered charsets in use in some communities, and new charsets are still added to the registry from time to time. You can't really expect your software to be aware of all of them. It makes sense to make your software tolerant of charsets it doesn't understand yet.
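One way to be tolerant in practice is to let the local converter decide. The sketch below uses the POSIX `iconv_open()`/`iconv()` interface: if the system can convert the named charset to UTF-8, the decoded text is shown; otherwise the encoded-word is displayed unchanged, so nothing is lost. `display_text` and `to_utf8` are hypothetical names, the caller is assumed to have already split the encoded-word and B- or Q-decoded its payload into bytes, and which charset names a given iconv accepts varies between implementations.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Convert len bytes in charset to a new NUL-terminated UTF-8 string.
 * Returns NULL if this iconv does not know the charset, the bytes are
 * invalid in it, or the output buffer turns out to be too small. */
static char *to_utf8(const char *charset, const char *bytes, size_t len)
{
    iconv_t cd = iconv_open("UTF-8", charset);
    if (cd == (iconv_t)-1)
        return NULL;                      /* unknown charset: don't guess */

    size_t cap = 4 * len + 1;             /* generous for common charsets */
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }
    char *in = (char *)bytes;             /* iconv() takes char **, not const */
    char *o = out;
    size_t in_left = len, out_left = cap - 1;
    size_t rc = iconv(cd, &in, &in_left, &o, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }
    *o = '\0';
    return out;
}

/* Decoded text if the charset is understood, else a copy of the raw
 * encoded-word, so a later version of the software may still read it. */
static char *display_text(const char *raw_word, const char *charset,
                          const char *bytes, size_t len)
{
    char *s = to_utf8(charset, bytes, len);
    if (s == NULL && (s = malloc(strlen(raw_word) + 1)) != NULL)
        strcpy(s, raw_word);
    return s;
}
```

The design choice is to treat "unknown" as "not yet known" rather than "wrong": the word is neither rejected nor rewritten, and once the system's converter learns the charset, the same message displays correctly without any change to the MUA.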