On Dec 12, 2003, at 5:42 AM, Arnt Gulbrandsen wrote:
Keith Moore writes:
I don't think I understand what you are saying. Would you really use
"latin_1" as a charset name? Given that it's nonstandard, how could
that be conservative?
I'd never do that. But code that simply copies the received subject
would.
I see. Well, I don't think you're expected to fix that. Furthermore,
if your code doesn't know what "latin_1" is, I don't see how you could
translate it into anything better anyway. Leaving it as-is at least
allows for the possibility that it's valid, and your software just
hasn't learned about it yet.
The MUA has a choice. It can be conservative in what it generates
and liberal in what it accepts, or it can blindly generate whatever it
accepts, or it can make a smart judgment.
To me, being conservative in what it generates means not decoding and
re-encoding things it doesn't understand from the message being replied
to. It means keeping things as they are.
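    /* subject untouched by the user: reuse the original header as-is */
    if (strcmp (savemsg->subject, newmsg->subject) == 0)
        newmsg->subject = prepend_Re (origmsg->subject);
    /* subject edited by the user: encode the new text ourselves */
    else
        newmsg->subject = encode_2047 (newmsg->subject);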
What am I missing?
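For concreteness, here is a minimal sketch of that branch as compilable C. The names prepend_re, reply_subject and encode_fn are hypothetical, chosen only for illustration; the encoder is passed in as a callback because the thread does not say how encode_2047 is implemented. The sketch assumes both subjects are held as raw header text, so an unchanged subject is copied byte for byte without ever being decoded.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoder callback: takes the user's new subject text and
 * returns a newly allocated, RFC 2047-encoded header value. */
typedef char *(*encode_fn)(const char *text);

/* Returns a newly allocated "Re: <raw_subject>". The raw header bytes are
 * copied verbatim; encoded-words inside them are never decoded.
 * Returns NULL if allocation fails. The caller frees the result. */
char *prepend_re(const char *raw_subject)
{
    static const char prefix[] = "Re: ";
    size_t plen = sizeof prefix - 1;
    size_t slen = strlen(raw_subject);
    char *out = malloc(plen + slen + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, prefix, plen);
    memcpy(out + plen, raw_subject, slen + 1); /* includes the final NUL */
    return out;
}

/* orig_raw: the subject exactly as received.
 * draft:    the subject as it stands when the user sends the reply.
 * If the user did not touch the subject, keep the received bytes;
 * otherwise hand the new text to the encoder. */
char *reply_subject(const char *orig_raw, const char *draft, encode_fn encode)
{
    assert(orig_raw != NULL && draft != NULL);

    if (strcmp(orig_raw, draft) == 0)
        return prepend_re(orig_raw);
    return encode(draft);
}
```

With an unchanged subject of "=?latin_1?q?=80?=", reply_subject returns "Re: =?latin_1?q?=80?=" and the encoder is never called, which is exactly the case under discussion.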
If origmsg->subject is "=?latin_1?q?=80?=" and the user doesn't change
the subject, newmsg->subject is "Re: =?latin_1?q?=80?=". If the decoder
knows whether the string could have been generated by a reasonably
conservative generator, that case can be avoided. A very tricky
decision.
If you don't know what latin_1 is, then you can't make any sense of the
0x80 anyway. You might as well keep it in the reply.
There are almost certainly still some unregistered charsets in use in
some communities, and some new charsets are being added from time to
time. You can't really expect your software to be aware of all of
them. It makes sense to make your software tolerant of charsets it
doesn't understand yet.
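One way to picture that tolerance is a check that only decides whether the charset label of an encoded-word is one the software knows, so that anything else can be left alone. This is a rough sketch under stated assumptions: charset_is_known and the three-entry table are hypothetical and purely illustrative, and the parsing only looks at the text between "=?" and the next "?". Names are compared case-insensitively, since RFC 2047 treats charset names as case-independent.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative table only; a real client would recognize many more. */
static const char *const known_charsets[] = {
    "us-ascii", "iso-8859-1", "utf-8"
};

/* Case-insensitive comparison of a length-delimited name with a C string. */
static int name_equals(const char *name, size_t len, const char *known)
{
    if (strlen(known) != len)
        return 0;
    for (size_t i = 0; i < len; i++) {
        if (tolower((unsigned char)name[i]) !=
            tolower((unsigned char)known[i]))
            return 0;
    }
    return 1;
}

/* Returns 1 if word starts like "=?charset?" and charset is in the table.
 * Returns 0 for unknown charsets and for text that is not an encoded-word;
 * in both cases the caller should pass the bytes through unchanged. */
int charset_is_known(const char *word)
{
    assert(word != NULL);

    if (strncmp(word, "=?", 2) != 0)
        return 0;

    const char *start = word + 2;
    const char *end = strchr(start, '?');
    if (end == NULL || end == start)
        return 0;

    size_t len = (size_t)(end - start);
    size_t count = sizeof known_charsets / sizeof known_charsets[0];
    for (size_t i = 0; i < count; i++) {
        if (name_equals(start, len, known_charsets[i]))
            return 1;
    }
    return 0;
}
```

A caller would decode for display only when charset_is_known returns 1. For "=?latin_1?q?=80?=" it returns 0, so the encoded-word stays as received, both on screen and in the reply.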