Keith Moore writes:
I don't think I understand what you are saying. Would you really use
"latin_1" as a charset name? Given that it's nonstandard, how could
that be conservative?
I'd never do that. But code that simply copies the received subject would.
I assume the 0x80 is a non-break-space? Other than using a
nonstandard charset, what is it that makes =?latin_1?q?=80?= a
monstrosity?
Neither latin_1 nor 0x80 is defined, yet I've seen both in real life.
0x80 is not allocated in the ISO 8859 character sets; it's a Microsoft
extension (windows-1252) for the euro sign. (Non-break-space is 0xA0.)
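To make the 0x80 versus 0xA0 point concrete, here is a minimal sketch of a range check. The function name is hypothetical, and it tests only the graphic ranges shared by the ISO 8859 family (0x20-0x7E and 0xA0-0xFF). Individual parts of the family leave some positions in the upper range unassigned, which this check does not model.

```c
#include <stdbool.h>

/* Hypothetical helper: true if byte b falls in a graphic position of
   the ISO 8859 family. 0x80-0x9F is the C1 control area, so 0x80 can
   never be a printable ISO 8859 character, while 0xA0 (non-break-space
   in ISO 8859-1) is in range. */
static bool iso8859_graphic_position(unsigned char b)
{
    return (b >= 0x20 && b <= 0x7E) || b >= 0xA0;
}
```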
And how does this relate to the problem of not changing encoded-words
from the subject message?
The MUA has a choice. Either it can be conservative in what it generates
and liberal in what it accepts, or it can blindly generate whatever it
accepts, or it can make a smart judgment.
The first is easiest to program. The second is only slightly harder and
is what Charles said is the "obvious way", to which I took exception.
The third is a great deal of work.
why would the decoder need to care?
See below.
to me this seems fairly simple:
...
if (strcmp (savemsg->subject, newmsg->subject) == 0)
    newmsg->subject = prepend_Re (origmsg->subject);
else
    newmsg->subject = encode_2047 (newmsg->subject);
what am I missing?
If origmsg->subject is "=?latin_1?q?=80?=" and the user doesn't change
the subject, newmsg->subject is "Re: =?latin_1?q?=80?=", so the MUA has
generated the nonstandard word itself. That case can be avoided only if
the decoder knows whether the string could have been generated by a
reasonably conservative generator. A very tricky decision.
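A rough sketch of what such a judgment might look like, to show why it is more work than a strcmp. Everything here is an assumption for illustration: the function names, the tiny charset allowlist (a real MUA would need something closer to the IANA registry), and the restriction to a single Q-encoded word. B-encoded words are simply reported as not conservative, which is stricter than necessary.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allowlist, far shorter than any real one. */
static const char *const known_charsets[] = {
    "us-ascii", "iso-8859-1", "iso-8859-15", "utf-8"
};

/* Case-insensitive match of the n bytes at s against the whole of t. */
static bool name_equals(const char *s, size_t n, const char *t)
{
    if (strlen(t) != n)
        return false;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)s[i]) != tolower((unsigned char)t[i]))
            return false;
    return true;
}

static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* True if w is one Q-encoded word "=?charset?q?text?=" with a charset
   from the allowlist, well-formed =XX escapes, and (for iso-8859-*)
   no escape in the unallocated 0x80-0x9F range. */
static bool plausibly_conservative(const char *w)
{
    size_t len = strlen(w);
    if (len < 8 || strncmp(w, "=?", 2) != 0 || strcmp(w + len - 2, "?=") != 0)
        return false;

    const char *cs = w + 2;
    const char *q1 = strchr(cs, '?');
    if (q1 == NULL || q1 == cs)
        return false;
    size_t cslen = (size_t)(q1 - cs);

    const char *enc = q1 + 1;
    if ((enc[0] != 'q' && enc[0] != 'Q') || enc[1] != '?')
        return false;

    const char *text = enc + 2;
    const char *text_end = w + len - 2;
    if (text > text_end)
        return false;

    bool known = false;
    for (size_t i = 0; i < sizeof known_charsets / sizeof known_charsets[0]; i++)
        if (name_equals(cs, cslen, known_charsets[i]))
            known = true;
    if (!known)
        return false;

    bool iso8859 = cslen >= 9 && name_equals(cs, 9, "iso-8859-");

    for (const char *p = text; p < text_end; p++) {
        unsigned char c = (unsigned char)*p;
        if (c <= 0x20 || c >= 0x7F || c == '?')
            return false;           /* not legal inside encoded text */
        if (c == '=') {
            if (text_end - p < 3)
                return false;       /* truncated escape */
            int hi = hexval((unsigned char)p[1]);
            int lo = hexval((unsigned char)p[2]);
            if (hi < 0 || lo < 0)
                return false;
            int byte = hi * 16 + lo;
            if (iso8859 && byte >= 0x80 && byte <= 0x9F)
                return false;       /* e.g. =80 labelled as ISO 8859 */
            p += 2;
        }
    }
    return true;
}
```

With this sketch, "=?latin_1?q?=80?=" is rejected on the charset name alone, and "=?iso-8859-1?q?=80?=" is rejected on the byte value. Even this much leaves out multiple encoded-words per header, line-length limits, and whether the bytes are valid in the named charset.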
(And if I've injured the English language again, I apologize. I suppose
I'm too old to learn new languages without forgetting bits of the ones
I know, or mixing them up. Sad.)
--Arnt