Re: Getting RFC 2047 encoding right
2003-12-09 15:06:48
The basic answer is that what you do with illegal input is generally
not specified - but clearly you aren't expected to make the subject
of the reply match the subject of the message being replied to in
that case.
Right.
What I cannot see is how to make something reasonable, correct and
fairly simple.
In most cases I have code that is right when the input is good, and
not wrong when the input is bad. RFC 2047 just doesn't seem to make
that simple.
Well, for untagged text you basically just have to guess the charset.
ISO-2022-* and UTF-8 can be distinguished from other charsets simply
and fairly reliably, and you can make guesses at some of the others
using heuristics. It's difficult to tune the heuristics, and subject
lines are too brief for them to work really well. But I really don't
see how RFC 2047 makes determining the charset label of untagged text
any worse than it inherently is.
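As a rough illustration of the "simple and fairly reliable" part, here is a minimal sketch in Python. The function name and the returned labels are made up for this example, and it deliberately gives up on everything other than ASCII, ISO-2022-* and UTF-8; guessing among the other 8-bit charsets is where the heuristics mentioned above would have to come in.

```python
def guess_charset(raw: bytes) -> str:
    """Very rough charset guess for untagged header bytes (illustrative only)."""
    if all(b < 0x80 for b in raw):
        # 7-bit data: the ISO-2022-* family announces itself with ESC sequences.
        if b"\x1b" in raw:
            return "iso-2022-*"
        return "us-ascii"
    try:
        # Text in a legacy 8-bit charset rarely happens to be valid UTF-8.
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Anything else needs real heuristics, which this sketch does not attempt.
        return "unknown-8bit"
```

For a subject line of a few words, the "unknown-8bit" branch is exactly the case where there is too little text to tune a guess against.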
Suppose I want to answer with "subject: re: <original> <ticket
id>", then I risk having two encoded-words separated only by
whitespace, and must do magic in order to preserve that space.
Why not just use an ASCII ticket id?
Why should I make "always ASCII" a requirement for that case, in code
that otherwise allows all of Unicode?
For the same reason that you should probably avoid using some forms of
email addresses even though they are perfectly valid - such as "Keith
\"Mr. Cynic\" Moore"@cs.utk.edu - corner cases that are seldom seen
often fail in practice.
If you want to be entirely reliable, your code to detect ticket-ids has
to be able to find them whether or not they're embedded in
encoded-words. And it's not as if you can't put a ticket-id into an
encoded-word, though (as you point out) you might have to encode a
space ("_" or "=20" in Q encoding) as the first character of that
encoded-word.
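To make the space problem concrete, here is a small sketch in Python using the standard email.header.decode_header. The helper names, the sample subject and the "[№42]" ticket format are all invented for this example, and the toy Q encoder ignores the 75-character limit on encoded-words. It shows the separating space being dropped on decoding, the space surviving when it is encoded as the first character of the second encoded-word, and a ticket-id search that runs on the decoded text so it does not care how the subject was split into encoded-words.

```python
import re
from email.header import decode_header


def q_encode_word(text, charset="utf-8"):
    """Toy RFC 2047 Q encoder: one encoded-word, no length limit handling."""
    out = []
    for b in text.encode(charset):
        ch = chr(b)
        if ch == " ":
            out.append("_")            # Q encoding writes a space as "_"
        elif b < 0x80 and ch.isalnum():
            out.append(ch)             # ASCII letters and digits pass through
        else:
            out.append("=%02X" % b)    # everything else becomes =XX
    return "=?%s?q?%s?=" % (charset, "".join(out))


def decode_subject(raw):
    """Decode a Subject value to text, joining the decoded pieces as-is."""
    pieces = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        pieces.append(chunk)
    return "".join(pieces)


def find_ticket(raw):
    """Look for the (hypothetical) ticket-id pattern in the decoded subject."""
    match = re.search(r"\[№(\d+)\]", decode_subject(raw))
    return match.group(1) if match else None


original = q_encode_word("Grüße")      # stands in for the incoming subject
ticket = "[№42]"                       # hypothetical non-ASCII ticket id

# Whitespace between two encoded-words is dropped when decoding.
naive = "Re: " + original + " " + q_encode_word(ticket)
# Putting the space inside the second encoded-word preserves it.
fixed = "Re: " + original + " " + q_encode_word(" " + ticket)

print(decode_subject(naive))
print(decode_subject(fixed))
print(find_ticket(naive), find_ticket(fixed))
```

Searching the decoded text rather than the raw header is the design choice that matters here: the ticket id is found in both variants, whether or not the space made it through.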