ietf-822

Re: Getting RFC 2047 encoding right

2003-12-09 15:06:48

The basic answer is that what you do with illegal input is generally not specified - but clearly you aren't expected to make the subject of the reply match the subject of the message being replied to in that case.

Right.

What I cannot see is how to make something reasonable, correct and fairly simple.

In most cases I have code that is right when the input is good, and not wrong when the input is bad. RFC 2047 just doesn't seem to make that simple.

Well, for untagged text you basically just have to guess the charset. ISO-2022-* and UTF-8 can be distinguished from other charsets simply and fairly reliably, and you can make guesses at some of the others using heuristics. It's difficult to tune the heuristics, and subject lines are too brief for them to work really well. But I really don't see how RFC 2047 makes determining the charset label of untagged text any worse than it inherently is.
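A rough sketch of that kind of guess, in Python. The function name `guess_charset` and the ISO-8859-1 fallback are assumptions made for illustration, not anything RFC 2047 specifies; the point is only that ISO-2022-* announces itself with escape sequences and that UTF-8 rarely validates by accident.

```python
def guess_charset(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Guess a charset label for untagged header bytes (heuristic only)."""
    if all(b < 0x80 for b in raw):
        # ISO-2022-* encodings are 7-bit and switch character sets
        # with ESC sequences, so look for the well-known designators.
        if any(esc in raw for esc in (b"\x1b$@", b"\x1b$B", b"\x1b(J")):
            return "iso-2022-jp"
        if b"\x1b$)C" in raw:
            return "iso-2022-kr"
        return "us-ascii"
    try:
        # 8-bit text that decodes as strict UTF-8 is very likely UTF-8.
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything else is treated as the caller's fallback.
        # Telling the other 8-bit charsets apart needs real heuristics.
        return fallback
```

The fallback branch is where the hard part lives: a subject line is usually too short for statistical guessing between, say, ISO-8859-1 and ISO-8859-2 to be trustworthy.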

Suppose I want to answer with "subject: re: <original> <ticket id>", then I risk having two encoded-words separated only by whitespace, and must do magic in order to preserve that space.
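To make the problem concrete, here is a small Python sketch. The helpers `q_encode_word` and `decode_subject` are hypothetical names written for this illustration; the encoder is deliberately naive (it ignores the 75-character limit on encoded-words) and the decoder leans on the standard library's `email.header.decode_header`, which drops whitespace between adjacent encoded-words as RFC 2047 requires.

```python
from email.header import decode_header

def q_encode_word(text: str, charset: str = "utf-8") -> str:
    """Naive Q encoder: one encoded-word, no length limit handling."""
    out = []
    for b in text.encode(charset):
        if b == 0x20:
            out.append("_")            # Q encoding writes a space as "_"
        elif b < 0x80 and chr(b).isalnum():
            out.append(chr(b))         # ASCII letters and digits pass through
        else:
            out.append("=%02X" % b)    # everything else becomes =XX
    return "=?%s?Q?%s?=" % (charset, "".join(out))

def decode_subject(value: str) -> str:
    """Decode a header value into text, replacing undecodable bytes."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)

original, ticket = "Grüße", "№42"      # made-up sample values

# Naive: the space between the two encoded-words vanishes on decoding.
naive = "Re: " + q_encode_word(original) + " " + q_encode_word(ticket)

# The "magic": carry the space inside the second encoded-word.
fixed = "Re: " + q_encode_word(original) + " " + q_encode_word(" " + ticket)
```

Here `decode_subject(naive)` loses the separator between the original subject and the ticket id, while `decode_subject(fixed)` keeps it, because the space travels as `_` inside the second encoded-word.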

Why not just use an ASCII ticket id?

Why should I make "always ASCII" a requirement for that case, in code that otherwise allows all of Unicode?

For the same reason that you should probably avoid using some forms of email addresses even though they are perfectly valid - such as "Keith \"Mr. Cynic\" Moore"@cs.utk.edu - corner cases that are seldom seen often fail in practice.

If you want to be entirely reliable, your code to detect ticket-ids has to be able to find them whether or not they're embedded in encoded-words. And it's not as if you can't put a ticket-id into an encoded-word, though (as you point out) you might have to encode the space (as "_" or "=20" in the Q encoding) as the first character of that encoded-word.
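One way to get that reliability is to decode the whole Subject first and only then search for the id. The sketch below assumes a ticket id that looks like `[#12345]`; that format and the name `find_ticket_ids` are invented for the example, and the decoding again relies on Python's `email.header.decode_header`.

```python
import re
from email.header import decode_header

# Assumption: ticket ids look like "[#12345]". Adjust to the real format.
TICKET_RE = re.compile(r"\[#(\d+)\]")

def find_ticket_ids(subject: str) -> list:
    """Return ticket ids found in a Subject value, encoded or not."""
    parts = []
    for chunk, charset in decode_header(subject):
        if isinstance(chunk, bytes):
            # Undecodable bytes are replaced rather than raising, so bad
            # input yields a degraded string instead of a crash.
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return TICKET_RE.findall("".join(parts))
```

Searching the decoded text means the same pattern matches whether the id arrived as plain ASCII, inside a Q-encoded word, or inside a B-encoded word.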

