ietf-822

Re: [ietf-822] utf8 messages

2014-08-13 18:23:10
Let me try one more time, since something isn't making it through.

I have three messages.  One message has an entirely 7bit header with an
RFC 2047 encoded subject.  Another is an RFC 6532 message, with the subject
in utf8.  A third message has a cp-1250 8bit subject.  There are two 8bit
bytes in the subject in both of the last two messages, and in the cp1250
case, those two bytes happen to also be a valid utf8 character.
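
For concreteness, a minimal Python sketch of the three wire forms. The subject word is the example given further down in this message; the RFC 2047 encoded-word is constructed here for illustration and is an assumption, not a captured header.

```python
from email.header import decode_header

# Case 1: pure 7-bit header, subject carried as an RFC 2047 encoded-word.
# (Hypothetical encoded-word, built by hand for illustration.)
encoded_word = "=?utf-8?Q?Zdj=C4=99cia?="
raw, charset = decode_header(encoded_word)[0]
case1 = raw.decode(charset)

# Case 2: RFC 6532 message, raw UTF-8 bytes sitting in the header.
case2_bytes = b"Zdj\xc4\x99cia"

# Case 3: unlabeled cp1250 header. U+00C4 (A with diaeresis) and
# U+2122 (trade mark sign) encode to the very same two 8-bit bytes.
case3_bytes = "Zdj\u00c4\u2122cia".encode("cp1250")

# Cases 2 and 3 are byte-for-byte identical on the wire.
same_bytes = case2_bytes == case3_bytes
```

Only case 1 carries a charset label; the other two differ in intent but not in bytes.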

We want to be able to parse all three of those and do so correctly.  We
know the third type is technically invalid, but we see millions of such
messages every day, and dropping all of those would be a disservice to our
users.  We currently see way more of such messages than we do of 6532
messages... though in practice, the most common charset now is utf-8, so I
guess those are now the same as 6532 messages that have leaked.

An example: we receive the Subject: Zdj\xc4\x99cia.  In UTF8,
that's Zdjęcia; in cp-1250, that's ZdjÄ™cia.

How do I tell which it's supposed to be?  Our encoding detector chose
incorrectly.
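
A small Python sketch of why a detector has nothing to go on here, together with one common heuristic: try strict UTF-8 first and fall back to a legacy charset. The heuristic and the helper name `guess_decode` are illustrative assumptions, not something this thread has settled on, and by construction it picks UTF-8 for the ambiguous subject above even if the sender meant cp1250.

```python
from typing import Tuple

def guess_decode(raw: bytes, fallback: str = "cp1250") -> Tuple[str, str]:
    """Return (text, charset_used). Hypothetical helper for illustration."""
    try:
        # Strict decoding rejects byte sequences that are not valid UTF-8.
        return raw.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback charset; unmapped bytes become U+FFFD.
        return raw.decode(fallback, errors="replace"), fallback

subject = b"Zdj\xc4\x99cia"

# Both decodings succeed, so validity alone cannot pick a winner.
as_utf8 = subject.decode("utf-8")
as_cp1250 = subject.decode("cp1250")

# The heuristic resolves the tie in favor of UTF-8.
text, used = guess_decode(subject)

# A cp1250 subject that is NOT valid UTF-8 (0xEA followed by an ASCII
# letter) does fall through to the fallback.
legacy_text, legacy_used = guess_decode(b"Zdj\xeacia")
```

The fallback only helps when the legacy bytes are not also well-formed UTF-8; the subject in question is exactly the case where they are.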

And my apologies if bringing this to ietf-822 instead of the eai-wg list
was the wrong choice.  It wasn't clear to me that the latter was still
active after the completion of the working group, and I assumed that with
its completion there's no longer a "split" between the two, so a concern
specifically about the format of email messages (which now includes 6532)
would belong on the list about such things.

I also seem to have triggered some fear of a revolt, which I don't
understand.

And I realize that to someone who spent years working on this that being
asked to retread these things is annoying.  Unfortunately, the resulting
RFCs don't include summarized information on why other possible choices
were considered and rejected.  I'm unclear on how one is supposed to gain
this knowledge short of reading years' worth of mailing list archives across
multiple lists... and even that doesn't help about things discussed
off-line or on other lists I don't even know to look for.

Brandon
_______________________________________________
ietf-822 mailing list
ietf-822(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/ietf-822