ietf-822

[ietf-822] utf8 messages

2014-08-11 15:46:03
In our recent launch of support for EAI, we noticed an issue with 6532
"utf8" messages.

As near as I can tell, there is nothing about a 6532 message which tells
you it is such a message... except the existence of 8bit characters in the
headers.  I.e., 7bit -> 5322, 8bit -> 6532.

Our problem is that this isn't actually true in practice.  Prior to
launching support for 6532 messages, we've already had to support
widespread use of 8bit messages that were not always in utf8.  Since these
typically didn't specify which charset they were in, we used a variety of
techniques including direct charset detection on such messages.

The problem we're having with 6532 messages is that we moved from
explicitly identified charsets via 2047/etc. mechanisms to "it's just
utf8"... and sometimes we mis-detect the utf8 as cp1250 or other encodings.

Now, we can work on improving our detection and maybe start biasing it to
utf8 or even just assuming utf8 for any 8bit message which is in
interchange valid utf8.  Anything we do there will result in some potential
for mistakes, of course.
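
To make the "just assume utf8 if it is valid utf8" option concrete, here is a
minimal sketch in Python.  The function name and the three labels are
hypothetical, not what we actually run.  Note that Python's strict decoder
rejects overlong forms and surrogates but still accepts noncharacters, so it
is looser than a strict "interchange valid" check.

```python
def classify_header_bytes(raw: bytes) -> str:
    """Hypothetical helper: guess how to interpret a raw header block."""
    # Pure ASCII headers: an ordinary 5322 message.
    if raw.isascii():
        return "7bit"
    # 8bit content that decodes strictly as UTF-8: assume a 6532 message.
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # 8bit but not valid UTF-8: fall back to charset detection.
        return "8bit-unknown"
    return "utf8"
```

Legacy 8bit text in cp1250 and similar encodings rarely happens to be valid
utf8, but it can be, so this only reduces the mistakes rather than
eliminating them.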

This would all be solved if 6532 messages were actually denoted as such,
and I recall seeing at least one such X header used by another service
we've been interoperability testing with:
X-CM-HeaderCharset: UTF-8
CM no doubt standing for CoreMail, which is the software used:
X-Mailer: Coremail Webmail Server Version XT3.0.4 build
 20140526(27182.6409.6185) Copyright (c) 2002-2014 www.mailtech.cn coremail

Thoughts?  It looks like there was an i-Email/Header-Type originally, but it
was removed early in the utf8smtp timeframe:
http://www.ietf.org/mail-archive/web/ima/current/msg01358.html
The general consensus for removal seemed to be "you'll know because it was
specified at SMTP time", "just look for 8bit" and "it's bad to duplicate
data between the envelope and the headers".

Looks like it goes nearly to the beginning of the utf8smtp time frame:
http://www.ietf.org/mail-archive/web/ima/current/msg00079.html

It seems that the pre-existence of 8bit messages was not considered by
those who felt it wasn't necessary, at least as far as I've read in the
discussions (wow do I wish the mhonarc had been updated with an easier to
explore/read model).

Now, as hinted at in the consensus to remove such a marker from the draft,
we can certainly add such a header when composing 6532 messages or when we
receive any message via SMTPUTF8 for our own utility, but I would think
there would be some utility in such a marker being mutually understood and
shared.
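
As a rough sketch of adding such a marker at compose time, using Python's
standard email package: the header name X-Example-Header-Charset and the
helper mark_utf8_headers are made up for illustration, and no such header is
standardized.

```python
from email import policy
from email.message import EmailMessage

# Hypothetical marker name, for illustration only; not a registered header.
MARKER = "X-Example-Header-Charset"


def mark_utf8_headers(msg: EmailMessage) -> EmailMessage:
    """Add the marker if any header value contains non-ASCII characters."""
    has_8bit = any(not str(value).isascii() for value in msg.values())
    if has_8bit and MARKER not in msg:
        msg[MARKER] = "UTF-8"
    return msg


# Usage: compose a message whose headers will be serialized as raw utf8.
msg = EmailMessage(policy=policy.SMTPUTF8)
msg["Subject"] = "Žluťoučký kůň"
mark_utf8_headers(msg)
```

The same helper could be applied to anything received via SMTPUTF8 before it
is stored.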

Brandon
_______________________________________________
ietf-822 mailing list
ietf-822(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/ietf-822