
Re: The transition to UTF-8 header fields

1999-02-11 19:01:54
Gateways to non-SMTP environments generally cannot handle 8-bit
characters in headers. By "cannot" I mean anything from crashing, to
non-delivery of that specific message, to just passing on the data in
corrupted form. The gateway itself can, however, handle QP or
Base64 CTE in the body and headers correctly.

Back in the early 90's we used to see problems with SMTP servers going into
infinite loops or getting desynchronized during the SMTP session when they were
presented with 8bit in message headers. As far as we could tell the former was
associated with a particular sendmail version/configuration -- we never
determined specifically why it behaved this way, only that it did, and that the
sites where it happened were seriously pissed when they found their load
averages above 30 as a result of all the looping processes out there.

The latter case of session desynchronization was even more problematic in that
it didn't seem to be repeatable. I don't recall which MTA was associated with
the problem, assuming we ever figured that out.

There was also a desynchronization problem not specific to headers that
resulted from some SMTP servers having code in them to do TELNET options

We were forced as a result of this to stop emitting 8bit in message headers.
PMDF is 8-bit clean internally, but when customers yell at you for something
that's obviously a standards violation and which is causing problems, you don't
have a lot of choice in what you do. So we adopted a complex strategy, using
RFC 2047 encodings in cases where they are allowed and using mnemonics and such
in others, to avoid 8bit in message headers.
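
To make the RFC 2047 half of that strategy concrete, here is a minimal
sketch in Python. It is not PMDF's logic, which isn't shown here; the
function name downgrade_unstructured is hypothetical, and it only covers
unstructured fields such as Subject, where encoded-words are allowed.
Places where they are not allowed (inside an addr-spec, for instance)
need a different fallback and are not handled.

```python
from email.header import Header, decode_header, make_header


def downgrade_unstructured(value, charset="utf-8"):
    """Return a 7-bit-safe form of an unstructured header value.

    Hypothetical helper: pure ASCII values pass through untouched,
    anything else is wrapped in RFC 2047 encoded-words.
    """
    if value.isascii():
        return value
    # Header.encode() picks base64 or quoted-printable and folds long values.
    return Header(value, charset).encode()


if __name__ == "__main__":
    encoded = downgrade_unstructured("Grüße aus Köln")
    print(encoded)
    # Round trip back to the original text to show nothing was lost.
    print(str(make_header(decode_header(encoded))))
```

The ASCII check comes first so that ordinary headers are left exactly as
they were, which keeps the downgrade from touching mail that was never a
problem.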

Of course once you stop doing something illegal you cannot tell if the
installed base is getting better at tolerating it or not. But as it turns out
our 8bit downgrading logic wasn't perfect -- there were a couple of cases I
missed where 8bit could sneak through. And this has given us some good
indications that things are getting worse, not better.

Recently we've run into a number of cases where agents will simply refuse
messages with 8bit in the headers. Judging from the variety of error messages
there are at least three implementations out there that have this
characteristic.  One of them appears to be something associated with local
delivery in Cyrus, one of them appears to be associated with some gateway to or
from Lotus Notes, and the third we've never been able to identify. And
remember, the cases are coming to light when messages contain 8bit
in very peculiar places, since we catch the obvious ones. (And of course
now we've modified our code to even get the peculiar ones that used to
get through.)
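
For anyone who wants to check their own outbound mail for this, a rough
sketch of the kind of test involved follows. It is an illustration only,
not the check any of the agents above actually perform; the function name
headers_with_8bit is hypothetical. It looks at the raw bytes of a message
and reports every header line containing a byte above 0x7F.

```python
def headers_with_8bit(raw):
    """Return the header lines of a raw message that contain 8bit bytes.

    Hypothetical helper: `raw` is the message as bytes. The header block
    is taken to end at the first empty line; the body is ignored.
    """
    # Accept both CRLF and bare LF line endings.
    normalized = raw.replace(b"\r\n", b"\n")
    head, _, _ = normalized.partition(b"\n\n")
    return [line for line in head.split(b"\n")
            if any(byte > 0x7F for byte in line)]


if __name__ == "__main__":
    message = (b"Subject: caf\xe9\r\n"
               b"To: someone@example.com\r\n"
               b"\r\n"
               b"8bit in the body \xe9 is not reported\r\n")
    for line in headers_with_8bit(message):
        print(line)
```

Continuation lines are reported individually rather than being joined to
their field name, which is enough to spot the problem but not to say
which field it belongs to.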