Re: The transition to UTF-8 header fields

1999-02-06 16:41:05
Charles Lindsey writes:
And does it
barf when the nasty characters are in phrases or comments in those

sendmail, starting in version 8.8.0, simply drops bytes 128-159 from all
incoming header fields.
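
To make the effect concrete, here is a minimal sketch of what dropping bytes 128-159 does to a header field. It illustrates the behaviour described above and is not sendmail's actual code; the function name drop_high_control_bytes and the sample name are made up for the example.

```python
def drop_high_control_bytes(field: bytes) -> bytes:
    """Return the header field with every byte in the range 128-159 removed.

    Hypothetical helper: it only mimics the byte-dropping described in the
    message, nothing else about how sendmail processes a header.
    """
    return bytes(b for b in field if not 128 <= b <= 159)


# U+00C9 (capital E with acute) is the single byte 0xC9 in ISO-8859-1,
# but the two bytes 0xC3 0x89 in UTF-8.
latin1_field = "From: \u00c9mile".encode("iso-8859-1")
utf8_field = "From: \u00c9mile".encode("utf-8")

# 0xC9 is above 159, so the ISO-8859-1 field passes through untouched.
latin1_after = drop_high_control_bytes(latin1_field)

# 0x89 is inside 128-159, so the UTF-8 field loses its continuation byte
# and is left with a dangling 0xC3 that no longer decodes as UTF-8.
utf8_after = drop_high_control_bytes(utf8_field)
```

Under these assumptions an ISO-8859-1 name can survive while the same name in UTF-8 is damaged, because UTF-8 continuation bytes cover the range 128-191.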

The underlying problem is that sendmail's code is far from 8-bit-clean.
sendmail's entire rewriting mechanism works with a string format that
assigns special meanings to bytes 129, 130, 133, etc. It can't even deal
with an 8-bit name in the From line.

This could actually turn out to be a Good Thing in that it will
slow down those who want to propagate 8-bit headers until they
are properly defined and their implications are understood. 

Well, any message that currently uses unencoded ISO-8859-1 (except within
the protection of a Content-Type specifying that charset) is not
conforming already, so we are not necessarily obliged to recognise it

I'm concerned with the features that users rely on, not just the
features guaranteed by the IETF.

I don't want to gratuitously break things unless they cause harm.  But 
if users rely on nonstandard features, they deserve whatever they get.  
This is true no matter whether those features are propagated by sendmail,
qmail, or a 300-pound gorilla in Redmond.

MUAs with poor 8-bit support and without any European users can get away
with broad rules such as

   Interpret all 8-bit characters in the header as UTF-8.

But other implementors will be much happier with

   Interpret all 8-bit characters in the header as UTF-8 _if_ you see
   the following special header field: ...

This avoids creating any new problems for current users.

Wrong.  Go back and do the case analysis.
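
For readers who want to try the case analysis themselves, the two rules quoted above can be sketched as follows. This is only an illustration under stated assumptions: the function names are invented, the special header field is left unnamed in the message and is modelled here as a plain boolean, and ISO-8859-1 as the fallback charset is an assumption taken from the example earlier in the thread.

```python
def decode_broad(raw: bytes) -> str:
    """Broad rule: treat every 8-bit byte in the header as UTF-8."""
    # Bytes that are not valid UTF-8 become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


def decode_conditional(raw: bytes, utf8_marked: bool,
                       legacy_charset: str = "iso-8859-1") -> str:
    """Conditional rule: use UTF-8 only when the message carries the marker.

    utf8_marked stands in for "the special header field was seen";
    legacy_charset is an assumed fallback, not something the thread specifies.
    """
    if utf8_marked:
        return raw.decode("utf-8", errors="replace")
    return raw.decode(legacy_charset)


# An existing message with an unencoded ISO-8859-1 name and no marker.
legacy = "Ren\u00e9".encode("iso-8859-1")   # b"Ren\xe9"

broad_result = decode_broad(legacy)                        # "Ren\ufffd"
conditional_result = decode_conditional(legacy, False)     # "Ren\u00e9"

# A new message that uses UTF-8 and carries the marker.
marked = "Ren\u00e9".encode("utf-8")
marked_result = decode_conditional(marked, True)           # "Ren\u00e9"
```

The sketch covers only the receiving side; it says nothing about the remaining cases, such as a marked message passing through software that drops or rewrites 8-bit bytes along the way.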