Re: The transition to UTF-8 header fields

1999-02-11 12:33:52
Keith Moore writes:
No, but I fail to see how having an extra header (or not) would
affect that outcome.

MUA implementors are generally willing to change the interpretation of a
message _if_ a new header field is supplied.

This doesn't affect current users. The important question is whether the
likely future benefits outweigh the implementation and deployment costs.

Without the header field it's more difficult to evaluate the current
costs. In some cases such a change would impose huge extra costs on
current users; if you ask implementors to do that then you lose.

Again, I'm not saying that the extra costs of what you're proposing are
obviously larger than the benefits. But it _is_ obvious how the costs
_could_ drastically slow down the transition. More study is required.

A similar argument was made in MIME for the MIME-Version header.   
Based on our experience with that, I'd say it has been more trouble 
than it was worth.  

Most implementors, I think, would accept UTF-8 without an extra header
if the RFC provided an adequate discussion of the issues and explained
why it isn't expected to be a problem.  But to some degree this analysis
might need to be based on assumptions about which character sets are used without
encoding by the installed base.  The koi-8 charset, for instance, might 
be a problem if it's widely used in 8bit headers.  So I agree that more 
study would be useful.
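
To make the koi-8 concern concrete, here is a minimal Python sketch of
the kind of check a receiving MUA might use to guess whether unlabeled
8bit header bytes are UTF-8.  It is not taken from any existing MUA;
the function name looks_like_utf8 and the sample subject are
hypothetical.  It relies only on the fact that most text in a legacy
8bit charset is not a well-formed UTF-8 sequence.  This is a heuristic:
some legacy byte strings, especially short ones, can happen to be valid
UTF-8.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-8 (pure ASCII also passes)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample: the same Russian word in two encodings.
subject = "Привет"
as_utf8 = subject.encode("utf-8")
as_koi8 = subject.encode("koi8_r")  # assumed stand-in for "koi-8"

print(looks_like_utf8(as_utf8))  # the UTF-8 bytes decode cleanly
print(looks_like_utf8(as_koi8))  # the KOI8-R bytes are rejected
```

A check like this says nothing about how often the installed base
actually sends unencoded koi-8 headers; that still needs the study
discussed above.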

Part of the additional study that you want could consist of an Experimental 
period for the RFC, in which implementors were encouraged to give feedback 
about the negative impact of using UTF-8 without an additional header.  

We will almost certainly need a UTF8 SMTP option anyway,

Why? Explain the benefits and costs. Remember that the argument for MTA
Q-P conversion failed in the real world because it wildly understated
the costs of Q-P support.

At the time MIME was defined there were reports of several MTAs in 
deployment which could not handle 8bit characters in the message 
body - some would corrupt the message; others would get wedged or 
crash.   Though the "downgrade to quoted-printable" advice seems
overly conservative today, it was right for the time.  I don't
think it's fair to brand this as a failure, because it slowed the
deployment of "just send 8", allowing MTA vendors to fix their
8bit body problems with minimal disruption.

But 8bit headers are more difficult.  These days, I don't hear many 
reports about 8bit body problems (other than some MTAs which refuse 
to relay 8bit mail to an MTA which doesn't support 8BITMIME).  
But I'm still hearing reports of MTAs failing in 'catastrophic' ways
when presented with 8bit headers.  It's hard to pin these down because
I'm just being told about them - I'm not seeing them firsthand.  But 
we need to be very careful about deploying something which might 
cause a significant number of failures in the mail system.

Data points: qmail is 8-bit-clean. sendmail (8.8.0 and above) removes
bytes 128-159. The Cyrus IMAP toaster, apparently, bounces any message
with an 8-bit header field. Any other reports?
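
As an illustration of why the sendmail data point matters for UTF-8,
the following Python sketch simulates the reported behavior (removing
bytes 128-159) on a UTF-8 header value.  It is only a model of that
report, not sendmail's code; the function name strip_128_159 and the
sample text are hypothetical.  UTF-8 continuation bytes fall in the
range 128-191, so some of them land in the stripped range.

```python
def strip_128_159(header: bytes) -> bytes:
    """Drop every byte in the range 128-159 inclusive (simulated behavior)."""
    return bytes(b for b in header if not 128 <= b <= 159)


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as UTF-8 without error."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample header value, encoded as UTF-8.
original = "Привет".encode("utf-8")
stripped = strip_128_159(original)

print(len(original), len(stripped))  # some continuation bytes are lost
print(is_valid_utf8(stripped))       # the remainder is no longer valid UTF-8
```

Not every UTF-8 string is damaged (a character whose continuation bytes
all lie in 160-191 passes through intact), which makes this kind of
failure hard to spot in casual testing.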

Current versions aren't nearly as relevant as what is deployed.
And like Y2K bugs, having a short list of things known not to break
doesn't give you confidence that other things won't break.  If you 
really want to know whether it is safe to just send 8 bits in headers, 
then you need to do a survey of all versions of all mail software 
with significant deployment.  This is beyond the resources of the IETF.
Instead, we go with the consensus of a set of experts.