RE: Erroneous content-type: Text/plain; charset=us-ascii


I don't have the stomach for standardized error processing... and I doubt you 
could do this right anyway.  The right thing to do is to fix the offending 
mailers.  Doing the best they can in the face of brokenness is a good 
competitive differentiator for clients.

I assume you get predominantly 8859-1 mislabeled as US-ASCII because you 
predominantly communicate with folks in a Western European language.  I 
bet someone using, say, Cyrillic would have a predominant error condition of a 
different 8859 variant mislabeled as US-ASCII.
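
To make that concrete, here is a minimal Python sketch (not taken from any 
mailer; the sample byte and variable names are my own illustration).  It decodes 
one octet above 127 under two different 8859 parts, and the result depends 
entirely on which part the receiver guesses.

```python
# Illustrative only: a single octet above 127, as it might appear in a
# body that was mislabeled as us-ascii.
raw = b"\xe4"

# The same octet read under two different ISO 8859 parts.
as_latin1 = raw.decode("iso-8859-1")    # Western European reading
as_cyrillic = raw.decode("iso-8859-5")  # Cyrillic reading

# The two readings are different characters, so a fixed "treat as 8859-1"
# rule would garble text from a sender who actually used 8859-5.
print(repr(as_latin1), repr(as_cyrillic))
```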

Why would we assume that broken mailers are any more likely to ensure that 
characters are in 8859-1 than to ensure that the label is correct?  What if 
the standard, or the IETF in a BCP, says "treat this error as X" and a mailer 
used in a different region made a different error? 

Greg V.

-----Original Message-----
From: Jacob Palme [mailto:jpalme(_at_)dsv(_dot_)su(_dot_)se]
Sent: Thursday, September 02, 2004 6:18 AM
To: ietf-822(_at_)imc(_dot_)org
Cc: DSV KOM Development Group
Subject: Erroneous content-type: Text/plain; charset=us-ascii



I often get incorrect incoming e-mails, which have in their
headers either

Content-type: Text/plain; charset=us-ascii
or only
Content-type: Text/plain

even though the text is in effect ISO-8859-1.

I think the IETF should maybe consider specifying what a mailer 
should do if it gets characters > 127 in e-mail bodies 
labelled "us-ascii" or with no charset label.

At present, the standards only say that mailers should 
handle such messages as "us-ascii". But since us-ascii does 
not say what you should do with characters > 127, present 
standards actually do not say what a mailer should do with 
such messages.

I think a mailer should in fact treat such messages as 
either "iso-8859-1" or as a local character set specified 
by the user in preferences.
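
As a rough sketch of the behaviour suggested above (in Python, with a 
hypothetical function name and parameters that come from no standard or 
existing mailer), the fallback could look like this:

```python
from typing import Optional


def decode_body(raw: bytes, declared: Optional[str],
                fallback: str = "iso-8859-1") -> str:
    """Decode a text/plain body, guessing only when the label cannot be right.

    `declared` is the charset parameter from Content-type, or None if absent.
    `fallback` stands in for the user's preference; both names are assumptions
    made for this sketch.
    """
    # A missing charset parameter defaults to us-ascii.
    label = (declared or "us-ascii").strip().lower()

    # Any other label is taken at face value (an unknown label raises
    # LookupError, which this sketch does not handle).
    if label != "us-ascii":
        return raw.decode(label, errors="replace")

    try:
        # Correctly labeled messages contain only octets 0..127.
        return raw.decode("ascii")
    except UnicodeDecodeError:
        # Octets > 127 under a us-ascii label: fall back to the guess.
        return raw.decode(fallback, errors="replace")
```

For example, decode_body(b"caf\xe9", "us-ascii") yields the text with an 
e-acute, while passing fallback="iso-8859-5" would read the same octet as a 
Cyrillic letter.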

In fact, many mailers already do this. So my suggestion is 
mainly that we should standardize what most mailers already 
do. There is a need to standardize this anyway, since there 
are mailers, for example the one I am using, Eudora 6.1 for 
Macintosh, which do not treat such messages as 8859-1. 
Eudora 6.1 handles such messages as if their charset had 
been Mac Roman, which is not very useful since almost no 
actual incoming messages are in Mac Roman without being 
labelled with this character set.

There is one argument for doing what Eudora 6.1 does.
This is the implicit rule "do not munge", which says that
if you get illegal content, you should not try to correct
it, but just pass it along unchanged.
-- 
Jacob Palme <jpalme(_at_)dsv(_dot_)su(_dot_)se> (Stockholm University and KTH)
for more info see URL: http://www.dsv.su.se/jpalme/