I don't have a stomach for standardized error processing... and I doubt you
could do this right anyway. The right thing to do is to fix the offending
mailers. Doing the best they can in the face of brokenness is a good
competitive differentiator for clients.
I assume you get predominantly 8859-1 mislabeled as US-ASCII based on the fact
that you predominantly communicate with folks in a Western European language. I
bet someone using, say, Cyrillic would have a predominant error condition of a
different 8859 variant mislabeled as US-ASCII.
Why would we assume that broken mailers are any more likely to ensure that
characters are in 8859-1 than to ensure that the label is correct? What if
the standard, or the IETF in a BCP, says "treat this error as X" and a mailer used
in a different region made a different error?
Greg V.
-----Original Message-----
From: Jacob Palme [mailto:jpalme(_at_)dsv(_dot_)su(_dot_)se]
Sent: Thursday, September 02, 2004 6:18 AM
To: ietf-822(_at_)imc(_dot_)org
Cc: DSV KOM Development Group
Subject: Erroneous content-type: Text/plain; charset=us-ascii
I often get incorrect incoming e-mails, which have in their
headers either
Content-type: Text/plain; charset=us-ascii
or only
Content-type: Text/plain
even though the text is in effect ISO-8859-1.
I think IETF should maybe consider specifying what a mailer
should do if it gets characters > 127 in e-mail bodies
labelled "us-ascii" or with no charset label.
At present, the standards only say that mailers should
handle such messages as "us-ascii". But since us-ascii does
not say what you should do with characters > 127, present
standards actually do not say what a mailer should do with
such messages.
I think a mailer should in fact treat such messages as
either "iso-8859-1" or as a local character set specified
by the user in preferences.
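The fallback suggested here could be sketched roughly as follows. This is
only an illustration in Python, not anything a standard or an existing
mailer specifies; the function name decode_body and its parameters are
hypothetical, and the fallback charset stands in for the user preference.

```python
from typing import Optional


def decode_body(raw: bytes,
                declared: Optional[str] = None,
                fallback: str = "iso-8859-1") -> str:
    """Decode a text/plain body, tolerating a wrong us-ascii label.

    `declared` is the charset parameter from Content-type (None if absent).
    `fallback` is an assumed user preference, defaulting to iso-8859-1.
    """
    charset = (declared or "us-ascii").lower()
    if charset in ("us-ascii", "ascii"):
        try:
            # Correctly labelled messages contain only bytes <= 127.
            return raw.decode("ascii")
        except UnicodeDecodeError:
            # Bytes > 127 are present, so the label cannot be right;
            # interpret the body in the fallback charset instead.
            return raw.decode(fallback, errors="replace")
    # Any other declared charset is taken at face value.
    return raw.decode(charset, errors="replace")
```

With this sketch, decode_body(b"caf\xe9") yields "café", while a user who
sets fallback="iso-8859-5" would see the same byte as a Cyrillic letter.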
In fact, many mailers already do this. So my suggestion is
mainly that we should standardize what most mailers already
do. There is a need to standardize this anyway, since there
are mailers, for example the one I am using, Eudora 6.1 for
Macintosh, which do not treat such messages as 8859-1.
Eudora 6.1 handles such messages as if their charset had
been Mac Roman, which is not very useful since almost no
actual incoming messages are in Mac Roman without being
labelled with this character set.
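To illustrate why the choice of assumed charset matters, here is a small
Python sketch (the variable names are made up for the example) that
decodes the same mislabelled bytes first as ISO-8859-1 and then as Mac
Roman, using Python's built-in codecs of those names.

```python
# A body that was written in ISO-8859-1 but labelled us-ascii.
raw = b"caf\xe9"

# Interpreted as ISO-8859-1, byte 0xE9 is "e with acute accent".
as_latin1 = raw.decode("iso-8859-1")

# Interpreted as Mac Roman, the same byte maps to a different letter,
# so the reader sees the wrong character.
as_macroman = raw.decode("mac_roman")

print(as_latin1)
print(as_macroman)
```

The two printed strings differ in their last character, which is the
kind of damage described above.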
There is one argument for doing what Eudora 6.1 does.
This is the implicit rule "do not munge", which says that
if you get illegal content, you should not try to correct
it, but just pass it along unchanged.
--
Jacob Palme <jpalme(_at_)dsv(_dot_)su(_dot_)se> (Stockholm University and KTH)
for more info see URL: http://www.dsv.su.se/jpalme/