Re: Is Accept-language an email header field?

I don't think it makes sense to use the Accept-Language field, as it's
currently defined, with email. Partially this is because there's no clear
indication as to whose preferences are being described, and partially because
a reply to a message (the most likely use of Accept-Language) might go to the
author, the Reply-To address, some subset of the To and Cc recipients, etc.

More generally, I don't think it makes sense to try to add descriptive
information about an address in a header field by using other fields
that don't explicitly reference that address. (Note that these addresses
are sometimes changed in transit while the other fields are left intact.)


Let me start by pointing out that Accept-Language is currently in fairly
widespread use in email. A number of very popular clients generate these
headers and a fair number of automatic response agents honor them. A few years
ago, strong customer demand led us to support them in the autoresponses our
product generates, which means I have a fair amount of experience with
them and the problems they do and do not have.

As you might expect, I have observed a number of operational problems with
this header:

(1) Far and away the most common problem has been the absence of a single,
    recommended field for this purpose. As is so often the
    case, this has led to a number of different fields being used, some
    supported by some agents and others not. The fields I've observed
    operationally are X-Accept-Language, Accept-Language and
    Preferred-Language. I've observed X-Accept-Language to be the
    most popular, but that's from a small and probably unrepresentative
    sample.

(2) Inconsistent use of the field. Although several popular clients
    generate the field, other popular clients do not. The result is that
    sometimes you get nicely internationalized responses and other times
    you don't. This violates the principle of least astonishment, and leads
    to frenzied attempts to guess the appropriate language from other
    data, e.g., if the domain of some address ends in .jp, use Japanese. These
    other mechanisms don't work nearly as well, of course, and have been
    known to cause more problems than they solve.

(3) There are a lot of languages out there, and internationalization
    is expensive. Implementations have to decide what languages they want
    to support - there's no way to support them all. Additionally, some
    products are still caught up in the old localization mindset, where
    you have to take extra steps to install "language support". This all
    leads to sporadic support for properly internationalized responses,
    which again violates the least astonishment principle.

(4) Support for language subtags is hard to get right. Cases exist
    where it is better to fall back to English than to use the wrong
    dialect. 

(5) Charsets. Figuring out the right charset to use can be a problem since
    support for UTF-8 is nowhere near universal, and disparate user
    communities may write the same language using different charsets.
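
Problems (1) and (4) can be sketched together. This is a minimal, hypothetical Python example, not a description of how our product or any other agent works: it checks the three field names listed in (1) in an assumed order, ignores weights, and, following (4), falls back to English rather than substituting a sibling dialect. SUPPORTED, DEFAULT and both function names are illustrative only, and the matching is a simplified form of the RFC 4647 "lookup" scheme.

```python
from email import message_from_string
from email.message import Message

# Field names seen in practice (see item 1). The order in which they are
# checked is an assumption of this sketch, not anything standardized.
LANGUAGE_FIELDS = ("Accept-Language", "X-Accept-Language", "Preferred-Language")

# Hypothetical set of languages this responder has been localized for.
SUPPORTED = {"en", "fr", "de", "ja", "pt-br"}
DEFAULT = "en"


def requested_tags(msg: Message) -> list[str]:
    """Return language tags from the first known field present, in header order."""
    for name in LANGUAGE_FIELDS:
        value = msg.get(name)  # header-name lookup is case-insensitive
        if value:
            # Discard any ";q=..." weight; this sketch ignores weights entirely.
            return [part.split(";")[0].strip().lower()
                    for part in value.split(",") if part.strip()]
    return []


def choose_language(tags: list[str]) -> str:
    """Simplified RFC 4647-style lookup: for each tag, try it and then
    successively shorter prefixes, but never jump to a sibling dialect
    (pt-pt will not match pt-br). If nothing matches, use the default."""
    for tag in tags:
        candidate = tag
        while candidate:
            if candidate in SUPPORTED:
                return candidate
            if "-" not in candidate:
                break
            candidate = candidate.rsplit("-", 1)[0]
    return DEFAULT


if __name__ == "__main__":
    raw = "From: user@example.com\nX-Accept-Language: pt-PT, fr-CA\n\nHello\n"
    tags = requested_tags(message_from_string(raw))
    print(tags)                        # ['pt-pt', 'fr-ca']
    print(choose_language(tags))       # fr
    print(choose_language(["pt-pt"]))  # en
```

Note that the wrong-dialect case in (4) falls out of the truncation rule: pt-PT shortens to pt, which this hypothetical responder does not support, so it never reaches pt-br.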

Notable by its absence from my list is Keith's concern that the lack of a
binding of this information to a specific address or addresses in the header
will lead to the wrong language being used in some cases. Simply put, I have never
encountered a case where this has been a problem. The reality seems to be that
language choice is a fairly coarse thing, and that if the originator of the
message expresses a preference in the header of the message, it seems to work
at least as well as using some sort of implementation default (usually
English). Alternatively, since I'm dealing with automatically generated responses
here, I suppose you could say I'm mostly using a binding to the MAIL FROM
address and finding that it works pretty well.

Another potential issue I haven't found to be a problem in practice is the
syntax of the field itself. The HTTP syntax for the field is rather complex and
allows for each value to have an attached weight. Weights make sense when there
are other factors to consider when deciding what document to return, but
they're just unnecessary complexity for email and I doubt that most agents that
look at these fields handle them properly. But this has been a nonissue in
practice - the only fields with weights in them I've ever seen have been ones
I've generated myself.
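
For comparison, here is roughly what honoring the weights involves. It is a minimal Python sketch assuming the HTTP rules for quality values (a missing q means 1, q=0 means "not acceptable"). The function name is hypothetical, and the parser is more lenient than the HTTP grammar, which limits q to the range 0 to 1 with at most three decimal places.

```python
def parse_accept_language(value: str) -> list[str]:
    """Return language ranges ordered by weight, using the HTTP conventions:
    a range without q= has weight 1, q=0 means "not acceptable" and is
    dropped, and equal weights keep their original order (sorted() is stable).
    Unparseable weights are treated as 0 here, which is a lenient choice."""
    weighted = []
    for item in value.split(","):
        params = [p.strip() for p in item.split(";")]
        tag = params[0].lower()
        if not tag:
            continue  # tolerate empty list elements such as "da,,en"
        q = 1.0
        for param in params[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0
        if q > 0:
            weighted.append((tag, q))
    return [tag for tag, _ in sorted(weighted, key=lambda pair: -pair[1])]


if __name__ == "__main__":
    print(parse_accept_language("da, en-GB;q=0.8, en;q=0.7"))  # ['da', 'en-gb', 'en']
    print(parse_accept_language("fr;q=0.5, de, ja;q=0"))       # ['de', 'fr']
```

An agent that simply takes the listed order, as most email responders seem to, gets the same answer for the usual case where weights are absent or already descending.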

In summary, I think this is a case where we've let the best be the enemy of the
good in a fairly major way. Is there a potential problem with the multiplicity of
addresses in the header and with there not being a way to attach language
preference information to each address? Sure, but in practice the biggest
problem with having a single header for this information has been the lack of a
single, standardized field, which we could easily have fixed had we been able
to get past the binding issue.

                                Ned