I don't think it makes sense to use the accept-language field, as it's
currently defined, with email. partially this is because there's no clear
indication as to whose preferences are being described, and partially because
a reply to a message (the most likely use of accept-language) might go to the
author, the reply-to field, some subset of to and cc recipients, etc.
more generally, I don't think it makes sense to try to add descriptive
information about an address in a header field by using other fields
that don't explicitly reference that address. (note that these addresses
are sometimes changed in transit while leaving the other fields intact)
Let me start by pointing out that the accept-language field is currently in
fairly widespread use in email. A number of very popular clients generate these
headers and a fair number of automatic response agents honor them. A few years
ago, strong customer demand led us to support them in the autoresponses our
product generates, which means I have a fair amount of experience with them and
the problems they do and do not have.
As you might expect, I have observed a number of operational problems with
this header:
(1) Far and away the most common problem has been the absence of a single,
recommended field defined for this purpose. As is so often the
case, this has led to a number of different fields being used, some
supported by some agents and others not. The fields I've observed
operationally are: X-Accept-language, Accept-Language and
Preferred-Language. I've observed X-Accept-Language to be the
most popular, but that's from a small and probably unrepresentative
sample.
(2) Lack of use of the field in all cases. Although several popular clients
generate the field, other popular clients do not. The result is that
sometimes you get nicely internationalized responses and other times
you don't. This violates the least astonishment principle, and leads
to frenzied attempts to guess the appropriate language to use from other
data, e.g. the domain of some address ends in .jp, use Japanese. These
other mechanisms don't work nearly as well, of course, and have been
known to cause more problems than they solve.
(3) There are a lot of languages out there, and internationalization
is expensive. Implementations have to decide what languages they want
to support - there's no way to support them all. Additionally, some
products are still caught up in the old localization mindset, where
you have to take extra steps to install "language support". This all
leads to sporadic support for properly internationalized responses,
which again violates the least astonishment principle.
(4) Support for language subtags is hard to get right. Cases exist
where it is better to fall back to English than to use the wrong
dialect.
(5) Charsets. Figuring out the right charset to use can be a problem since
support for utf-8 is nowhere near universal and there can be disparate
user communities that use the same language but written with different
charsets.
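
As a rough illustration of points (1) and (4), here is a minimal Python sketch
of how a response agent might cope with the competing field names and with
subtags. The order in which the fields are tried, the SUPPORTED set, and the
function names are all assumptions made up for the example; this is not what
our product or any particular agent actually does.

```python
from email import message_from_string

# Field names observed in the wild; the precedence order is an assumption.
LANGUAGE_FIELDS = ("Accept-Language", "X-Accept-Language", "Preferred-Language")

# Hypothetical set of languages the responder has templates for.
SUPPORTED = {"en", "ja", "pt-br"}
DEFAULT = "en"


def requested_tags(msg):
    """Return the language tags from the first language field present."""
    for name in LANGUAGE_FIELDS:
        value = msg.get(name)  # header lookup is case-insensitive
        if value:
            # Drop any ";q=..." weight and normalise case.
            tags = [item.split(";")[0].strip().lower() for item in value.split(",")]
            return [t for t in tags if t]
    return []


def choose_language(msg):
    """Pick a supported language by exact tag or primary subtag.

    A request for a dialect we lack (say pt-PT when only pt-br exists)
    falls through to DEFAULT rather than using a different dialect.
    """
    for tag in requested_tags(msg):
        if tag in SUPPORTED:
            return tag
        primary = tag.split("-")[0]
        if primary in SUPPORTED:
            return primary
    return DEFAULT


msg = message_from_string("X-Accept-Language: ja-JP, en\n\nbody")
print(choose_language(msg))
```

Under these assumptions a message carrying "X-Accept-Language: ja-JP" gets a
Japanese response, while one asking only for pt-PT falls back to English
instead of borrowing the pt-br text.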
Notable by its absence from my list is Keith's concern that the lack of a
binding of this information to a specific address or addresses in the header
will lead to the wrong language being used in some cases. Simply put, I have never
encountered a case where this has been a problem. The reality seems to be that
language choice is a fairly coarse thing, and that if the originator of the
message expresses a preference in the header of the message, it seems to work
at least as well as using some sort of implementation default (usually
English). Alternatively, since I'm dealing with automatically generated responses
here, I suppose you could say I'm mostly using a binding to the MAIL FROM
address and finding that it works pretty well.
Another potential issue I haven't found to be a problem in practice is the
syntax of the field itself. The HTTP syntax for the field is rather complex and
allows for each value to have an attached weight. Weights make sense when there
are other factors to consider when deciding what document to return, but
they're just unnecessary complexity for email and I doubt that most agents that
look at these fields handle them properly. But this has been a nonissue in
practice - the only fields with weights in them I've ever seen have been ones
I've generated myself.
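
For anyone who does want to handle weights, the following is a minimal Python
sketch of parsing the HTTP-style syntax, e.g. "da, en-gb;q=0.8, en;q=0.7". The
function name is hypothetical and the parser is deliberately lenient; it is not
a full implementation of the HTTP grammar. It assumes the usual HTTP rules that
a missing q parameter means a weight of 1 and that q=0 means "not acceptable".

```python
def parse_accept_language(value):
    """Return language tags ordered by descending weight.

    Ties keep their original left-to-right order. Entries with a weight
    of zero (or an unparsable weight) are dropped.
    """
    entries = []
    for position, item in enumerate(value.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        weight = 1.0  # a missing q parameter means weight 1
        for param in parts[1:]:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    weight = float(raw)
                except ValueError:
                    weight = 0.0  # assumption: treat a bad weight as unacceptable
        if weight > 0:
            # Negate the weight so that an ascending sort puts the
            # highest weight first; position breaks ties.
            entries.append((-weight, position, tag))
    return [tag for _, _, tag in sorted(entries)]


print(parse_accept_language("da, en-gb;q=0.8, en;q=0.7"))
```

A field with no weights at all, which is all I have seen from other agents,
simply comes back in the order it was written.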
In summary, I think this is a case where we've let the best be the enemy of the
good in a fairly major way. Is there a potential problem with multiplicity of
addresses in the header and with there not being a way to attach language
preference information to each address? Sure, but in practice the biggest
problem with having a single header for this information has been the lack of
a single, standardized field, which we could have fixed easily had we been able
to get past the binding issue.
Ned