Re: Accept-Language: proposal

Larry Masinter wrote:

> Please no.


Do you mean "no, please define accept-charset as well" or "no, please
call this header something other than accept-charset"?  I strongly
disagree with the former; I could be convinced of the latter.

> In HTTP, accept-language and accept-charset are orthogonal,
> and the reasons for making them so in HTTP apply equally to mail.


I disagree that the reasons for accept-charset in HTTP apply equally to
mail.

1) In mail, the lack of interactivity makes using a canonical format
more important than sparing the sender the cost of converting from its
local form to the canonical form.

2) Time has passed since accept-charset was designed.  The report of
the IAB Character Set Workshop strongly recommends transitioning to
ISO 10646-based charsets, such as UTF-8 and/or UTF-7.

Valdis(_dot_)Kletnieks(_at_)vt(_dot_)edu wrote:

> This AIX 4.2 box I'm typing on has *some* UTF-8 support.  However,
> my MUA does not support it, nor do I have fonts to cover the entire UTF-8
> space.


Your MUA also doesn't support the accept-language header.  In order for
it to support the accept-language header, it would have to be extended
to understand the UTF-8 charset.

It is not necessary that your MUA be able to display the entire UTF-8
codepoint space.  It is only necessary for your MUA to be able to
display that subset of the space that is necessary to display text in
the languages that are advertised.  If the text in your languages all
fits into iso-8859-1, for example, your MUA only has to be able to
convert from UTF-8 to iso-8859-1 when displaying text.
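A minimal Python sketch of that display-side step, assuming the message
body arrives as UTF-8 bytes and the display only has ISO-8859-1 fonts.
`utf8_to_latin1` is a hypothetical helper name, not part of any real MUA:

```python
def utf8_to_latin1(body: bytes) -> bytes:
    """Convert a UTF-8 message body into ISO-8859-1 bytes for display.

    Hypothetical helper: assumes the display can only render Latin-1.
    """
    # Decode the canonical UTF-8 form into Unicode characters.
    text = body.decode("utf-8")
    # Re-encode for the Latin-1 display; characters outside ISO-8859-1
    # become '?' instead of raising UnicodeEncodeError.
    return text.encode("iso-8859-1", errors="replace")


# French fits entirely in ISO-8859-1, so nothing is lost.
print(utf8_to_latin1("Le café est fermé.".encode("utf-8")))
# prints b'Le caf\xe9 est ferm\xe9.'
```

If the advertised languages all fit in ISO-8859-1, text in those
languages converts without loss; the '?' substitution only affects
characters outside that subset.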

> Also, remember that UTF-8 is only an *encoding* scheme.  It is *not* an
> internationalization or localization scheme.  As such, things like currency
> formats, date/time preferences, and sorting/collating issues are not
> addressed at all.  If the person requests Cyrillic, what's the format of the
> date?


These issues are addressed by the language tags in the accept-language
header.  The person does not request Cyrillic, they request a specific
language that happens to use Cyrillic characters.

> Why are you implying UTF-8 support? What does this buy you?


Simplicity.  It removes the need for senders and recipients to negotiate
and convert among a large number of different charsets.  It also removes
the risk that a successful negotiation of a common language is stymied
by a failure to negotiate a common charset capable of representing the
necessary characters.

> How about an alternative of "Return English if you can't supply in one of
> the requested languages"?  This would eliminate a UTF-8 requirement, and
> be more backward-compatible as well.
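The proposed fallback rule could be sketched in Python as follows.  This
assumes the recipient's preferences are already available as a list of
language tags in preference order, that the sender's available languages
are stored as lower-case tags, and that matching is exact apart from case.
`choose_language` and its parameters are hypothetical names:

```python
def choose_language(requested: list[str], available: set[str],
                    fallback: str = "en") -> str:
    """Return the first requested language the sender can supply,
    or the fallback language (English) if none of them match.

    Assumes `available` holds lower-case language tags.
    """
    for tag in requested:
        # Language tags are case-insensitive; compare in lower case.
        if tag.lower() in available:
            return tag.lower()
    return fallback


print(choose_language(["fr", "de"], {"de", "en"}))  # de
print(choose_language(["ja"], {"de", "en"}))        # en
```

Note that this only selects a language; it says nothing about the
charset in which the chosen text will be sent.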


This alternative would not remove the UTF-8 requirement.  Once a sender
has determined which language to use, it still has to figure out which
charset can both express text in that language and be understood by the
recipient.  Without either a separate charset negotiation or charset
support being communicated by the accept-language header itself, only
us-ascii is known to be supported by the recipient.
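Without a common canonical charset, the sender's remaining task looks
roughly like this Python sketch.  The candidate list, its order, and the
assumption that the recipient's supported charsets are known as a set are
all illustrative; with no charset information at all, that set is just
{"us-ascii"}.  `choose_charset` is a hypothetical name:

```python
from typing import Optional

# Illustrative candidates, tried in this order; not from any standard.
CANDIDATE_CHARSETS = ["us-ascii", "iso-8859-1", "iso-8859-5", "koi8-r", "utf-8"]


def choose_charset(text: str, recipient_charsets: set[str]) -> Optional[str]:
    """Return the first candidate charset that the recipient understands
    and that can encode every character of text, or None if there is none."""
    for charset in CANDIDATE_CHARSETS:
        if charset not in recipient_charsets:
            continue  # the recipient is not known to support it
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # this charset cannot represent the text
        return charset
    return None


# Only us-ascii is known to be supported: Russian text cannot be sent.
print(choose_charset("Привет", {"us-ascii"}))           # None
# If the recipient is known to support UTF-8, the sender has an answer.
print(choose_charset("Привет", {"us-ascii", "utf-8"}))  # utf-8
```

The text and its language are the same in both calls; only what is known
about the recipient's charsets changes the outcome.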