2. Use RFC 2482 (language tags embedded in UTF-8 text). Extremely
flexible, but would undoubtedly raise howls of protest from users whose
existing agents saw them as a sequence of garbage characters (people who
read news can get exceedingly irate when shown such things, as witness
the railings against HTML in news, or even against any form of MIME).
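For concreteness, here is a minimal sketch of what an RFC 2482 tag adds to the byte stream. It assumes the RFC's scheme: a LANGUAGE TAG (U+E0001), then the tag value spelled with Tags-block characters (ASCII code + 0xE0000), then LANGUAGE TAG followed by CANCEL TAG (U+E007F) to end the run. The function name `tag_language` is hypothetical, and the ASCII-only check is a simplification rather than full language-tag validation. Every tag character takes four bytes in UTF-8, and an agent that knows nothing about tags may render those bytes as the stray characters complained about above.

```python
# Code points from the Unicode Tags block (U+E0000..U+E007F) used by RFC 2482.
LANGUAGE_TAG = 0xE0001   # introduces a language tag
CANCEL_TAG = 0xE007F     # cancels the current tag
TAG_OFFSET = 0xE0000     # tag character = ASCII code + this offset


def tag_language(text: str, lang: str) -> str:
    """Wrap text in an RFC 2482-style language tag, then cancel it (hypothetical helper)."""
    if not lang or not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language tag must be non-empty printable ASCII")
    opening = chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)
    closing = chr(LANGUAGE_TAG) + chr(CANCEL_TAG)
    return opening + text + closing


if __name__ == "__main__":
    data = tag_language("Bonjour", "fr").encode("utf-8")
    # 3 opening tag chars + 7 letters + 2 closing tag chars = 3*4 + 7 + 2*4 bytes.
    print(len(data))   # 27
    print(data[:4])    # b'\xf3\xa0\x80\x81' (U+E0001 in UTF-8)
```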
Essentially nobody's existing UA supports UTF-8, so if you're
using UTF-8 anyway, including language tags in UTF-8 doesn't make
the situation much worse.
Well, there are lots of existing UAs that support a well-known subset of
UTF-8 :-). Their users will undoubtedly complain when they see
gobbledegook appearing. But that is just a fact of life (or will be) :-( .
However, when _real_ UTF-8 agents start to appear, they will support only
a subset of the available character sets. And in particular, many will
not recognise the RFC 2482 stuff. So you get more users complaining about
more gobbledegook.
I'm curious why this should be the case. If we define the spec for
utf-8-enhanced mail/news in such a way that support for RFC 2482 is
required (at least in the sense that you must not display the
language tags as gobbledegook), why would large numbers of
"real" UAs fail to implement the spec?
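The minimal requirement proposed here is cheap to meet. As a sketch, assuming a display agent only needs to hide tags rather than act on them, it can drop every code point in the Unicode Tags block (U+E0000..U+E007F) before rendering. The name `strip_language_tags` is hypothetical, not part of any spec.

```python
def strip_language_tags(text: str) -> str:
    """Remove every code point in the Unicode Tags block (U+E0000..U+E007F)."""
    return "".join(ch for ch in text if not 0xE0000 <= ord(ch) <= 0xE007F)


if __name__ == "__main__":
    # Hypothetical input: "fr"-tagged text followed by LANGUAGE TAG + CANCEL TAG.
    tagged = "\U000E0001\U000E0066\U000E0072Bonjour\U000E0001\U000E007F"
    print(strip_language_tags(tagged))  # Bonjour
```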
I think the real point is that inclusion of language information is a MAY.
You do not include it because you have bought this nice new toy that
allows it, or because the latest Billyware does it by default. You use it
when there are useful clues which could indeed be helpful to displays which
know about that particular language. Which means you will hardly ever need
to use it for the language EN.
I disagree - you may very well need to use it for the language EN,
not so much for display as for text-to-speech converters
(why should they assume English pronunciation as a default?).
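To illustrate the text-to-speech case, here is a sketch of how a converter might split tagged text into runs, so it can choose a voice per run instead of defaulting to English. It assumes the RFC 2482 conventions (LANGUAGE TAG U+E0001, tag characters U+E0020..U+E007E, CANCEL TAG U+E007F); the function `language_segments` is hypothetical. Untagged runs come back with a language of `None`, leaving the decision to the consumer rather than guessing.

```python
from typing import List, Optional, Tuple

LANGUAGE_TAG = 0xE0001                   # starts a language tag
CANCEL_TAG = 0xE007F                     # cancels tags
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E   # tag characters mirroring ASCII 0x20..0x7E
TAG_OFFSET = 0xE0000


def language_segments(text: str) -> List[Tuple[Optional[str], str]]:
    """Split RFC 2482-tagged text into (language, text) runs; None means untagged."""
    segments: List[Tuple[Optional[str], str]] = []
    lang: Optional[str] = None
    buf: List[str] = []

    def flush() -> None:
        # Emit the text collected so far under the current language.
        if buf:
            segments.append((lang, "".join(buf)))
            buf.clear()

    i = 0
    while i < len(text):
        cp = ord(text[i])
        if cp == LANGUAGE_TAG:
            flush()
            i += 1
            tag = []
            while i < len(text) and TAG_FIRST <= ord(text[i]) <= TAG_LAST:
                tag.append(chr(ord(text[i]) - TAG_OFFSET))
                i += 1
            # An empty tag (e.g. LANGUAGE TAG then CANCEL TAG) leaves no language.
            lang = "".join(tag) or None
        elif cp == CANCEL_TAG:
            flush()
            lang = None
            i += 1
        else:
            buf.append(text[i])
            i += 1
    flush()
    return segments


if __name__ == "__main__":
    sample = "Hello \U000E0001\U000E0066\U000E0072Bonjour\U000E0001\U000E007F again"
    for lang, run in language_segments(sample):
        print(lang, repr(run))
```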
*I* think the real point is that you should not lie about the language.
If you don't have reliable knowledge of the language, you should leave
off the language tag.
Keith