Re: A spec for showing language in MIME headers

Let me restate this, but I'm assuming that we are in substantial
agreement:

  -- For the identification of major languages that are likely to be
used in "plain text" email, 639 (or 636bis) probably suffices.


I don't think that the existing ISO 639 is sufficient. It lacks
codes for significant minority languages such as Sami (Lappish),
spoken in the Nordic countries, and Romany, spoken by gypsies.
ISO 639 contains codes for 136 languages. The draft ISO CD
639-2 (three-letter codes) included 401 languages (or language
groups).

Unfortunately the new part 2 of ISO 639 is still at least a
couple of years away from adoption as an international standard.
The recently rejected ISO CD 639-2 was still deficient for some
languages. There was a code "nno" for the Nynorsk form of
Norwegian, but no code for the other major form, Bokmal. There
were no separate codes for the Sami dialects South Sami, Lule
Sami, North Sami, Enare Sami, and Skolt Sami, although some of
these differ more in vocabulary, orthography and grammar than
the separate Scandinavian languages Danish (da), Norwegian (no),
and Swedish (sv) do.

  -- For major languages that are used in email but that don't appear in
639, we may be better off encouraging registration in 639, rather than
inventing an Internet-specific (e.g., IANA) mechanism.


ISO 639 does have a registration mechanism. I still think IANA
registration would be useful, though:

-  The list of currently ISO-registered codes is not available
   on the Internet.

-  Implementers who want to use language codes won't easily
   find the name and address of the ISO registration authority.
   (Do you know it or where to find it?)

-  The name space of the present ISO 639 -- 2ALPHA, at most
   26 x 26 = 676 codes -- is too limited for registration to be
   indiscriminate.

-  We don't know anything about the readiness of the ISO
   registration agency to accept any new language code, perhaps
   fairly specialized, which may be needed for Internet use. (I
   doubt that it would like to register two-letter codes for
   five Sami dialects.)

-  When new language codes are registered with ISO they can
   easily be added to an IANA registry.

-  I don't expect language code registrations to be so frequent
   that they will impose a considerable workload on IANA.
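
To make the idea of an IANA registry that absorbs ISO registrations
concrete, here is a minimal sketch in Python. It is only an
illustration under stated assumptions: the LanguageRegistry class, the
"ISO"/"IANA" source labels and the "x-" dialect tags are all
hypothetical and are not taken from ISO 639 or from any IANA
procedure. Only the codes da, no and sv come from this message.

```python
# Sketch only: the registry layout and the "x-" tags are assumptions,
# not part of ISO 639 or of any IANA procedure.

# Two-letter codes quoted in this message.
ISO_639 = {"da": "Danish", "no": "Norwegian", "sv": "Swedish"}


class LanguageRegistry:
    """Maps a language code to (name, registering body)."""

    def __init__(self, iso_codes):
        # Seed the registry with the codes ISO has already assigned.
        self._entries = {
            code.lower(): (name, "ISO") for code, name in iso_codes.items()
        }

    def register(self, code, name, source="IANA"):
        """Add a code; refuse to shadow an existing registration."""
        key = code.lower()
        if key in self._entries:
            raise ValueError(f"code already registered: {key}")
        self._entries[key] = (name, source)

    def lookup(self, code):
        """Return (name, source), or None for an unknown code."""
        return self._entries.get(code.lower())


registry = LanguageRegistry(ISO_639)
# Hypothetical tags for two of the Sami dialects mentioned above.
registry.register("x-sami-north", "North Sami")
registry.register("x-sami-lule", "Lule Sami")
```

In this sketch a later ISO registration would simply be added with
source="ISO", and an attempt to reuse an existing code is rejected, so
the two sources cannot silently conflict.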

  -- For very complex cases, requiring the specific identification of
dialect, location, time, etc., nothing with the granularity of "body
part" is likely to suffice, and we need coding techniques that can
identify the origins/context of particular words.


It's correct that the simple Content-Language: proposal can't
satisfy _all_ language information encoding needs. But TEI (the
Text Encoding Initiative) is working on solutions for special
needs like these.
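
As a rough illustration of the "body part" granularity, the sketch
below labels one body part with a single language, using the email
package from the Python standard library. The header value syntax
shown (one bare code) is an assumption; the proposal under discussion
may define it differently.

```python
from email.message import EmailMessage

part = EmailMessage()
# set_content() clears existing Content-* headers, so set the body first.
part.set_content("God morgon.")
# One label for the entire body part; the value "sv" is the Swedish
# code quoted earlier, and its use here is an assumed syntax.
part["Content-Language"] = "sv"

print(part["Content-Language"])
```

The label applies to the whole part, so a single foreign word or
phrase inside the text cannot be marked this way.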

/Olle