Re: New content-language draft

1) The stuff about iana-art, iana-hum and iana-ukn, I don't like.
   The reason is that iana-hum and iana-art look like prefixes, and you
   want to tack stuff onto the end. iana-c and iana-simula are more likely
   candidates.


I have absolutely nothing against iana-c and iana-simula. Or
iana-pearl, iana-web, iana-sgml, iana-chess, ...

One complication is that a much finer division into "dialects"
might be useful for programming languages than for human
languages. You may want to specify not only the language but
also the version of the language standard, the compiler, the
operating system, and the version level of the compiler.

One way to cater for arbitrary precision in the declaration of
what artificial language is used in a body-part is to provide a
parameter "variant" to the Content-Language: field.

For such a parameter to be unambiguous, the Content-Language:
field should contain only one language code. Or we could specify
that the parameter always refers to the last language code of
the field. If more than one language with a "variant" parameter
is to be indicated for a body part, more than one
Content-Language: header can be given for that part. So we
should state the rule that multiple Content-Language: headers
are additive.
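
A minimal sketch of how a receiving program might apply these two
rules (the parameter belongs to the last code of its field, and
several fields are additive). The syntax `; variant=...` is only my
assumption here, borrowed from the way MIME writes parameters; the
draft does not define it, and the function name and the variant
values are made up for illustration.

```python
def parse_content_language(header_values):
    """Merge several Content-Language: values into (code, variant) pairs.

    Assumed (hypothetical) syntax: "code, code; variant=value".
    """
    result = []
    for value in header_values:
        # Everything after the first ";" is treated as the parameter part.
        codes_part, _, param_part = value.partition(";")
        codes = [c.strip().lower() for c in codes_part.split(",") if c.strip()]

        variant = None
        name, _, raw = param_part.partition("=")
        if name.strip().lower() == "variant":
            variant = raw.strip().strip('"') or None

        # The variant refers only to the last language code of this field.
        for i, code in enumerate(codes):
            is_last = i == len(codes) - 1
            result.append((code, variant if is_last else None))
    # Fields are additive: the result accumulates across all of them.
    return result
```

With one field per language that needs a variant, each variant stays
attached to exactly one code, which is the point of making the
fields additive.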

Another complication has to do with mixing human and artificial
languages. I think two cases should be distinguished:

A) The inclusion of programs or fragments of source code in an
   ordinary text (e.g. when a teacher tries to explain the
   intricacies of the file buffer variable in classical Pascal
   in connection with interactive input of data).

B) The inclusion of small pieces of human language in an
   artificial language, typically in comments or string
   literals.

(It's difficult to decide if a WEB or other literate program
belongs to A or B, but the distinction is clear in the great
majority of cases.)

I would prefer for case A headers like:

Content-Type: text/plain (or application/postscript or something)
Content-Language: en, iana-pascal

For case B a new subtype of Text could be registered:

Content-Type: text/source-code
Content-Language: iana-pascal, en

By the way, I don't think that the present draft says anything
about the significance of the order of languages when more than
one is specified in the Content-Language: header. What about
adding a Note that if one language is clearly more important in
the body part than another, its language code should precede the
code of the other language?

   If you have something that you haven't registered with
   IANA, use X-language (x-loglan).

   iana-ukn MAY be a good idea, but I would prefer iana-unknown, or
   state explicitly that the header should be missing or empty in this case.


My proposed tags iana-art, iana-hum, and iana-unk are not
codes for a single language but for a group of languages.
This proposal was inspired by (the rejected) ISO CD 639-2 for
three-letter language codes. Not only did that proposed
standard define _collective_ language codes such as

   ine   Indo-European (Other)   (a linguistically defined language group)

   art   Artificial (Other)      (a genetically defined language group)

   afa   Afro-Asiatic (Other)    (a geographically defined language group)

It also included the _special_ language codes

   mul   Multiple languages, when it is not practical to
         specify all the appropriate language codes.

   und   Undetermined language, for those situations in which a
         language must be indicated but the language cannot be
         identified.

The practical value of such codes for groups of languages is,
however, debatable, and I don't insist on their inclusion.

2) The stuff about registering enough information is good. I will add
   a registration form as an appendix, requesting the information you
   want.
   Main tags can only be registered through ISO, so registration forms
   for these should not be required here.


You may want to register both a language and variants of that
language. Both must be encoded by the subtag. As an example,
registrations will probably be made for both the Sami (Lappish)
language, say "iana-smi", and the Sami dialects
-  South Sami, "iana-smis"
-  Ume Sami, "iana-smiu"
-  Lule Sami, "iana-smilu"
-  North Sami, "iana-smin"
-  East Sami, "iana-smie"
-  Enare Sami, "iana-smien"
-  Skolt Sami, "iana-smisk"

The boundary between the language part of the subtag and the
variant part will be implicit.
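
Since the boundary is implicit, a program can only find it by
consulting the list of registered languages. A rough sketch,
assuming a lookup by longest registered prefix; the function name
and the one-entry registry are hypothetical and use the "iana-smi"
example above, not any actual registration.

```python
# Hypothetical registry of language subtags (without the "iana-" prefix).
REGISTERED_LANGUAGES = {"smi"}


def split_subtag(tag, languages=REGISTERED_LANGUAGES):
    """Split e.g. "iana-smilu" into (language part, variant part).

    Returns None when no registered language is a prefix of the subtag.
    """
    prefix = "iana-"
    lowered = tag.lower()
    if not lowered.startswith(prefix):
        raise ValueError("not an iana- tag: " + tag)
    subtag = lowered[len(prefix):]

    # Try the longest candidate first so that a longer registered
    # language wins over a shorter one that is also a prefix.
    for end in range(len(subtag), 0, -1):
        if subtag[:end] in languages:
            return subtag[:end], subtag[end:]
    return None
```

Under these assumptions split_subtag("iana-smilu") gives the pair
("smi", "lu"), and the bare language "iana-smi" gives an empty
variant part. The split becomes ambiguous as soon as one registered
language subtag is a prefix of another, which is a consequence of
leaving the boundary implicit.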

There can be several reasons for distinguishing between variants
of a language:

-  different dialects (e.g. Sami)

-  different national conventions (e.g. French in different
   countries)

-  different written forms (e.g. Nynorsk and Bokmaal in
   Norwegian)

-  different scripts (e.g. Azerbaijani written in the Arabic
   script, the Cyrillic script or the Latin script)

-  different transliteration systems (e.g. for writing Chinese
   in the Latin script)

-  different orthographies (e.g. the old and the modern
   orthographies for Greenlandic)

-  different periods in the historical development of a language
   (e.g. Old, Middle and Modern English).

IANA registration form for subtags for language codes

1) Main tag it is to be registered with:

2) Subtag requested:

3) Original name of language, expressed in ASCII:


This should be mandatory.

4) English name of language:


This should be given if known.

5) French name of language, expressed in ASCII:


This should be given if known. For French, "expressed in
ISO 8859-1" is more appropriate.

Here I would like to add:

  +) When registering a variant of a language:

     a) Original name or description of the variant, expressed in ASCII

     b) English name or description of the variant (if known)

     c) French name or description of the variant (if known),
        expressed in ISO 8859-1

     d) Distinguishing quality (e.g. dialect, script,
        orthography, geographical area, historical period)

6) Literature reference defining or describing the language:


Add: "or variant"

7) Name and E-mail address of applicant:


Maybe this should be added:

  8) Organization(s) supporting the application

This can be e.g. a research library, a society of linguists, a
national ministry of education, a local language council.
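
To make the proposed changes to the form concrete, here is a sketch
of the form as a record with the checks I suggest above (original
name mandatory and in ASCII, French name in ISO 8859-1, the other
names optional). The class and field names are my own invention for
illustration, not part of the draft.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VariantInfo:
    """The proposed "+)" block, filled in when registering a variant."""
    original_name: str                  # a) expressed in ASCII
    distinguishing_quality: str         # d) e.g. dialect, script, orthography
    english_name: Optional[str] = None  # b) if known
    french_name: Optional[str] = None   # c) if known, in ISO 8859-1


@dataclass
class SubtagRegistration:
    main_tag: str                       # 1)
    subtag: str                         # 2)
    original_name: str                  # 3) mandatory, in ASCII
    reference: str                      # 6) language or variant
    applicant: str                      # 7) name and e-mail address
    english_name: Optional[str] = None  # 4) if known
    french_name: Optional[str] = None   # 5) if known, in ISO 8859-1
    variant: Optional[VariantInfo] = None
    supporting_organizations: List[str] = field(default_factory=list)  # 8)

    def problems(self):
        """Return a list of complaints; an empty list means acceptable."""
        issues = []
        if not self.original_name:
            issues.append("original name is mandatory")
        elif not self.original_name.isascii():
            issues.append("original name must be expressed in ASCII")
        if self.french_name is not None:
            try:
                self.french_name.encode("iso-8859-1")
            except UnicodeEncodeError:
                issues.append("French name must be expressible in ISO 8859-1")
        return issues
```

A registrar could run problems() on each incoming form and return
the list to the applicant when it is not empty.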