Re: A spec for showing language in MIME headers

Here, we have, quite pedantically, been talking about the future
possibility of having a 639-registered language "IANA", which destroys
Harald's scheme.


ISO would be very silly if they did this.


Unless there actually is a language with the name "IANA" (quite unlikely).

Maybe we can alleviate the
hassle by using "x-iana-*" rather than "iana-*", then everything that isn't
handled by ISO will be prefixed by "x-".  How about this Harald?
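Under the "x-" proposal, telling ISO codes apart from everything else reduces to a check on the first subtag. A minimal sketch, assuming that ISO 639 codes are exactly two ASCII letters; the function name `tag_namespace` and its three result labels are hypothetical, chosen only for illustration:

```python
def tag_namespace(tag: str) -> str:
    """Classify a language tag by the authority owning its first subtag.

    Hypothetical rule set: two-letter primary subtags belong to ISO 639,
    a primary subtag of "x" marks anything not handled by ISO
    (e.g. "x-iana-*"), and everything else is rejected.
    """
    primary = tag.lower().split("-", 1)[0]
    if primary == "x":
        return "non-ISO"   # e.g. "x-iana-foo"
    if len(primary) == 2 and primary.isascii() and primary.isalpha():
        return "ISO 639"   # e.g. "en", "en-au"
    return "invalid"       # e.g. a bare "iana-foo" under this scheme
```

Because "x" is a single letter, it cannot collide with a two-letter ISO code, whereas a bare "iana-" prefix stays safe only as long as ISO never assigns "iana" itself.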


Or "y-" or just "-*" or ...

If each component of a multipart message has a separate Date: header, yes.


Depends on the archive.  Is it storing messages or body parts?  If messages,
then the main Date: header is sufficient.


If all the parts share the same date.
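The archive question can be made concrete. A minimal sketch using Python's standard `email` package; the helper name `part_dates` is hypothetical. It takes each leaf body part's own Date: header when present and otherwise falls back to the message's top-level Date:, so when no part carries its own Date:, every entry is simply the main header:

```python
from typing import List, Optional
from email import message_from_string


def part_dates(raw: str) -> List[Optional[str]]:
    """Return the effective Date: for each leaf body part of a message.

    A part's own Date: header wins; otherwise the top-level Date: is used
    (None if the message has no Date: at all).
    """
    msg = message_from_string(raw)
    top = msg["Date"]
    dates = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold other parts, not content of their own
        dates.append(part["Date"] or top)
    return dates
```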

This may be hacky, but these are borderline cases, and we shouldn't be
designing MIME solely for the borderline. Show me someone who wants to
communicate daily in 16th century English or Japanese in e-mail, and you
may convince me.  16th century attachments to the main message don't count.


A time stamp is necessary if quite unstable country names are used in the tag.

Otherwise, if we don't have to consider languages in the year 2050 (the
future, not the past), we don't need the time stamp.

Isn't MIME rich text a marked-up document?


Yes, but text/enriched is not intended to handle everything in the universe.
Other formats (e.g. SGML) provide richer mark-up environments.  It is possible
to use Harald's language tags in text/enriched, but I would expect them to
be for disambiguating CJK, etc., for display purposes, rather than saying
"the version of English spoken between 1500 and 1700 A.D.".


As CJK disambiguation is needed word by word (don't forget that
Harald proposes to handle multilingual documents) and in the header part,
and the disambiguation is needed only for one specific character
set, ISO 10646/Unicode, a language tag is not a good mechanism for
it. It's better to use ISO 10646/Unicode with charset names of the
form "iso-10646-<language tag>", for single-language text only.
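The proposed charset naming can be sketched in a few lines. This assumes the "iso-10646-<language tag>" form from the text is used as a hypothetical convention (it is not a registered charset), and the function names `make_charset` and `charset_language` are illustrative only:

```python
from typing import Optional

# Hypothetical convention: ISO 10646 text whose language is fixed for the
# whole body part is labelled with a charset name "iso-10646-<tag>".
PREFIX = "iso-10646-"


def make_charset(lang_tag: str) -> str:
    """Build a charset name such as "iso-10646-ja" for single-language text."""
    return PREFIX + lang_tag.lower()


def charset_language(charset: str) -> Optional[str]:
    """Return the language tag carried in the charset name, or None."""
    cs = charset.lower()  # MIME charset names are case-insensitive
    if cs.startswith(PREFIX) and len(cs) > len(PREFIX):
        return cs[len(PREFIX):]
    return None


# A Content-Type value a sender might emit under this convention:
print(f"text/plain; charset={make_charset('ja')}")  # text/plain; charset=iso-10646-ja
```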

This policy is compatible with one of the directionality handling methods
discussed in the Houston MIME CONTENT BOF. That is, if we need
directionality disambiguation for one encoding method, language
disambiguation for another, and some other disambiguation for yet
another, it is better to encode it in the charset name than to provide
additional subtypes.

At the moment, the only use I've seen for country codes is to select a
speech synthesis unit for a particular dialect.  Personally, hearing a
message spoken in old English just because some joker put "en-1500" in
the header doesn't impress me, and what should it do if it doesn't
have a suitable dialect available?  I'm also one of those Aussies who is
likely to put an axe through my computer if it starts talking to me in
"en-us" ( :-) ), so country codes should definitely be used only if the
dialect is very important IMHO.


My question, then, is

        1) What will the revised 639 look like? Will anything like en-au
        be registered separately in it?

        2) If not, isn't it better to register "en-au" not as a country
        code but just as an IANA-registered language name? An IANA
        registration can survive even after the country name is deleted
        from ISO 3166.

If date information is important, then lots of other info is likely to be
important also.


That is, I think it is better to register "en-au" with a concrete
IANA-registered-with-informational-RFC definition such as "English
commonly used in Australia in 1993" than to say ambiguously "English
commonly used in Australia sometime".
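To illustrate why a registered definition outlives ISO 3166 changes, here is a minimal sketch of a registry entry whose meaning is fixed text rather than something derived from the current country list. The class `RegisteredLanguage`, the `lookup` function, and the "RFC XXXX" placeholder are all hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegisteredLanguage:
    """Hypothetical IANA registry entry for a language tag."""
    tag: str
    definition: str  # fixed wording from the registering document
    defined_in: str  # placeholder for the informational RFC


# Illustrative contents only; "RFC XXXX" is a placeholder, not a real RFC.
REGISTRY = {
    "en-au": RegisteredLanguage(
        tag="en-au",
        definition="English commonly used in Australia in 1993",
        defined_in="RFC XXXX",
    ),
}


def lookup(tag: str) -> Optional[RegisteredLanguage]:
    # Resolution consults only the registry, never ISO 3166, so the entry
    # keeps its meaning even if "au" were later removed from the country list.
    return REGISTRY.get(tag.lower())
```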

                                                Masataka Ohta