Re: A spec for showing language in MIME headers

From: rhys(_at_)cs(_dot_)uq(_dot_)oz(_dot_)au
To: ietf-822(_at_)dimacs(_dot_)rutgers(_dot_)edu
Subject: Re: A spec for showing language in MIME headers

...

Masataka seems worried that the "en" of today may bear very little
resemblance to the "en" of 200 years from now, and so old messages in
archives may be interpreted as the future "en" rather than the "old" one.
He also seems worried that something like "ru-su" becoming "ru-ru"
overnight may cause the net to disintegrate.  I'm unconvinced at the moment
that this is a problem that the IETF needs to solve.  Usage of tags, just as
usage of languages, will evolve over time, and it is not up to the IETF to
say how they can evolve.



Speaking as someone who has studied a fair amount of Linguistics (and, BTW,
is probably a member of the same medieval recreation group which Dana
mentioned ... I even took a class on medieval Japanese naming practices last
summer ;-)):

Languages do evolve.  Mainly in idioms, but also in pronunciation &c.
Therefore a system for tagging such things should most definitely have a
place for marking the "version" of the language.  Calling it "old" versus
"middle" probably is not sufficient, depending on what level of detail you
want to support.  For instance, a "speech processor" unit (one of the
suggested uses for Content-Language) would love to know that the text was
meant to be spoken with a Scottish Highlands accent, lest it choose an
American Midwest accent instead because that's the default.

For the purposes which Dana mentioned (mixed text from different languages)
it is, to my knowledge, enough to mark the desired character sets.  The
common thing to do is to discuss some thing (a name, phrase, sentence or
paragraph), with the discussion in the vernacular language in use (for me,
modern English).  Embedded within that text are some phrases, sentences, or
paragraphs of the thing being discussed.  This thing needs to be represented
on the screen using the right glyphs.

To my knowledge choosing the right glyphs is driven by the character set.

So what we need is a sufficient quantity of character sets so we can discuss
Old High Germanic names in one paragraph, Old English in the next, and
Russian after that.  Where does the need for marking the languages come from?
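
As a rough illustration of marking only the character set on each piece of
mixed text, here is a minimal sketch using Python's standard email package.
The sample texts and the choice of us-ascii and koi8-r are assumptions made
for the example, not anything proposed in this thread.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# One message, two body parts, each labelled only with its character set.
msg = MIMEMultipart("mixed")
msg.attach(MIMEText("Discussion in modern English.", "plain", "us-ascii"))
msg.attach(MIMEText("Привет", "plain", "koi8-r"))  # embedded Russian sample

# A receiver can choose glyphs per part from the charset parameter alone.
charsets = [part.get_content_charset() for part in msg.get_payload()]

# Decoding the second part with its declared charset recovers the text.
russian = msg.get_payload()[1].get_payload(decode=True).decode("koi8-r")
```

Nothing here records which language each part is in; the charset label is
the only thing driving display.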

I _would_ support making the country code into something that should be
used only if it is absolutely necessary to disambiguate different usages of
the same language.  e.g. French and French-Canadian which have different
capitalisation rules I believe.  Hence, Russian would be "ru" and English
would be "en", no matter what country, unless it is absolutely necessary
that the recipient hear it with a Ukrainian ("ru-ua") or an Australian
("en-au") accent respectively.


Hmmm...  I don't see this.

Isn't capitalization done within the text?  `a' is a different character
code than `A' after all...

Accent is a different matter.  Even within countries there are wildly
different accents.  Northern French versus southern.  Or north east American
versus southern American versus ghetto black American versus Bronx American
versus Brooklyn American versus New Jersey American (I can go on).  Like I
said above, how much detail is desired?  Is it worthwhile generating this
much detail?  Is it truly important that these details be preserved?
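
For what it's worth, the scheme quoted above (a bare "ru" or "en", with a
country suffix such as "en-au" only when it matters) could be handled by a
receiver roughly as follows.  This is only a sketch: the voice table and the
function names are hypothetical, and the fallback rule (drop the country
when no specific entry exists) is my assumption, not part of any spec.

```python
# Hypothetical voice table; the voice names are illustrative only.
VOICES = {
    "en": "english-default",
    "en-au": "english-australian",
    "ru": "russian-default",
    "ru-ua": "russian-ukrainian",
}

def parse_tag(tag):
    """Split 'en-au' into ('en', 'au'); a bare 'en' gives ('en', None)."""
    language, _, country = tag.strip().lower().partition("-")
    return language, (country or None)

def pick_voice(tag, voices=VOICES):
    """Prefer a language-country voice, else fall back to the bare language."""
    language, country = parse_tag(tag)
    if country is not None:
        specific = voices.get(f"{language}-{country}")
        if specific is not None:
            return specific
    # No country given, or no voice for it: use the language default, if any.
    return voices.get(language)
```

With this table, "en-au" picks the Australian voice, while a tag like
"en-us" quietly falls back to the default English voice.  Every finer accent
distinction needs another table entry, which is exactly the "how much
detail?" question.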


<- David Herron <david(_at_)twg(_dot_)com> (work)
<-              <david(_at_)davids(_dot_)mmdf(_dot_)com> (home)
<-
<- There are only two pains- 
<-     The pain of discipline or the pain of regret.