Re: A spec for showing language in MIME headers

Danny Iacovou >

 Content-language: American    
 Content-language-dialect: Northern
 Content-language-Geographical-Region: East Coast/PA/Pittsburg
  
Given all this, isn't this enough:
  
 Content-language: American/Northern/East Coast/PA/Pittsburg
  
     Dana, I do agree with you that all of this would be
  really cool to include in MIME; however, I think
  we need to know what an appropriate breakdown would
  be - and to define the headers accordingly. The more I
  think about this the more cautious I become.


I agree with your caution, and given the lack of maturity for
audible rendering I think any spec we formulate should do no 
more than suggest form, as your examples would do.  BTW, my 2 
roommates were actively researching speech recognition techniques 
10 years ago, so I am not completely ignorant of the difficulties 
of getting computers to deal with natural language.

Actual content (i.e. PA vs Pennsylvania) should be deferred until
there is a more concrete need.
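
To make "suggest form only" a little more concrete, here is a minimal
sketch in Python. It splits the single-header form from the quoted
example into ordered levels and deliberately checks nothing about the
values themselves. The function name parse_content_language and the
level names (language, dialect, region) are my own illustrative
assumptions, not part of any spec.

```python
def parse_content_language(header_line):
    """Split a 'Content-language: a/b/c...' line into named levels.

    Only the form is checked. The values are passed through untouched:
    no registry lookup and no 'PA' vs 'Pennsylvania' normalization.
    """
    name, sep, value = header_line.partition(":")
    if not sep or name.strip().lower() != "content-language":
        raise ValueError("not a Content-language header")

    parts = [p.strip() for p in value.split("/") if p.strip()]
    if not parts:
        raise ValueError("empty Content-language value")

    parsed = {"language": parts[0]}
    if len(parts) > 1:
        parsed["dialect"] = parts[1]
    # Everything after the dialect is kept as an ever-narrower region path.
    parsed["region"] = parts[2:]
    return parsed


example = parse_content_language(
    "Content-language: American/Northern/East Coast/PA/Pittsburg"
)
```

Because the values are opaque strings, a later agreement on their
content would not have to change the form.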

After feeling overwhelmed by the several thousand volumes and 
numerous journals of the linguistics section of our McKeldin 
library, I suspect we will have a difficult time pinning down 
a definitive list of the world's spoken languages and writing 
forms, but I have not yet exhausted the resources available 
to me here, so more on that later.  What is clear to me at 
this point is that any attempt to broaden that enumeration 
to include all the world's dialects is probably a hopeless 
cause left to far braver souls than we (perhaps that would 
be a good topic for a PhD in library science or history of 
linguistics or some such; it would require the energy of a 
youthful and detail-conscious person with deep pockets to 
purchase all the government and ISO publications involved).

Luckily we can approach this enumeration another way: rather 
than attempt to document exhaustively all that might be 
_desired_ of us, we could narrow that to allow registration 
of extant, implemented systems, and those _are_ both well defined 
and enumerable, if idiosyncratic and platform-specific.  

Apple and other manufacturers have implemented support for specific 
writing systems and selected speech synthesizers, so there are 
some tags available (admittedly platform-specific).  A survey 
of these would seem pertinent, and may be possible within our 
immediate company.
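
A registry of implemented tags could be as simple as the following
Python sketch. The class name TagRegistry, its methods, and the
placeholder platform and tag names are all hypothetical; no real
platform's tags are shown, since the survey has not been done.

```python
from collections import defaultdict


class TagRegistry:
    """Registry of tags that some existing system actually implements."""

    def __init__(self):
        # platform name -> set of tags that platform is known to support
        self._by_platform = defaultdict(set)

    def register(self, platform, tag):
        """Record that `platform` implements `tag`."""
        self._by_platform[platform].add(tag)

    def platforms_for(self, tag):
        """Return the sorted list of platforms that registered `tag`."""
        return sorted(
            p for p, tags in self._by_platform.items() if tag in tags
        )

    def all_tags(self):
        """Return every registered tag, sorted; finite by construction."""
        tags = set()
        for platform_tags in self._by_platform.values():
            tags |= platform_tags
        return sorted(tags)


# Placeholder entries only; a real survey would supply the actual tags.
registry = TagRegistry()
registry.register("platform-a", "writing-system-1")
registry.register("platform-b", "writing-system-1")
registry.register("platform-b", "synthesizer-voice-1")
```

The list stays enumerable because it only ever contains what someone
has registered, not everything that might be desired.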

     I would love to be able to send mail that when played
  back makes my words sound like a New Yorker


Inability to render in a specific dialect is not normally 
distorting.  Admittedly actors, screenplay authors et al. 
will want to be able to throw around dialect-tagged speech, 
but it isn't normally important, so a system which defers its 
implementation will have considerable virtue in spite of 
being "imperfect".  Later refinements are clearly possible
via obvious mechanisms.
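
One such mechanism, sketched in Python under my own assumptions: a
renderer that cannot do the full dialect falls back to the most
specific leading part of the tag it does support. The function name
best_supported and the slash-separated form (borrowed from the quoted
single-header example) are illustrative only.

```python
def best_supported(tag, supported):
    """Return the longest leading prefix of `tag` found in `supported`.

    `tag` is a slash-separated value such as 'American/Northern';
    `supported` is the set of values this renderer can actually handle.
    Returns None when not even the top-level language is supported.
    """
    parts = [p.strip() for p in tag.split("/") if p.strip()]
    # Try the most specific form first, then drop one trailing level at a time.
    for end in range(len(parts), 0, -1):
        candidate = "/".join(parts[:end])
        if candidate in supported:
            return candidate
    return None


full_tag = "American/Northern/East Coast/PA/Pittsburg"
plain = best_supported(full_tag, {"American"})
finer = best_supported(full_tag, {"American", "American/Northern"})
```

A system that only knows the top-level language still renders the
message, and gains precision later simply by growing its supported set.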

     BTW: you do realize that as soon as a person writes a
  program that given an input file and a desired
  language can spit out a file written in the
  universal linguistics language


[no such beast, sigh, even IPA has dialects]

  and vice-versa the whole notion of Content-language*
  becomes obsolete.


I disagree, even if IPA lived up to its name.  

You postulate a politically unlikely universal adoption of that 
new language as well as 100% conversion of all the world's written
documents (electronic documents, clay tablets, architectural 
monuments, billboards, books, magazines, machine control-panel 
labeling...), which I suspect will never happen, if only 
because the world's historians will want to remain employed.

Five years ago I would have predicted (and bemoaned) a gradual 
worldwide adoption of English as a second language, not because 
I want that, but simply because English is the de facto language 
of computing and the world is moving to embrace computer usage.

Now that Apple and others (hopefully us as well) are divorcing 
computers from a dependence on English, that trend will be ameliorated.
--
dana s emery <de19(_at_)umail(_dot_)umd(_dot_)edu>