Danny Iacovou writes:
Content-language: American
Content-language-dialect: Northern
Content-language-Geographical-Region: East Coast/PA/Pittsburgh
Given all this, isn't this enough:
Content-language: American/Northern/East Coast/PA/Pittsburgh
Dana, I do agree with you that all of this would be
really cool to include in MIME; however, I think
we need to know what an appropriate breakdown would
be, and to define the headers accordingly. The more I
think about this, the more cautious I become.
I agree with your caution, and given the immaturity of
audible rendering, I think any spec we formulate should do no
more than suggest form, as your examples would do. BTW, my two
roommates were actively researching speech recognition techniques
ten years ago, so I am not completely ignorant of the difficulties
of getting computers to deal with natural language.
Actual content (i.e., PA vs. Pennsylvania) should be deferred
until there is a more concrete need.
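To show that the single-header form in your example can carry the whole hierarchy without separate dialect and region headers, here is a minimal sketch of how a receiving agent might split it. It assumes, hypothetically, that "/" is the separator and that components run from most general (language) to most specific (locality); neither point is settled in any draft.

```python
from typing import List


def parse_content_language(value: str) -> List[str]:
    """Split a hierarchical Content-language value into components.

    Hypothetical convention: components are separated by "/" and ordered
    from most general (language) to most specific (locality).
    """
    # Trim whitespace around each component and drop any empty ones.
    return [part.strip() for part in value.split("/") if part.strip()]


header_value = "American/Northern/East Coast/PA/Pittsburgh"
print(parse_content_language(header_value))
```

Note that the parser does not care what the components mean, which fits deferring the actual content (PA vs. Pennsylvania) until there is a concrete need.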
After feeling overwhelmed by the several thousand volumes and
numerous journals in the linguistics section of our McKeldin
Library, I suspect we will have a difficult time pinning down
a definitive list of the world's spoken languages and writing
forms, but I have not yet exhausted the resources available
to me here, so more on that later. What is clear to me at
this point is that any attempt to broaden that enumeration
to include all the world's dialects is probably a hopeless
cause, best left to far braver souls than we (perhaps that would
be a good topic for a PhD in library science or the history of
linguistics or some such; it would require the energy of a
youthful, detail-conscious person with deep pockets to
purchase all the government and ISO publications involved).
Luckily, we can approach this enumeration another way: rather
than attempt to document exhaustively all that might be
_desired_ of us, we could narrow the scope to registration
of extant implemented systems, and those _are_ both well defined
and enumerable, if idiosyncratic and platform-specific.
Apple and other manufacturers have implemented support for specific
writing systems and selected speech synthesizers, so there are
some tags available (admittedly platform-specific). A survey
of these would seem pertinent, and may be possible within our
immediate company.
I would love to be able to send mail that when played
back makes my words sound like a New Yorker.
Inability to render in a specific dialect is not normally
distorting. Admittedly, actors, screenplay authors et al.
will want to be able to throw around dialect-tagged speech,
but it isn't normally important, so a system which defers its
implementation will have considerable virtue in spite of
being "imperfect". Later refinements are clearly possible
via obvious mechanisms.
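One such mechanism might be graceful fallback: a renderer that cannot produce the requested dialect drops the most specific component of the tag until it reaches one it supports. The sketch below is purely illustrative. It assumes the slash-separated form from the example header and a hypothetical set of tags one synthesizer supports, since no registry of such tags exists yet.

```python
from typing import Optional, Sequence, Set


def best_renderable(tag: Sequence[str], supported: Set[str]) -> Optional[str]:
    """Return the most specific prefix of `tag` found in `supported`.

    `tag` lists components from most general to most specific.
    `supported` holds slash-joined tags a (hypothetical) synthesizer
    can render. Returns None if even the base language is unsupported.
    """
    # Try the full tag first, then drop one trailing component at a time.
    for length in range(len(tag), 0, -1):
        candidate = "/".join(tag[:length])
        if candidate in supported:
            return candidate
    return None


# Hypothetical synthesizer: knows the base language and one dialect only.
synth = {"American", "American/Northern"}
print(best_renderable(["American", "Northern", "East Coast", "PA", "Pittsburgh"], synth))
```

With this design, adding finer dialect support later only means growing the supported set. The tags already in mail would not have to change.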
BTW: you do realize that as soon as a person writes a
program that, given an input file and a desired
language, can spit out a file written in the
universal linguistics language
[no such beast, sigh, even IPA has dialects]
and vice versa, the whole notion of Content-language*
becomes obsolete.
I disagree, even if IPA lived up to its name.
You postulate a politically unlikely universal adoption of that
new language, as well as 100% conversion of all the world's written
documents (electronic documents, clay tablets, architectural
monuments, billboards, books, magazines, machine control-panel
labeling...), which I suspect will never happen, if only
because the world's historians will want to remain employed.
Five years ago I would have predicted (and bemoaned) a worldwide
gradual adoption of English as a second language, not because
I want that, but simply because English is the de facto language
of computing and the world is moving to embrace computer usage.
Now that Apple and others (hopefully us as well) are divorcing
computers from a dependency on English, that trend will be ameliorated.
--
dana s emery <de19(_at_)umail(_dot_)umd(_dot_)edu>