So, we should have:
charset=iso-10646-g-*
charset=iso-10646-t-*
charset=iso-10646-j-*
charset=iso-10646-k-*
where "*" is replaced with Devanagari variations. As DIS 10646-1.2 cites
ISCII as its source of Devanagari characters, the Devanagari distinction
should be made according to an Indian standard. Is there any Indian
standard which lists the names of Indian languages in the Latin alphabet?
According to "Writing Systems of the World", by Akira Nakanishi,
India's constitution recognizes 15 official languages:
      Language     Script
  1.  Assamese     Bengali
  2.  Bengali      Bengali
  3.  Gujarati     Gujarati
  4.  Kannada      Kannada or Kanarese
  5.  Kashmiri     Urdu or Arabic
  6.  Malayalam    Malayalam
  7.  Marathi      Devanagari
  8.  Oriya        Oriya
  9.  Punjabi      Gurmukhi
 10.  Sanskrit     Devanagari
 11.  Tamil        Tamil
 12.  Telugu       Telugu
 13.  Urdu         Urdu or Arabic
 14.  Hindi        Devanagari
 15.  English      Latin
The Unicode standard lists many other languages that are written using
the Devanagari script: Nepali, Awadhi, Bagheli, Bhatneri, Bhili,
Bihari, Braj Bhasha, Chhattisgarhi, Garhwali, Gondi (Betul,
Chhindwara, Mandla dialects), Harauti, Ho, Jaipuri, Kachchhi, Kanauji,
Konkani, Kului, Kumaoni, Kurku, Kurukh, Marwari, Mundari, Newari,
Palpa, and Santali.
But perhaps you want to ignore all these other languages since you're
also ignoring the difference between Mandarin and Cantonese (since
they use the same glyphs)?
So if you want to take India's 15 official languages and sort them by
script, you get:
      Language     Script
      Assamese     Bengali
      Bengali      Bengali
      Hindi        Devanagari
      Marathi      Devanagari
      Sanskrit     Devanagari
      Gujarati     Gujarati
      Punjabi      Gurmukhi
      Kannada      Kannada or Kanarese
      English      Latin
      Malayalam    Malayalam
      Oriya        Oriya
      Tamil        Tamil
      Telugu       Telugu
      Kashmiri     Urdu or Arabic
      Urdu         Urdu or Arabic
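
For anyone who wants to reproduce the sorted list, here is a minimal
Python sketch. The language/script pairs are copied from the list
above; the names OFFICIAL, by_script, languages_per_script and shared
are only illustrative, not part of any standard.

```python
from collections import defaultdict

# (language, script) pairs, as given in the list of 15 official languages.
OFFICIAL = [
    ("Assamese", "Bengali"),
    ("Bengali", "Bengali"),
    ("Gujarati", "Gujarati"),
    ("Kannada", "Kannada or Kanarese"),
    ("Kashmiri", "Urdu or Arabic"),
    ("Malayalam", "Malayalam"),
    ("Marathi", "Devanagari"),
    ("Oriya", "Oriya"),
    ("Punjabi", "Gurmukhi"),
    ("Sanskrit", "Devanagari"),
    ("Tamil", "Tamil"),
    ("Telugu", "Telugu"),
    ("Urdu", "Urdu or Arabic"),
    ("Hindi", "Devanagari"),
    ("English", "Latin"),
]

# Sort by script name first, then by language name within each script.
by_script = sorted(OFFICIAL, key=lambda pair: (pair[1], pair[0]))

# Group the languages under the script they are written in.
languages_per_script = defaultdict(list)
for language, script in by_script:
    languages_per_script[script].append(language)

# Keep only the scripts that serve more than one language.
shared = {
    script: languages
    for script, languages in languages_per_script.items()
    if len(languages) > 1
}

for language, script in by_script:
    print(f"{language:<12} {script}")
print("Scripts shared by several languages:", ", ".join(shared))
```

The scripts collected in "shared" are the ones that would each need
their own variant slot in a charset name under this approach.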
As you can see, Devanagari is not the only script that is used to
write multiple languages. (See Bengali and "Urdu or Arabic".) Are
the languages that use these scripts also written using different
glyphs? If so, your approach would seem to generalize to:
charset=iso-10646-<han>-<devanagari>-<bengali>-<urdu/arabic>
Expanding these, we would get:
charset=iso-10646-g-hindi-assamese-kashmiri
charset=iso-10646-g-hindi-assamese-urdu
charset=iso-10646-g-hindi-bengali-kashmiri
charset=iso-10646-g-hindi-bengali-urdu
charset=iso-10646-g-marathi-assamese-kashmiri
charset=iso-10646-g-marathi-assamese-urdu
charset=iso-10646-g-marathi-bengali-kashmiri
charset=iso-10646-g-marathi-bengali-urdu
charset=iso-10646-g-sanskrit-assamese-kashmiri
charset=iso-10646-g-sanskrit-assamese-urdu
charset=iso-10646-g-sanskrit-bengali-kashmiri
charset=iso-10646-g-sanskrit-bengali-urdu
charset=iso-10646-t-hindi-assamese-kashmiri
charset=iso-10646-t-hindi-assamese-urdu
charset=iso-10646-t-hindi-bengali-kashmiri
charset=iso-10646-t-hindi-bengali-urdu
charset=iso-10646-t-marathi-assamese-kashmiri
charset=iso-10646-t-marathi-assamese-urdu
charset=iso-10646-t-marathi-bengali-kashmiri
charset=iso-10646-t-marathi-bengali-urdu
charset=iso-10646-t-sanskrit-assamese-kashmiri
charset=iso-10646-t-sanskrit-assamese-urdu
charset=iso-10646-t-sanskrit-bengali-kashmiri
charset=iso-10646-t-sanskrit-bengali-urdu
charset=iso-10646-j-hindi-assamese-kashmiri
charset=iso-10646-j-hindi-assamese-urdu
charset=iso-10646-j-hindi-bengali-kashmiri
charset=iso-10646-j-hindi-bengali-urdu
charset=iso-10646-j-marathi-assamese-kashmiri
charset=iso-10646-j-marathi-assamese-urdu
charset=iso-10646-j-marathi-bengali-kashmiri
charset=iso-10646-j-marathi-bengali-urdu
charset=iso-10646-j-sanskrit-assamese-kashmiri
charset=iso-10646-j-sanskrit-assamese-urdu
charset=iso-10646-j-sanskrit-bengali-kashmiri
charset=iso-10646-j-sanskrit-bengali-urdu
charset=iso-10646-k-hindi-assamese-kashmiri
charset=iso-10646-k-hindi-assamese-urdu
charset=iso-10646-k-hindi-bengali-kashmiri
charset=iso-10646-k-hindi-bengali-urdu
charset=iso-10646-k-marathi-assamese-kashmiri
charset=iso-10646-k-marathi-assamese-urdu
charset=iso-10646-k-marathi-bengali-kashmiri
charset=iso-10646-k-marathi-bengali-urdu
charset=iso-10646-k-sanskrit-assamese-kashmiri
charset=iso-10646-k-sanskrit-assamese-urdu
charset=iso-10646-k-sanskrit-bengali-kashmiri
charset=iso-10646-k-sanskrit-bengali-urdu
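
The list above is simply the Cartesian product of the variant sets. A
rough Python sketch of the expansion follows; the variable names are
mine, and the variant sets are just the ones assumed in this message,
not anything defined by DIS 10646 or MIME.

```python
from itertools import product

# Variant sets assumed in this message: one slot per script whose
# languages might want different glyphs.
HAN = ["g", "t", "j", "k"]
DEVANAGARI = ["hindi", "marathi", "sanskrit"]
BENGALI = ["assamese", "bengali"]
URDU_ARABIC = ["kashmiri", "urdu"]

# One charset name per combination; the last slot varies fastest.
names = [
    "charset=iso-10646-" + "-".join(combo)
    for combo in product(HAN, DEVANAGARI, BENGALI, URDU_ARABIC)
]

for name in names:
    print(name)
print(len(names), "charset names")  # 4 * 3 * 2 * 2 = 48
```

Every further script with n distinguishable variants multiplies the
number of charset names by n.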
Note that this does not even take into account any of the other
scripts and languages used around the world. Or are you saying that
the others don't have important glyph differences? If so, how would
you know that they are not important? Have you asked the people in
those countries for their opinion?
Or perhaps you're saying that people wouldn't normally mix so many
different languages in one MIME body part?
Or perhaps you're saying that we should at least solve the problem for
g, t, j and k (and maybe Devanagari) and then worry about the other
glyphs later on when there is demand for such distinctions? (You once
said that you don't want to "overgeneralize".)
Could you elaborate on what you're envisioning? Please also tell us
what happens when people want to include more stuff in the future.
Erik