Jarkko Hietaniemi wrote:
because it is there, UTF-7, though hardly ever used, is a UTF. jhi
has recently looked around to see if there are any more interesting
encodings that Encode should support but he told me "nothing
interesting". I think I
"Interesting" was probably a bad choice of words... what I did is that
(Disclaimer: since I don't know that much about non-European character
sets, I might do some disservice here to e.g. Arabic or Indic users.)
Yes, Encode may need to support x-ISCII-xx (where xx is DE, BE, TA, and
so forth). MS IE6 supports them all with their codepages (58xxx?).
Yudit also supports them; in Yudit, they are called IS-XX. All of these
are pretty straightforward to implement because the Unicode Indic
blocks are based on ISCII 1988. ISCII 1991(?) is a little different
from ISCII 1988, but most of the mappings are one-to-one. ISCII 1991(?)
is available somewhere on the net as a PDF.
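To illustrate why the parallel layout of the Unicode Indic blocks makes
this easy, here is a minimal sketch in Python (not Perl, and not
Encode's API). The function name and the two-entry table are
hypothetical and only there for illustration; the byte values in the
table are my assumption and should be checked against the ISCII
standard. The block base code points are the ones in the Unicode
charts. The sketch also ignores the places where ISCII and Unicode are
not one-to-one.

```python
# Base code points of a few Unicode Indic blocks (from the Unicode charts).
BLOCK_BASE = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "tamil": 0x0B80,
}

# Hypothetical, deliberately tiny table: ISCII byte -> offset inside an
# Indic block. These two entries are assumptions for illustration only;
# a real converter needs the full table from the ISCII standard.
ISCII_TO_OFFSET = {
    0xA4: 0x05,  # LETTER A  (e.g. U+0905 in Devanagari)
    0xA5: 0x06,  # LETTER AA (e.g. U+0906 in Devanagari)
}


def decode_iscii_sketch(data: bytes, script: str) -> str:
    """Decode bytes with one shared table, shifted to the chosen block."""
    base = BLOCK_BASE[script]
    out = []
    for b in data:
        if b < 0x80:
            # Lower half is treated as plain ASCII.
            out.append(chr(b))
        elif b in ISCII_TO_OFFSET:
            # Same offset in every block, only the base changes.
            out.append(chr(base + ISCII_TO_OFFSET[b]))
        else:
            # Not covered by this sketch: emit U+FFFD.
            out.append("\ufffd")
    return "".join(out)
```

The point of the design is that one table serves every script: only the
block base changes, which is what "based on ISCII 1988" buys us. The
few-to-one and one-to-few cases would need special handling on top of
this.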
It would also be nice to support TSCII (for Tamil). This encoding is,
well, not so nice to work with. Precisely because it is rather limited
and not so pretty, Encode should support it, so that the significant
amount of text accumulated in this encoding can be converted to UTF-8
or another Unicode encoding form as soon as possible.
glibc 2.3.x supports it, thanks to Bruno Haible. I based my
Unicode->TSCII converter for Mozilla
(http://bugzilla.mozilla.org/show_bug.cgi?id=204039) on his
implementation, with some modifications. See
http://www.tamil.net/tscii/faq5.html (there are a few mistakes in the
table, which I corrected in my PDF reproduction at
http://jshin.net/i18n/tscii.pdf). Bruno's implementation can be found
by searching Google for 'Bruno Haible iconv tscii glibc'.
I can't do it at the moment, but I might be able to later.
Jungshik