Re: Encode.pm question

Jarkko Hietaniemi wrote:

because it is there, UTF-7, though hardly ever used, is a UTF. jhi has recently looked around if there is any more interesting encodings that Encode should support but he told me "nothing interesting". I think I
"Interesting" was probably a bad choice of words... what I did is that

(Disclaimer: since I don't know that much about non-European character
sets, I might do some disservice here to e.g. Arabic or Indic users.)

Yes, Encode may need to support x-ISCII-xx (where xx is DE, BE, TA, and so forth). MS IE6 supports them all with their codepages (58xxx?). Yudit also supports them; in Yudit, it's called IS-XX. All of these are pretty straightforward to implement because the Unicode Indic blocks are based on ISCII 1988. ISCII 1991(?) is a little different from ISCII 1988, but most of the mappings are one to one. ISCII 1991(?) is available somewhere on the net in PDF.

It'd also be nice to support TSCII (for Tamil). This encoding is, well, not so nice to work with. Precisely because this encoding is rather limited and not so pretty, Encode may want to support it so that the significant amount of text accumulated in this encoding can be converted to UTF-8 or other Unicode encoding forms as soon as possible.
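To illustrate the remark that the Unicode Indic blocks share a common ISCII-derived layout, here is a rough Python sketch (not Encode code, and not an ISCII converter). It only shows that the same letter sits at the same offset in several 128-code-point script blocks. The helper name `same_slot` and the `BLOCK_BASE` table are made up for this example; not every slot is assigned in every script, so a real converter would still need a per-script table.

```python
import unicodedata

# Start of each script's block in Unicode; each block is 128 code points wide.
# Only three scripts are listed here, purely for illustration.
BLOCK_BASE = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "tamil": 0x0B80,
}

def same_slot(ch, src, dst):
    """Return the code point at the same block offset in another script.

    Hypothetical helper: it does no validation beyond a range check, and
    the result may be an unassigned code point for some offsets.
    """
    offset = ord(ch) - BLOCK_BASE[src]
    if not 0 <= offset < 0x80:
        raise ValueError("character is outside the source script block")
    return chr(BLOCK_BASE[dst] + offset)

ka = "\u0915"  # DEVANAGARI LETTER KA
for script in ("bengali", "tamil"):
    out = same_slot(ka, "devanagari", script)
    print(script, f"U+{ord(out):04X}", unicodedata.name(out))
```

For the letter KA this lands on BENGALI LETTER KA and TAMIL LETTER KA respectively, which is the kind of regularity that makes table-driven ISCII support comparatively easy.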

glibc 2.3.x supports it, thanks to Bruno Haible. I based my Unicode->TSCII converter for Mozilla (http://bugzilla.mozilla.org/show_bug.cgi?id=204039) on his implementation, with some modification. See http://www.tamil.net/tscii/faq5.html (there are a few mistakes in the table, which I corrected in my PDF reproduction at http://jshin.net/i18n/tscii.pdf). Bruno's implementation can be found on Google by searching for 'Bruno Haible iconv tscii glibc'.


I can't do it at the moment, but later I might be able to do it.

Jungshik