perl-unicode

Re: Encode.pm question

2003-05-16 10:30:17
Jarkko Hietaniemi wrote:

Because it is there: UTF-7, though hardly ever used, is a UTF. jhi recently looked around to see whether there are any more interesting encodings that Encode should support, but he told me "nothing interesting". I think I


"Interesting" was probably a bad choice of words... what I did is that

(Disclaimer: since I don't know that much about non-European character
sets, I might do some disservice here to e.g. Arabic or Indic users.)
Yes, Encode may need to support x-ISCII-xx (where xx is DE, BE, TA, and so forth). MS IE6 supports them all through its codepages (58xxx?). Yudit also supports them, under the name IS-XX. All of these are pretty straightforward to implement because the Unicode Indic blocks are based on ISCII 1988. ISCII 1991(?) is a little different from ISCII 1988, but most of the mappings are one to one. ISCII 1991(?) is available somewhere on the net in PDF.

It'd also be nice to support TSCII (for Tamil). This encoding is, well, not so nice to work with. Precisely because it is rather limited and not so pretty, Encode may want to support it, so that the significant amount of text accumulated in this encoding can be converted to UTF-8 or another Unicode encoding form as soon as possible.
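
To make the "mostly one to one" ISCII case concrete, here is a rough sketch of a table-driven decoder for a single-byte legacy encoding. It is written in Python only to keep it short; it is not how Encode does things. The names TOY_TABLE, decode_legacy and to_utf8 are made up for this example, and the three table entries are my recollection of the ISCII Devanagari layout, so check them against the standard before relying on them.

```python
# Toy subset of a legacy-byte -> Unicode table.
# ASSUMPTION: these three values follow the ISCII Devanagari layout as I
# recall it; a real converter needs the full, verified table.
TOY_TABLE = {
    0xA1: "\u0901",  # DEVANAGARI SIGN CANDRABINDU
    0xA2: "\u0902",  # DEVANAGARI SIGN ANUSVARA
    0xA3: "\u0903",  # DEVANAGARI SIGN VISARGA
}


def decode_legacy(data: bytes, table=TOY_TABLE) -> str:
    """Decode bytes one at a time: ASCII passes through unchanged,
    high bytes are looked up in the table."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))
        elif b in table:
            out.append(table[b])
        else:
            # Unmapped byte: emit U+FFFD REPLACEMENT CHARACTER.
            out.append("\ufffd")
    return "".join(out)


def to_utf8(data: bytes) -> bytes:
    """Convert legacy bytes to UTF-8 via the table above."""
    return decode_legacy(data).encode("utf-8")
```

A pure lookup like this only covers the one-to-one part. The places where ISCII 1991(?) differs from ISCII 1988 would need extra handling on top of it, and so would TSCII.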

glibc 2.3.x supports TSCII, thanks to Bruno Haible. I based my Unicode->TSCII converter for Mozilla (http://bugzilla.mozilla.org/show_bug.cgi?id=204039) on his implementation, with some modifications. See http://www.tamil.net/tscii/faq5.html (there are a few mistakes in the table, which I corrected in my PDF reproduction at http://jshin.net/i18n/tscii.pdf). Bruno's implementation can be found by searching Google for 'Bruno Haible iconv tscii glibc'.

I can't do it at the moment, but I might be able to later.

Jungshik