perl-unicode

[Encode] Johab support, et al.

2002-03-27 02:55:31
On Wednesday, March 27, 2002, at 01:40 , Jungshik Shin wrote:
  I've looked around ext/Encode and I found that CP949 is supported.
So, what has to be added is JOHAB and what needs to be modified
is EUC-KR to support 8-byte seq. representation of Hangul syllables
(see http://jshin.net/i18n/euckr2.html or
http://bugzilla.mozilla.org/show_bug.cgi?id=128587)

  Show me the direction or, better yet, send me a patch, please.

 Before going further, I have a question or two. It appears that
euc-kr, ksc5601-raw(ksc5601-gl or whatever) and cp949 have their own
mapping tables although they're closely related. Is there any reason
for this? In the case of Johab, the easiest way to add support for it is to
just generate the mapping table for it, but I feel uncomfortable bloating
the code when it can be done algorithmically if I can make use of the
mapping table for euc-kr or ksc5601(-raw). It appears that euc-jp and
shift_jis don't share the mapping table either, although shift_jis and
euc-jp can be more or less algorithmically converted to/from each other.
I must be missing something here. There should be a way to do it and
I'd be glad if someone could tell me where to look for an example case
(e.g. shift_jis and euc-jp)
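
For reference, the algorithmic part of Johab mentioned above can be sketched roughly as follows. This is a minimal Python sketch, not Encode code. It assumes the usual Johab layout (high bit set, then three 5-bit fields for initial, medial, and final jamo), and the names CHO, JUNG, JONG, and johab_hangul_to_unicode are made up for illustration. It covers only precomposed Hangul syllables; Hanja and symbols would still need a KS C 5601 table.

```python
# Hypothetical sketch: decode a 2-byte Johab Hangul syllable without a
# full mapping table. Each dict maps a 5-bit field value to a jamo index;
# values missing from a dict are fill or unused codes.

CHO = {code: code - 2 for code in range(2, 21)}  # 19 initial consonants

JUNG = {}  # 21 medial vowels, laid out in four runs of field values
for codes in (range(3, 8), range(10, 16), range(18, 24), range(26, 30)):
    for code in codes:
        JUNG[code] = len(JUNG)

JONG = {1: 0}  # field value 1 means "no final consonant"
for code in list(range(2, 18)) + list(range(19, 30)):  # 18 is unused
    JONG[code] = len(JONG)


def johab_hangul_to_unicode(b1: int, b2: int) -> str:
    """Map one Johab byte pair to a precomposed Unicode Hangul syllable."""
    word = (b1 << 8) | b2
    if not word & 0x8000:
        raise ValueError("not a two-byte Johab sequence")
    cho = CHO.get((word >> 10) & 0x1F)
    jung = JUNG.get((word >> 5) & 0x1F)
    jong = JONG.get(word & 0x1F)
    if None in (cho, jung, jong):
        raise ValueError("not a complete Hangul syllable")
    # Standard Unicode Hangul syllable composition formula.
    return chr(0xAC00 + (cho * 21 + jung) * 28 + jong)
```

The three small dicts replace a table of 11,172 entries, which is the size saving the question is about.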

  The current (and rough) convention is:

* Let the table lookup (available via compile script by NI-S, invoked by
  Makefile.PL in subdirectories) handle as much as possible.
* Save space for shared code points via compile. For instance, Shift_JIS,
  cp932, and MacJapanese are crunched to sjis_t. I have saved 1MB on
  FreeBSD that way.
* Escape-based encodings are handled by perl. Those include ISO-2022-*
  (my work, with codes stolen from Jcode; I stole from my own pocket this
  time) and Hz (by Autrijus and perhaps Sadahiro-san. Sorry, I didn't
  review the code thoroughly).

It is algorithmically possible to convert between EUC and Shift JIS (that's how Jcode handles these, with EUC as the "blessed" encoding), but table lookup is faster, even faster than Jcode, because it is now in XS. Anyway, I am so darn glad to find you here. Kamsahamnida! (Is this one correct?)
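
To make the algorithmic EUC <-> Shift JIS conversion mentioned above concrete, here is a rough Python sketch. It is not the Jcode or Encode implementation; the function names are hypothetical, and it assumes well-formed input containing only ASCII, half-width katakana, and JIS X 0208 characters (JIS X 0212, which Shift JIS cannot represent, is rejected).

```python
# Hypothetical sketch: byte-level EUC-JP <-> Shift_JIS conversion using
# only arithmetic on the JIS X 0208 row/cell bytes (0x21..0x7E each).

def jis_to_sjis(j1: int, j2: int) -> tuple:
    """Fold two JIS rows into one Shift_JIS lead byte."""
    if j1 % 2:                      # odd row: first half of trail range
        s2 = j2 + 0x1F
        if s2 >= 0x7F:              # skip 0x7F, which is never a trail byte
            s2 += 1
    else:                           # even row: second half of trail range
        s2 = j2 + 0x7E
    s1 = (j1 + 1) // 2 + 0x70
    if s1 >= 0xA0:                  # jump over the half-width katakana block
        s1 += 0x40
    return s1, s2


def sjis_to_jis(s1: int, s2: int) -> tuple:
    """Inverse of jis_to_sjis."""
    if s1 >= 0xE0:
        s1 -= 0x40
    s1 -= 0x70
    if s2 >= 0x9F:                  # even JIS row
        return s1 * 2, s2 - 0x7E
    if s2 >= 0x80:                  # undo the 0x7F skip
        s2 -= 1
    return s1 * 2 - 1, s2 - 0x1F    # odd JIS row


def euc_to_sjis(data: bytes) -> bytes:
    out, i = bytearray(), 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                # ASCII passes through
            out.append(b)
            i += 1
        elif b == 0x8E:             # SS2: half-width katakana, 1 byte in SJIS
            out.append(data[i + 1])
            i += 2
        elif b == 0x8F:             # SS3: JIS X 0212 has no SJIS equivalent
            raise ValueError("JIS X 0212 cannot be expressed in Shift_JIS")
        else:                       # JIS X 0208: strip the high bits
            out.extend(jis_to_sjis(b - 0x80, data[i + 1] - 0x80))
            i += 2
    return bytes(out)


def sjis_to_euc(data: bytes) -> bytes:
    out, i = bytearray(), 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(b)
            i += 1
        elif 0xA1 <= b <= 0xDF:     # half-width katakana gets an SS2 prefix
            out.extend((0x8E, b))
            i += 1
        else:
            j1, j2 = sjis_to_jis(b, data[i + 1])
            out.extend((j1 + 0x80, j2 + 0x80))
            i += 2
    return bytes(out)
```

Note that this only shuffles bytes between the two encodings; getting to or from Unicode still needs a JIS X 0208 table either way, which is one reason a shared compiled table is attractive.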

Dan the Encode Maintainer
