perl-unicode

Re: Encode: CJK-Guide

2002-03-26 21:42:02
On Wed, 27 Mar 2002, Jarkko Hietaniemi wrote:

Mozilla and so forth. I'm not blaming anyone here for the lack of
support for Johab and CP949 (that's the last thing I'd do). Anyway, 
I'll try to help you with Korean encodings and other CJK encodings if 
necessary. 

Excellent, thanks.  You may download the latest Perl developer snapshot
(which contains the latest Encode, 0.99) from:

      http://www.iki.fi/jhi/perl@15489.tgz

and look at the documentation under perl/ext/Encode/

  I've looked around ext/Encode and found that CP949 is supported.
So what has to be added is JOHAB, and what needs to be modified
is EUC-KR, to support the 8-byte sequence representation of Hangul
syllables (see http://jshin.net/i18n/euckr2.html or
http://bugzilla.mozilla.org/show_bug.cgi?id=128587).
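
To make the 8-byte representation concrete, here is a rough sketch in
Python (not Perl, and not code from Encode). It assumes the sequence is
the Hangul filler 0xA4D4 followed by three two-byte compatibility jamo
from row 4 of KS X 1001 (initial, medial, final, with the filler again
standing in for a missing final), and that row 4 maps linearly onto
U+3131..U+3164. The name decode_8byte_hangul is made up for
illustration.

```python
# Sketch only: decode one EUC-KR 8-byte Hangul sequence to a Unicode syllable.
# decode_8byte_hangul is a hypothetical helper, not part of Encode.

CONSONANTS = range(0x3131, 0x314F)  # compatibility jamo consonants
# Consonant clusters can only be finals; the tense consonants
# U+3138, U+3143 and U+3149 can only be initials.
NOT_INITIAL = {0x3133, 0x3135, 0x3136, *range(0x313A, 0x3141), 0x3144}
NOT_FINAL = {0x3138, 0x3143, 0x3149}
INITIALS = [c for c in CONSONANTS if c not in NOT_INITIAL]  # 19 entries
FINALS = [c for c in CONSONANTS if c not in NOT_FINAL]      # 27 entries
FILLER = 0x3164


def row4_codepoint(pair):
    """Map one two-byte code from KS X 1001 row 4 to its code point."""
    lead, trail = pair
    if lead != 0xA4 or not 0xA1 <= trail <= 0xD4:
        raise ValueError("not a row-4 jamo or filler")
    return 0x3131 + (trail - 0xA1)


def decode_8byte_hangul(data):
    if len(data) != 8 or data[:2] != b"\xa4\xd4":
        raise ValueError("expected filler followed by three jamo")
    ini, med, fin = (row4_codepoint(data[i:i + 2]) for i in (2, 4, 6))
    if ini not in INITIALS or not 0x314F <= med <= 0x3163:
        raise ValueError("invalid initial or medial jamo")
    if fin != FILLER and fin not in FINALS:
        raise ValueError("invalid final jamo")
    l = INITIALS.index(ini)
    v = med - 0x314F
    t = 0 if fin == FILLER else FINALS.index(fin) + 1
    # Standard Unicode Hangul syllable composition.
    return chr(0xAC00 + (l * 21 + v) * 28 + t)
```

With this, a syllable outside the 2350 precomposed ones in KS X 1001,
such as U+B620, can be recovered from the bytes
A4 D4 A4 A8 A4 C7 A4 B1 without any extra table.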

  For Johab, no new table is necessary: the mapping of precomposed
Hangul syllables to Unicode is algorithmic, while Hanja and symbols can
be mapped to KS X 1001 algorithmically and then mapped to Unicode
using the KS X 1001 mapping table.
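
As a sketch of the Hangul half of that claim (in Python rather than
Perl, and not taken from Encode): a Johab Hangul code is assumed here to
be a 16-bit value with the top bit set, followed by three 5-bit fields
for initial, medial and final. The 5-bit code assignments below are my
reading of the Johab layout, and johab_hangul_to_unicode is a
hypothetical name. Only complete syllables are handled; codes that use
the fill value for the initial or medial are rejected, and the Hanja
and symbol areas are not covered.

```python
# Sketch only: map a Johab code for a complete Hangul syllable to Unicode.
# johab_hangul_to_unicode is a hypothetical helper, not part of Encode.

# 5-bit medial codes in Unicode vowel order (gaps are unused codes).
MEDIAL_CODES = [3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15,
                18, 19, 20, 21, 22, 23, 26, 27, 28, 29]
MEDIAL = {code: index for index, code in enumerate(MEDIAL_CODES)}

# 5-bit final codes: 1 means "no final", 18 is unused.
FINAL_CODES = list(range(1, 18)) + list(range(19, 30))
FINAL = {code: index for index, code in enumerate(FINAL_CODES)}


def johab_hangul_to_unicode(code):
    if not 0x8000 <= code <= 0xFFFF:
        raise ValueError("not a two-byte Johab Hangul code")
    ini = (code >> 10) & 0x1F
    med = (code >> 5) & 0x1F
    fin = code & 0x1F
    # Initial codes 2..20 correspond to the 19 initial consonants.
    if not 2 <= ini <= 20 or med not in MEDIAL or fin not in FINAL:
        raise ValueError("not a complete Hangul syllable")
    l, v, t = ini - 2, MEDIAL[med], FINAL[fin]
    # Standard Unicode Hangul syllable composition.
    return chr(0xAC00 + (l * 21 + v) * 28 + t)
```

For example, johab_hangul_to_unicode(0x8861) gives U+AC00, the first
precomposed syllable.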

 Before going further, I have a question or two. It appears that
euc-kr, ksc5601-raw (ksc5601-gl or whatever) and cp949 each have their
own mapping table although they're closely related. Is there any reason
for this? In the case of Johab, the easiest way to add support is to
just generate a mapping table for it, but I feel uncomfortable bloating
the code when it can be done algorithmically, provided I can make use of
the mapping table for euc-kr or ksc5601(-raw). It appears that euc-jp and
shift_jis don't share a mapping table either, although shift_jis and
euc-jp can be more or less algorithmically converted to/from each other.
I must be missing something here. There should be a way to do it, and
I'd be glad if someone could tell me where to look for an example case
(e.g. shift_jis and euc-jp).
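
To illustrate what "algorithmically converted" means for the Japanese
pair, here is a sketch in Python (again not Perl, and not how Encode
does it). It uses the usual byte arithmetic between EUC-JP and
Shift_JIS for two-byte JIS X 0208 characters only. Half-width katakana
and JIS X 0212 are left out, which is part of the "more or less". The
function name eucjp_pair_to_sjis is invented for this example.

```python
# Sketch only: convert one two-byte JIS X 0208 character from EUC-JP
# to Shift_JIS by arithmetic alone. eucjp_pair_to_sjis is hypothetical.

def eucjp_pair_to_sjis(e1, e2):
    if not (0xA1 <= e1 <= 0xFE and 0xA1 <= e2 <= 0xFE):
        raise ValueError("not a two-byte JIS X 0208 EUC-JP sequence")
    # EUC-JP is the 7-bit JIS code with the high bit set on both bytes.
    j1, j2 = e1 - 0x80, e2 - 0x80
    # Two JIS rows share one Shift_JIS lead byte.
    s1 = ((j1 - 0x21) >> 1) + 0x81
    if s1 > 0x9F:
        s1 += 0x40  # skip 0xA0..0xDF, used for single-byte katakana
    if j1 & 1:
        # Odd rows use the lower half of the trail-byte range.
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1  # 0x7F is never a trail byte
    else:
        # Even rows use the upper half.
        s2 = j2 + 0x7E
    return bytes([s1, s2])
```

For instance, the EUC-JP bytes A4 A2 become the Shift_JIS bytes 82 A0.
Since no character data is involved, a single JIS X 0208 table could in
principle serve both encodings.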


  BTW, how about Big5-HKSCS (Hong Kong), GBK, and GB18030 (PRC)?

I *think* (but me speekee no Chineese) we do support those in Encode,
but for space considerations one has to install an additional module,
Encode::HanExtra.

  I found that Big5-HKSCS is included in 'plain Encode' and GBK, GB18030,
EUC-TW, and Big5plus are in HanExtra.

   Jungshik Shin
