perl-unicode

Re: Encode: CJK-Guide

2002-03-27 03:40:28
Jungshik Shin <jshin(_at_)mailaps(_dot_)org> writes:

  For Johab, no new table is necessary because Hangul precomposed
syllable mapping (to Unicode) is algorithmic while Hanjas and symbols can
be mapped to KS X 1001 algorithmically and then mapped to Unicode
using KS X 1001 mapping table.
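
The algorithmic Hangul part could be sketched roughly as follows. This is
only an illustration in Python, not Encode code: the function name
johab_hangul_to_unicode and the table names are made up for the example, and
it covers only complete precomposed syllables (no fill-code combinations, no
Hanja or symbols). It assumes the usual Johab layout of one marker bit
followed by three 5-bit component codes.

```python
# Sketch: Johab precomposed Hangul -> Unicode without a full mapping table.
# All names here are hypothetical; only the three small index tables are needed.

HANGUL_BASE = 0xAC00  # first precomposed syllable in Unicode

# 5-bit Johab component code -> Unicode jamo index.
INITIAL = {code: code - 2 for code in range(2, 21)}  # 19 initial consonants

MEDIAL_CODES = [3, 4, 5, 6, 7,
                10, 11, 12, 13, 14, 15,
                18, 19, 20, 21, 22, 23,
                26, 27, 28, 29]                      # 21 vowels
MEDIAL = {code: index for index, code in enumerate(MEDIAL_CODES)}

FINAL = {1: 0}                                       # code 1 = fill, no final
FINAL.update({code: code - 1 for code in range(2, 18)})   # finals 1..16
FINAL.update({code: code - 2 for code in range(19, 30)})  # finals 17..27


def johab_hangul_to_unicode(lead, trail):
    """Map one two-byte Johab Hangul code to its Unicode character."""
    code = (lead << 8) | trail
    if not code & 0x8000:
        raise ValueError("high bit not set: not a Johab Hangul code")
    try:
        initial = INITIAL[(code >> 10) & 0x1F]
        medial = MEDIAL[(code >> 5) & 0x1F]
        final = FINAL[code & 0x1F]
    except KeyError:
        raise ValueError("not a complete precomposed syllable") from None
    # Standard Unicode Hangul syllable arithmetic: 21 vowels, 28 final slots.
    return chr(HANGUL_BASE + (initial * 21 + medial) * 28 + final)
```

For instance, johab_hangul_to_unicode(0x88, 0x61) gives U+AC00. The reverse
direction only needs the three tables inverted.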

Before going further, I have a question or two. It appears that
euc-kr, ksc5601-raw (ksc5601-gl or whatever) and cp949 have their own
mapping tables although they're closely related. Is there any reason
for this?

The "compile" process will automatically share the compiled form of the
tables if they are closely related.

In the case of Johab, the easiest way to add support for it is to
just generate the mapping table for it, but I feel uncomfortable bloating
the code when it can be done algorithmically if I can make use of the
mapping table for euc-kr or ksc5601(-raw). It appears that euc-jp and
shift_jis don't share the mapping table either, although shift_jis and
euc-jp can be more or less algorithmically converted to/from each other.
I must be missing something here. There should be a way to do it and
I'd be glad if someone could tell me where to look for an example case
(e.g. shift_jis and euc-jp)
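
For reference, the shift_jis to euc-jp byte arithmetic looks roughly like
this. It is a sketch in Python of the widely used JIS X 0208 row/cell
conversion, not anything taken from Encode; the function names
sjis_pair_to_euc and sjis_to_euc are hypothetical. It assumes well-formed
input, does no validation, and ignores vendor extensions and JIS X 0212.

```python
# Sketch: Shift_JIS -> EUC-JP by arithmetic only, no mapping table.
# Hypothetical helper names; assumes valid JIS X 0208 / JIS X 0201 input.

def sjis_pair_to_euc(c1, c2):
    """Convert one two-byte Shift_JIS character to its two EUC-JP bytes."""
    # Each Shift_JIS lead byte covers two JIS rows; the trail byte tells
    # which one. Trail bytes below 0x9F belong to the odd (first) row.
    odd_row = 1 if c2 < 0x9F else 0
    row_offset = 0x70 if c1 < 0xA0 else 0xB0
    if odd_row:
        # Trail bytes skip 0x7F, hence the two different offsets.
        cell_offset = 0x20 if c2 > 0x7F else 0x1F
    else:
        cell_offset = 0x7E
    jis1 = ((c1 - row_offset) << 1) - odd_row
    jis2 = c2 - cell_offset
    # EUC-JP is the 7-bit JIS code with the high bit set on both bytes.
    return bytes([jis1 | 0x80, jis2 | 0x80])


def sjis_to_euc(data):
    """Convert a Shift_JIS byte string to EUC-JP."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(b)              # ASCII range passes through
            i += 1
        elif 0xA1 <= b <= 0xDF:
            out += bytes([0x8E, b])    # half-width katakana: SS2 prefix
            i += 1
        else:
            out += sjis_pair_to_euc(b, data[i + 1])
            i += 2
    return bytes(out)
```

For example, sjis_to_euc(b"\x82\xa0") returns b"\xa4\xa2". Since both
encodings are just different packings of the same JIS row/cell numbers, a
single table keyed on row and cell would be enough for both.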

There is some documentation on the API that an encoding must provide.
(I think Dan moved it out of Encode.pm.)

Most of the existing encodings use one multi-byte-to-multi-byte "engine"
with compiled tables. This works well for 8-bit encodings and can
handle the others, though not necessarily optimally.

It would be good to have some algorithmic encodings to use as
examples. The only ones we have at present are UCS-2 (as perl code)
and UTF-8 (C but buried in perl's core).

--
Nick Ing-Simmons
http://www.ni-s.u-net.com/


