On Saturday, March 30, 2002, at 04:44, Jarkko Hietaniemi wrote:
Gentlemen, you may want to read Unicode 3.2
( http://www.unicode.org/unicode/reports/tr28/ ) It does say something
about Han, Katakana, and Hangul (sections 10.1, 10.3, and 10.4). (No,
I don't know what happened to 10.2). What I'm after is whether the
said CJK changes affect Encode?
For Japanese, I pretty much doubt it, at least for the time being. JIS
X 0213:2000, as you see, is only two years old and encodings that
support it are not popular -- yet.
The support will take the form of ADDITION, not MODIFICATION, at least
as far as JIS X 0213 is concerned.
But let me post a summary of (proposed) encodings for JIS X 0213 for
the record.
(See also http://www.asahi-net.or.jp/~wq6k-yn/code/enc-x0213.html if
your browser supports Japanese)
JIS X 0213
==========
Is roughly tidy(JIS X 0208 + JIS X 0212). It consists of two 94x94
planes. Planes 1 and 2 correspond to 0208 and 0212, respectively. But
some of the code points are rearranged, so 0213-1 != 0208 and
0213-2 != 0212.
EUC-JISX0213
============
Encoding scheme is the same as EUC-JP. Here is the diagram
G0   US-ASCII
G1   JISX0213-1
(G2  JISX0201-kana (deprecated))
G3   JISX0213-2
Technical difficulty is minimal. All I need is a table. I may make a
UCM out of the Unihan DB and post it as something like
Encode::JPExtra.
When in use, this encoding supersedes EUC-JP because you can't tell the
difference by looking at a given string. You must explicitly set your
encoding to this or "classical" EUC-JP.
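
To illustrate why the two cannot be told apart from the bytes alone,
here is a minimal Python sketch of the EUC framing shown in the diagram
above. It only splits a byte string into G0..G3 units and does no table
lookup, so it behaves identically for EUC-JP and EUC-JISX0213. The
names split_euc and kuten are made up for this example, and the byte
ranges are the usual EUC-JP ones (0x8E for G2, 0x8F for G3, 0xA1-0xFE
otherwise), which I am assuming carry over unchanged.

```python
def split_euc(data: bytes):
    """Split EUC-JP-style bytes into (graphic set, payload) units.

    Hypothetical helper: it checks framing only, not whether a code
    point is actually assigned in any character set.
    """
    def is_gr(b):
        return 0xA1 <= b <= 0xFE

    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:            # G0: one byte, US-ASCII
            name, size, skip = "G0", 1, 0
        elif b == 0x8E:         # G2: SS2 + one byte (JIS X 0201 kana)
            name, size, skip = "G2", 2, 1
        elif b == 0x8F:         # G3: SS3 + two bytes (second plane)
            name, size, skip = "G3", 3, 1
        elif is_gr(b):          # G1: two bytes (first plane)
            name, size, skip = "G1", 2, 0
        else:
            raise ValueError("unexpected byte 0x%02X at offset %d" % (b, i))
        chunk = data[i:i + size]
        payload = chunk[skip:]
        if len(chunk) < size:
            raise ValueError("truncated sequence at offset %d" % i)
        if name != "G0" and not all(is_gr(x) for x in payload):
            raise ValueError("bad trail byte at offset %d" % i)
        units.append((name, payload))
        i += size
    return units


def kuten(payload: bytes):
    """Turn a G1/G3 payload into a (row, cell) pair, each 1..94."""
    return tuple(b - 0xA0 for b in payload)
```

The G3 unit comes out the same whether the sender meant JIS X 0212 or
JISX0213-2; only the table chosen afterwards differs, which is why the
encoding has to be named explicitly.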
ISO-2022-JP-3
=============
Basically, this one is ISO-2022-JP with new escape sequences.
Esc. Seq. Charset
------------------------
ESC $(O JISX0213-1
ESC $(P JISX0213-2
This one is easy, too.
Unlike EUC-JISX0213, this one EXTENDS ISO-2022-JP, so the old 0208/0212
and the new 0213 can coexist, thanks to escape sequences.
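
A rough Python sketch of how the escape sequences keep the character
sets apart: it cuts a byte string into (charset, bytes) runs and does
not decode anything. The two 0213 sequences are the ones in the table
above; the others are the usual ISO-2022-JP / ISO-2022-JP-1
designations, added here as my own assumption about what a real decoder
would also accept. The names DESIGNATIONS and segments are made up for
the example.

```python
ESC = 0x1B

# Escape sequence -> charset designated from that point on.
DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JISX0201-Roman",
    b"\x1b$@": "JISX0208-1978",
    b"\x1b$B": "JISX0208-1983",
    b"\x1b$(D": "JISX0212",
    b"\x1b$(O": "JISX0213-1",   # new in ISO-2022-JP-3
    b"\x1b$(P": "JISX0213-2",   # new in ISO-2022-JP-3
}


def segments(data: bytes):
    """Split ISO-2022-JP-style bytes into (charset, raw bytes) runs."""
    current = "ASCII"           # initial state at the start of text
    out = []
    start = i = 0
    while i < len(data):
        if data[i] != ESC:
            i += 1
            continue
        if i > start:           # flush the run collected so far
            out.append((current, data[start:i]))
        for seq, name in DESIGNATIONS.items():
            if data.startswith(seq, i):
                current = name
                i += len(seq)
                break
        else:
            raise ValueError("unknown escape sequence at offset %d" % i)
        start = i
    if start < len(data):
        out.append((current, data[start:]))
    return out
```

Because every run is labelled by its own escape sequence, a 0208 run
and a 0213 run can sit in the same string without ambiguity.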
Shift_JISX0213
==============
And the most controversial one. This one squeezes the new characters
into the code space that Shift_JIS left unused. Shift_JIS was already
acrobatic and this one is a nightmare. However, this one also uses at
most 2 bytes per character, so supporting it is not that hard. But
unlike the cases above, I need a UTF-8 => Shift_JISX0213 mapping
instead of one for vanilla JISX0213, and I am not sure whether such a
mapping is available. I'll look into it.
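
For a taste of the acrobatics, here is the classic row/cell =>
Shift_JIS byte arithmetic as a small Python sketch. As far as I know
the first plane of Shift_JISX0213 is laid out with this same
arithmetic, but treat that as an assumption; the second plane goes into
the lead bytes Shift_JIS never used and is NOT covered here. The name
kuten_to_sjis is made up for the example.

```python
def kuten_to_sjis(row: int, cell: int) -> bytes:
    """Map a first-plane (row, cell) pair, each 1..94, to two bytes.

    Classic Shift_JIS arithmetic; assumed, not verified, to hold for
    the first plane of Shift_JISX0213 as well.
    """
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    # Two rows share one lead byte; 0xA0-0xDF is skipped because it is
    # taken by single-byte half-width kana.
    if row <= 62:
        lead = (row + 0x101) // 2      # 0x81..0x9F
    else:
        lead = (row + 0x181) // 2      # 0xE0..0xEF
    if row % 2:                        # odd row: lower half of trail range
        trail = cell + 0x3F if cell <= 63 else cell + 0x40  # skips 0x7F
    else:                              # even row: upper half
        trail = cell + 0x9E            # 0x9F..0xFC
    return bytes([lead, trail])
```

For example, row 4 cell 2 (hiragana A) comes out as 0x82 0xA0, the
familiar Shift_JIS value.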
As for Hangul, I'll let the experts like Jungshik review the impact....
Dan the Man with Even more Encodings