perl-unicode

Re: [Encode] euc-jp vs euc-jisx0213

2002-04-29 03:40:53

On Mon, 29 Apr 2002 15:45:09 +0900
Dan Kogai <dankogai(_at_)dan(_dot_)co(_dot_)jp> wrote:

Sadahiro-san and perl-unicode readers,

I am now working on Encode::JIS2K, an additional converter for JIS X 
0213:2000.  When I studied JIS X 0213, I found that for euc-jp, you can 
make a map so that it covers both JIS X 0212 and JIS X 0213.  I thought 
they were mutually exclusive, but they were not (there are some 
duplicates, however, so it was not as straightforward as aggregating 
two maps).

Excellent.

I'd like to add some explanation. As shown below,
JIS X 0213:2000 plane 2 and JIS X 0212:1997
do not overlap in their KU-TEN (row-cell) positions.
(Rows marked with * contain Kanji [CJK ideographs].)
(Notably, the non-Kanji part of JIS X 0212:1997 also has
 no overlap with JIS X 0208:1997.)

JIS X 0208:1997
                cells defined
  row 1:        1..94.
  row 2:        1..14, 26..33, 42..48, 60..74, 82..89, 94.
  row 3:        16..25, 33..58, 65..90.
  row 4:        1..83.
  row 5:        1..86.
  row 6:        1..24, 33..56.
  row 7:        1..33, 49..81.
  row 8:        1..32.
 *rows 16..46:  1..94.
 *row 47:       1..51.
 *rows 48..83:  1..94.
 *row 84:       1..6.

JIS X 0212:1997
                cells defined
  row 2:        15..25, 34..36, 75..81.
  row 6:        65..69, 71, 73..74, 76, 81..92.
  row 7:        34..46, 82..94.
  row 9:        1..2, 4, 6, 8..9, 11..13, 15..16, 33..48.
  row 10:       1..24, 26..87.
  row 11:       1..27, 29..35, 37..87.
 *rows 16..76:  1..94.
 *row 77:       1..67.

JIS X 0213:2000 plane 2
                cells defined
 *row 1:        1..94.
 *rows 3..5:    1..94.
 *row 8:        1..94.
 *rows 12..15:  1..94.
 *rows 78..93:  1..94.
 *row 94:       1..86.
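
The non-overlap claims above can be checked mechanically. What follows is a minimal Python sketch, not part of Encode, that expands the row-cell ranges listed above into sets and intersects them. The helper names `expand` and `euc_g3_bytes` are made up for illustration, and only the non-Kanji rows of JIS X 0208 are entered, since that is all the second check needs.

```python
def expand(table):
    """Expand (rows, cell-ranges) specs into a set of (row, cell) pairs."""
    points = set()
    for rows, cells in table:
        for row in rows:
            for lo, hi in cells:
                points.update((row, cell) for cell in range(lo, hi + 1))
    return points

FULL = [(1, 94)]

# Non-Kanji rows of JIS X 0208, as listed in the table above.
X0208_NON_KANJI = expand([
    ([1], FULL),
    ([2], [(1, 14), (26, 33), (42, 48), (60, 74), (82, 89), (94, 94)]),
    ([3], [(16, 25), (33, 58), (65, 90)]),
    ([4], [(1, 83)]),
    ([5], [(1, 86)]),
    ([6], [(1, 24), (33, 56)]),
    ([7], [(1, 33), (49, 81)]),
    ([8], [(1, 32)]),
])

# JIS X 0212, as listed in the table above.
X0212 = expand([
    ([2], [(15, 25), (34, 36), (75, 81)]),
    ([6], [(65, 69), (71, 71), (73, 74), (76, 76), (81, 92)]),
    ([7], [(34, 46), (82, 94)]),
    ([9], [(1, 2), (4, 4), (6, 6), (8, 9), (11, 13), (15, 16), (33, 48)]),
    ([10], [(1, 24), (26, 87)]),
    ([11], [(1, 27), (29, 35), (37, 87)]),
    (range(16, 77), FULL),   # rows 16..76
    ([77], [(1, 67)]),
])

# JIS X 0213:2000 plane 2, as listed in the table above.
X0213_P2 = expand([
    ([1], FULL),
    (range(3, 6), FULL),     # rows 3..5
    ([8], FULL),
    (range(12, 16), FULL),   # rows 12..15
    (range(78, 94), FULL),   # rows 78..93
    ([94], [(1, 86)]),
])

def euc_g3_bytes(row, cell):
    """EUC form of a G3 position: SS3 (0x8F), then row and cell offset by 0xA0."""
    return bytes([0x8F, row + 0xA0, cell + 0xA0])

print("X 0212 / X 0213 plane 2 overlap:", sorted(X0212 & X0213_P2))
print("X 0212 / X 0208 non-Kanji overlap:", sorted(X0212 & X0208_NON_KANJI))
```

Both intersections come out empty for the ranges as transcribed here. The check works on row-cell positions only; it says nothing about which Unicode characters the positions map to.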

I have just finished making a new euc-jp.ucm that behaves like this:

for euc-jp,
* Round-Trips for all JIS X 0201-kana, JIS X 0208 and JIS X 0212 (same 
as before)
* Decode-only for those that appear only in JIS X 0213
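
The scheme in the list above can be sketched as a small merge step. This is a Python illustration, not how the actual euc-jp.ucm was generated. It assumes the ucm convention that a `|0` flag marks a round-trip entry and `|3` marks a decode-only (bytes to Unicode) entry, and it assumes that the primary set wins when both sets define the same byte sequence. The function names and the G3 entry are hypothetical; the G3 code point is a private-use placeholder, not the real mapping.

```python
def ucm_line(code_point, seq, flag):
    """Format one ucm-style mapping line, e.g. '<UFFE3> \\xA1\\xB1 |0'."""
    escaped = "".join("\\x%02X" % b for b in seq)
    return "<U%04X> %s |%d" % (code_point, escaped, flag)

def merge_ucm(primary, secondary):
    """Primary entries round-trip (|0); secondary-only byte sequences decode only (|3)."""
    lines = [ucm_line(cp, seq, 0) for seq, cp in sorted(primary.items())]
    for seq, cp in sorted(secondary.items()):
        if seq in primary:
            continue  # byte sequence already defined by the primary set
        lines.append(ucm_line(cp, seq, 3))
    return lines

# The three euc-jp entries quoted later in this message.
primary = {
    b"\xa1\xb1": 0xFFE3,
    b"\xa1\xbd": 0x2015,
    b"\xa1\xef": 0xFFE5,
}

secondary = {
    b"\xa1\xb1": 0x203E,      # conflicts with the primary set, so it is skipped
    b"\x8f\xa1\xa1": 0xE000,  # PLACEHOLDER code point for a G3 position
}

for line in merge_ucm(primary, secondary):
    print(line)
```

Because the added entries are decode-only, a character that is reachable through both sets never gets two competing encodings; encode() still uses only the primary set.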

I doubt that users of 'euc-jp' will
expect it to be combined with JIS X 0213.

Wouldn't such mixing prevent warning/croaking
when code points appear that are not defined
in the original (that is, without X 0213)?

EUC-JP is not defined as including JIS X 0213,
and EUC-JISX0213 is not specified as including JIS X 0212.
(Strictly speaking, JIS excludes JIS X 0201 kana from EUC-JISX0213.)

As article 6.3 of the explanation (`kaisetsu') of JIS X 0213:2000
mentions, overlap between JIS X 0212 and JIS X 0213 plane 2
has been avoided by design,
since both are meant to be used in G3 in the EUC scheme.
This should help to tell EUC-JP from EUC-JISX0213
and vice versa; it is not intended to make the G3 set
a mixed bag of X 0213 plane 2 and X 0212.

IMO, if you must provide a mixture of JIS X 0213 and JIS X 0212,
it would be better to do so under a name
other than EUC-JP or EUC-JISX0213.

Please note that this new euc-jp.ucm is NOT THE SAME as euc-jp2k.ucm, which 
is to be included in Encode::JIS2K:

for euc-jisx0213,
* Round-Trips for all JIS X 0201-kana and JIS X 0213 (both planes)
* Decode-only for those that appear only in JIS X 0212
* Where JIS X 0208 and JIS X 0213-plane1 conflict, the JIS X 0213 
definition is used.   Only the 3 entries below differ (so JIS X 0213-plane1 
is ALMOST a superset of JIS X 0208).

euc-jp
<UFFE3> \xA1\xB1 |0 # FULLWIDTH MACRON
<U2015> \xA1\xBD |0 # HORIZONTAL BAR
<UFFE5> \xA1\xEF |0 # FULLWIDTH YEN SIGN

euc-jisx0213
<U203E> \xA1\xB1 |0 # OVERLINE
<U2014> \xA1\xBD |0 # EM DASH
<U00A5> \xA1\xEF |0 # YEN SIGN
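
To make the comparison concrete, the six lines above can be parsed and compared by byte sequence. This is a minimal Python sketch with a made-up `parse_ucm` helper; it handles only simple mapping lines of the form shown here and is not a full ucm parser.

```python
import re

# One mapping line: <Uxxxx>, the byte sequence as \xHH escapes, then the |flag.
UCM_LINE = re.compile(r"<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)")

def parse_ucm(text):
    """Return {byte sequence: (code point, flag)} for each mapping line in text."""
    table = {}
    for m in UCM_LINE.finditer(text):
        seq = bytes.fromhex(m.group(2).replace("\\x", ""))
        table[seq] = (int(m.group(1), 16), int(m.group(3)))
    return table

EUC_JP = parse_ucm(r"""
<UFFE3> \xA1\xB1 |0 # FULLWIDTH MACRON
<U2015> \xA1\xBD |0 # HORIZONTAL BAR
<UFFE5> \xA1\xEF |0 # FULLWIDTH YEN SIGN
""")

EUC_JISX0213 = parse_ucm(r"""
<U203E> \xA1\xB1 |0 # OVERLINE
<U2014> \xA1\xBD |0 # EM DASH
<U00A5> \xA1\xEF |0 # YEN SIGN
""")

# Byte sequences present in both maps but assigned different code points.
differences = {
    seq: (EUC_JP[seq][0], EUC_JISX0213[seq][0])
    for seq in EUC_JP.keys() & EUC_JISX0213.keys()
    if EUC_JP[seq][0] != EUC_JISX0213[seq][0]
}

for seq, (jp, jisx0213) in sorted(differences.items()):
    print(seq.hex(), "euc-jp: U+%04X" % jp, "euc-jisx0213: U+%04X" % jisx0213)
```

Run over two complete ucm files, the same comparison would list every byte sequence on which the two maps disagree.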

In short, euc-jp and euc-jisx0213 differ only in encode() and decoders 
can decode both euc-jp(1990) and euc-jisx0213.

If no one objects,  I will use a new map for euc-jp in Encode-1.64 or 
later and Encode::JIS2K is to follow.

Dan the Encode Maintainer

Regards,
SADAHIRO Tomoyuki
