perl-unicode

[Encode] euc-jp vs euc-jisx0213

2002-04-28 23:45:17
Sadahiro-san and perl-unicode readers,

I am now working on Encode::JIS2K, an additional converter for JIS X 0213:2000. While studying JIS X 0213, I found that for euc-jp you can build a single map that covers both JIS X 0212 and JIS X 0213. I had thought the two were mutually exclusive, but they are not. (There are some duplicates, however, so it was not as straightforward as merging the two maps.)

I have just finished making a new euc-jp.ucm that behaves as follows:

for euc-jp,
* Round-Trips for all JIS X 0201-kana, JIS X 0208 and JIS X 0212 (same as before)
* Decode-only for those that appear only in JIS X 0213
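
The round-trip versus decode-only distinction in the list above can be pictured with a small Python sketch. This is only an illustration, not how Encode implements it: the helper names are made up, and the decode-only entry is a hypothetical placeholder (a private-use character and an arbitrary byte pair), not a real line from euc-jp.ucm. The round-trip entry is the FULLWIDTH MACRON mapping quoted later in this message.

```python
# Round-trip entries can be both decoded and encoded.
ROUND_TRIP = {
    b"\xa1\xb1": "\uffe3",  # FULLWIDTH MACRON, from the euc-jp excerpt in this message
}

# Decode-only entries are accepted on input but never produced on output.
# HYPOTHETICAL placeholder entry, for illustration only.
DECODE_ONLY = {
    b"\xfe\xfe": "\ue000",
}

# The decoder sees every entry; the encoder sees only the round-trip ones.
DECODE_TABLE = {**ROUND_TRIP, **DECODE_ONLY}
ENCODE_TABLE = {char: raw for raw, char in ROUND_TRIP.items()}


def decode_pair(raw):
    """Map one two-byte sequence to a character."""
    return DECODE_TABLE[raw]


def can_encode(char):
    """True if the character has a byte sequence in this encoding."""
    return char in ENCODE_TABLE


print(can_encode(decode_pair(b"\xa1\xb1")))  # round-trip entry
print(can_encode(decode_pair(b"\xfe\xfe")))  # decode-only entry
```

A decode-only character therefore survives input, but encode() has no byte sequence for it and has to fall back to whatever substitution policy the caller asked for.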

Note that this new euc-jp.ucm is NOT THE SAME as euc-jp2k.ucm, which is to be included in Encode::JIS2K and behaves as follows:

for euc-jisx0213,
* Round-Trips for all JIS X 0201-kana and JIS X 0213 (both planes)
* Decode-only for those that appear only in JIS X 0212
* Where JIS X 0208 and JIS X 0213-plane1 conflict, the JIS X 0213 definition is used. Only the following 3 code points differ (so JIS X 0213-plane1 is ALMOST a superset of JIS X 0208).

euc-jp
<UFFE3> \xA1\xB1 |0 # FULLWIDTH MACRON
<U2015> \xA1\xBD |0 # HORIZONTAL BAR
<UFFE5> \xA1\xEF |0 # FULLWIDTH YEN SIGN

euc-jisx0213
<U203E> \xA1\xB1 |0 # OVERLINE
<U2014> \xA1\xBD |0 # EM DASH
<U00A5> \xA1\xEF |0 # YEN SIGN
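
The three conflicts above can also be listed mechanically. The following Python sketch parses .ucm mapping lines and reports byte sequences that the two excerpts decode differently. It is an illustration only: `load_ucm` is a hypothetical helper, not part of Encode or enc2xs, and the flag handling (|0 round-trip, |1 Unicode-to-bytes only, |3 bytes-to-Unicode only) reflects my understanding of the .ucm format rather than anything stated in this message.

```python
import re

# One .ucm mapping line looks like: <UXXXX> \xHH\xHH |flag # comment
UCM_LINE = re.compile(
    r"<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)"
)


def load_ucm(text):
    """Return (decode_table, encode_table) built from .ucm mapping lines."""
    decode_table, encode_table = {}, {}
    for line in text.splitlines():
        match = UCM_LINE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything that is not a mapping
        char = chr(int(match.group(1), 16))
        hex_pairs = re.findall(r"\\x([0-9A-Fa-f]{2})", match.group(2))
        raw = bytes(int(pair, 16) for pair in hex_pairs)
        flag = match.group(3)
        if flag in ("0", "3"):  # usable when decoding
            decode_table[raw] = char
        if flag in ("0", "1"):  # usable when encoding
            encode_table[char] = raw
    return decode_table, encode_table


EUC_JP = r"""
<UFFE3> \xA1\xB1 |0 # FULLWIDTH MACRON
<U2015> \xA1\xBD |0 # HORIZONTAL BAR
<UFFE5> \xA1\xEF |0 # FULLWIDTH YEN SIGN
"""

EUC_JISX0213 = r"""
<U203E> \xA1\xB1 |0 # OVERLINE
<U2014> \xA1\xBD |0 # EM DASH
<U00A5> \xA1\xEF |0 # YEN SIGN
"""

jp_dec, jp_enc = load_ucm(EUC_JP)
x0213_dec, x0213_enc = load_ucm(EUC_JISX0213)

# Byte sequences present in both tables but decoded to different characters.
conflicts = {
    raw: (jp_dec[raw], x0213_dec[raw])
    for raw in jp_dec
    if raw in x0213_dec and jp_dec[raw] != x0213_dec[raw]
}

for raw, (jp_char, x0213_char) in sorted(conflicts.items()):
    print(raw.hex(), f"U+{ord(jp_char):04X}", f"U+{ord(x0213_char):04X}")
```

Run against the full .ucm files instead of these excerpts, the same comparison would be one way to confirm that only these 3 code points disagree.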

In short, apart from the three conflicting code points above, euc-jp and euc-jisx0213 differ only in encode(), and either decoder can decode both euc-jp (1990) and euc-jisx0213.

If no one objects, I will use the new map for euc-jp in Encode-1.64 or later, and Encode::JIS2K will follow.

Dan the Encode Maintainer
