perl-unicode

[Encode] HEADS-UP: ucm/cp932.ucm will be updated

2002-10-18 06:30:04
Porters (especially Nick Ing-XS),

I would like to release Encode 1.78 soon to address a problem in CP932 (the MS version of Shift_JIS) which MORIYAMA Masayuki <msyk(_at_)mtg(_dot_)biglobe(_dot_)ne(_dot_)jp> has discovered. Not only has he identified the problem, he has also supplied me with a patch. Though he was reluctant to come to perl(5-porters|unicode)@perl.org (I invited him, but he was too shy to talk to us in English), the problem and the solution he has raised were too good to ignore, so I would like to update Encode on his behalf. Here is a summary of his points.

* ucm/cp932.ucm was based on the mapping file at unicode.org [0] but that mapping is obsolete; it works on Windows 3.1 but not in the era of Win32.
* as a result, cp932 is almost useless, or at least too impractical
* patch was made available [1]

My first suggestion was to "Ask MS to update the data at unicode.org and if you are unsatisfied w/ the one that comes w/ Encode you are free to CPANize your version". But he has raised even more points and I was finally convinced.

* Though it is not at unicode.org, MS has already made the mapping available on their web site [2][3]
* Python and Ruby will be using the MS version, not the one at unicode.org
* Java has been known to suffer badly from confusing Shift_JIS and CP932, but Encode is already free of this problem because it supplies different mappings for Shift_JIS and CP932.

[0] http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
[1] http://www2d.biglobe.ne.jp/~msyk/perl/cp932.html
[2] http://www.microsoft.com/typography/unicode/cscp.htm
[3] http://www.microsoft.com/typography/unicode/932.txt
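
To make the Shift_JIS vs. CP932 distinction concrete, here is a minimal sketch in Python rather than Perl, assuming CPython's bundled "shift_jis" and "cp932" codecs. The sample byte sequence and the circled digit are my own illustrative choices, not taken from the patch, and the helper names compare() and encodable() are hypothetical. It decodes the same two bytes with both codecs and checks whether a Windows-only character can be encoded at all.

```python
from typing import Tuple

# Illustrative sample (an assumption, not from the patch): the double-byte
# sequence 0x81 0x60, which the two mappings are commonly said to treat
# differently.
SAMPLE = b"\x81\x60"

# CIRCLED DIGIT ONE, one of the vendor extension characters found in CP932.
CIRCLED_ONE = "\u2460"


def compare(raw: bytes) -> Tuple[str, str]:
    """Decode the same bytes once as Shift_JIS and once as CP932."""
    return raw.decode("shift_jis"), raw.decode("cp932")


def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    sjis_char, cp932_char = compare(SAMPLE)
    print("shift_jis gives U+%04X" % ord(sjis_char))
    print("cp932     gives U+%04X" % ord(cp932_char))
    for codec in ("shift_jis", "cp932"):
        print(codec, "can encode U+2460:", encodable(CIRCLED_ONE, codec))
```

If the two code points printed differ, text that round-trips cleanly through one codec will not round-trip through the other, which is the kind of confusion the Java point above refers to.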

One small but significant concern is Tcl/Tk. So far Encode's CP932 has matched that of Tcl, but it will not after my next release of Encode. So I decided to call for opinions before I commit to the release.

AFAIK, CP\d+ should be avoided for any data exchanged on the Net, so you should not use it on the web or in mail; it is therefore perfectly all right if Tk(Web|Mail) has a problem handling them. At the same time, Win32 Perl users would be much happier if the CP\d+ encodings were made more practical.

The URI [2] also has links to other code pages, so I would also like to review them and, if necessary, update them. The 8-bit code pages (CP12??) seem OK, but the other CJK ones (CP9??) need review.

Dan the Encode Maintainer
