perl-unicode

Re: 5.8 roadmap and Encode

2002-03-02 04:13:58
Autrijus Tang <autrijus(_at_)autrijus(_dot_)org> writes:
  - 'gb18030', used in glibc2.2, is a superset of gbk, which is a
    superset of gb2312; we should use that instead of 'gbk' if we want
    gbk support.

This and euc-tw use 1-, 2- or 4-byte encodings. Any pointers on how to
support that in Encode.pm?

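The .ucm format can cope: <mb_cur_min> and <mb_cur_max> give the shortest
and longest byte sequence, and CHARMAP entries can then map to byte
sequences of differing lengths: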

<code_set_name> "whatever"
<mb_cur_min> 1
<mb_cur_max> 4
<subchar> \x3F
#
CHARMAP
<U0000> \x00 |0 # <control>
<U0001> \x01 |0 # <control>
<U0002> \x02 |0 # <control>
<U0003> \x03 |0 # <control>
.....
<U2222> \x04\x05 |0 # two byte
.......
<U4444> \x06\x07\x08\x09 |0 # fourbyte
....
END CHARMAP
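
To illustrate how such a table mixes byte lengths, here is a minimal
sketch in Python (not Encode.pm or enc2xs code). It reads a trimmed copy
of the fragment above and collects the byte sequence for each code point.
The names parse_charmap, ENTRY and SAMPLE are hypothetical, and the sketch
assumes that |0 marks a round-trip mapping and that only \xHH byte
notation appears.

```python
import re

# Matches a CHARMAP entry such as: <U4444> \x06\x07\x08\x09 |0 # comment
ENTRY = re.compile(
    r'^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)'
)


def parse_charmap(text):
    """Return {code point: byte sequence} for the |0 entries in a .ucm text."""
    table = {}
    in_map = False
    for line in text.splitlines():
        line = line.strip()
        if line == "CHARMAP":
            in_map = True
            continue
        if line == "END CHARMAP":
            break
        if not in_map:
            continue  # header lines such as <mb_cur_max> are skipped here
        m = ENTRY.match(line)
        if m and m.group(3) == "0":
            hex_bytes = re.findall(r'\\x([0-9A-Fa-f]{2})', m.group(2))
            table[int(m.group(1), 16)] = bytes(int(h, 16) for h in hex_bytes)
    return table


# Trimmed copy of the fragment above (elided rows left out).
SAMPLE = r"""
<code_set_name> "whatever"
<mb_cur_min> 1
<mb_cur_max> 4
<subchar> \x3F
CHARMAP
<U0000> \x00 |0 # <control>
<U2222> \x04\x05 |0 # two byte
<U4444> \x06\x07\x08\x09 |0 # fourbyte
END CHARMAP
"""

table = parse_charmap(SAMPLE)
lengths = sorted({len(b) for b in table.values()})
print(lengths)
```

The set of lengths found in the table stays within the range declared by
<mb_cur_min> and <mb_cur_max>, which is all a variable-width encoding
needs from the format.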

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/
