perl-unicode

Re: 5.8 roadmap and Encode

2002-03-01 07:34:17
Autrijus Tang <autrijus(_at_)www(_dot_)autrijus(_dot_)org> writes:

I'm aware of that; but gb2312 is really not an encoding at all; it's a charmap

This all gets very woolly, but is a charmap not exactly what is used
as the index of a font?

nick@bactrian 1009$ xlsfonts | grep gb2312
-cc-song-medium-r-normal-jiantizi-40-400-75-75-c-400-gb2312.1980-0
-cc-song-medium-r-normal-jiantizi-48-480-75-75-c-480-gb2312.1980-0
-guobiao-song-medium-r-normal--16-160-72-72-c-160-gb2312.80&gb8565.88-0
-isas-fangsong ti-medium-r-normal--16-160-72-72-c-160-gb2312.1980-0
-isas-song ti-medium-r-normal--16-160-72-72-c-160-gb2312.1980-0
-isas-song ti-medium-r-normal--24-240-72-72-c-240-gb2312.1980-0
nick@bactrian 1010$

The reason Encode got put into perl-5.7.2 in the first place
was to do for perl/Tk what Tcl's encoding support does for Tcl/Tk.
Now we obviously want it to do a lot more than that, including
(and probably more commonly) transport encodings.


that can be represented by one of three encodings. Most people use it as
shorthand for euc-cn, unless they're at Microsoft, where it means the
GBK encoding.
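
To make the charmap/encoding distinction concrete, here is a minimal sketch
in Python (not Encode itself). It assumes CPython's bundled "euc-cn" and
"gbk" codecs; the helper names are made up for illustration. It recovers the
GB2312 row/cell index of a character from its EUC-CN bytes, and checks that
GBK agrees with EUC-CN on GB2312 characters while covering more of them.

```python
# Sketch only: GB2312 as a charmap (row/cell index) versus EUC-CN as one
# byte encoding of that charmap. Codec names are CPython's; the helper
# names below are hypothetical.

def gb2312_row_cell(ch):
    """Return the (row, cell) position of a single GB2312 character."""
    # EUC-CN stores the position as two bytes: row + 0xA0, cell + 0xA0.
    b1, b2 = ch.encode("euc-cn")
    return b1 - 0xA0, b2 - 0xA0

def encodable(ch, codec):
    """True if the codec can represent ch."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

row_cell = gb2312_row_cell("中")  # (54, 48)

# For characters inside GB2312, GBK produces the same bytes as EUC-CN.
same_bytes = "中".encode("euc-cn") == "中".encode("gbk")

# A traditional character is outside GB2312 but inside GBK.
only_in_gbk = encodable("龍", "gbk") and not encodable("龍", "euc-cn")
```

So a font indexed by "gb2312" wants the row/cell pair, whereas a file
labelled "gb2312" usually carries the EUC-CN (or GBK) bytes.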

http://java.sun.com/j2se/1.4/docs/guide/intl/encoding.doc.html does without
the 'GB2312' encoding entirely, but sports euc-cn and iso2022cn(_(gb|cns))?,
which is arguably a better policy than having gb2312-x11 or other names
around. (Another policy, as in the iconv versions, is to alias gb2312 to
euc_cn.)
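
As an illustration of the aliasing policy, CPython's codec registry happens
to do the same thing: "gb2312" and "euc-cn" resolve to one codec, while
"gbk" is a separate one. This is a sketch of that registry's behaviour, not
a statement about what Encode does; same_codec is a hypothetical helper.

```python
import codecs

def same_codec(name_a, name_b):
    """True if both names resolve to the same registered codec."""
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name

gb2312_is_euc_cn = same_codec("gb2312", "euc-cn")
gb2312_is_gbk = same_codec("gb2312", "gbk")
```

Here gb2312_is_euc_cn is true and gb2312_is_gbk is false, i.e. the
non-Microsoft reading of the name.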

In the extremely unlikely event that somebody wants the raw GB, a simple
regex applied to the HZ encoding will suffice, imvho.

Thanks,
/Autrijus/

--
Nick Ing-Simmons
http://www.ni-s.u-net.com/


