perl-unicode

Re: 5.8 roadmap and Encode

2002-03-01 04:06:47
On Fri, Mar 01, 2002 at 10:48:00AM +0000, Nick Ing-Simmons wrote:
> > Not directly used in an 'Encode/Decode' context, as far as I know; to do so
> > would set a precedent, since CNS and other double-byte charsets often
> > have such a 'raw' form that's not used in encoded transports.

> Perl's Encode is not just for transport.

I'm aware of that, but gb2312 is really not an encoding at all; it's a charmap
that can be represented by any one of three encodings. Most people use the name
as shorthand for euc-cn, unless they're at Microsoft, where it means the
GBK encoding.
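
To make the charmap-versus-encoding distinction concrete, here is a minimal
sketch. It uses Python's standard codecs rather than Perl's Encode, purely as
an illustration; the codec names "euc_cn", "hz" and "gbk" are Python's, and the
sample characters are my own choice. Python ships no iso-2022-cn codec, so only
the 8-bit EUC form, the bare 7-bit code, and the HZ wrapping are shown.

```python
# One GB2312 code point, seen through several byte-level representations.
ch = "中"

euc = ch.encode("euc_cn")             # 8-bit form: both bytes have the high bit set
raw = bytes(b & 0x7F for b in euc)    # bare 7-bit GB2312 code, the 'raw' form
hz = ch.encode("hz")                  # the same 7-bit code inside HZ shift sequences

print(euc.hex(), raw, hz)

# For characters in the GB2312 charmap, GBK produces the same bytes as EUC-CN.
same_as_gbk = ch.encode("gbk") == euc

# GBK also covers characters that GB2312 lacks, so the two names are not
# interchangeable in general.
gbk_only = "镕"
try:
    gbk_only.encode("euc_cn")
    in_gb2312 = True
except UnicodeEncodeError:
    in_gb2312 = False
```

The point is only that the charmap position stays the same while the bytes on
the wire differ; which spelling a given library accepts for each form varies.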

http://java.sun.com/j2se/1.4/docs/guide/intl/encoding.doc.html does without
the 'GB2312' encoding entirely, but sports euc-cn and iso2022cn(_(gb|cns))?,
which is arguably a better policy than having gb2312-x11 or other such names
around. (Another policy, as in the iconv versions, is to alias gb2312 to
euc_cn.)

In the extremely unlikely event that somebody wants the raw GB, a simple
regex applied to the HZ-encoded text will suffice, imvho.
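
A rough sketch of that regex idea, again in Python rather than Perl and only
as an illustration. The function name raw_gb_from_hz and the sample bytes are
hypothetical. It assumes well-formed HZ input: "~{" enters GB mode, "~}"
leaves it, and "~~" is a literal tilde in ASCII mode.

```python
import re

# Either an escaped tilde (skipped), or a GB-mode run: "~{", then byte pairs
# whose first byte is 0x21-0x77 and second byte is 0x21-0x7E, then "~}".
# Matching whole pairs keeps a "~}" that straddles two pairs from ending
# the run early.
HZ_RUN = re.compile(rb"~~|~\{((?:[\x21-\x77][\x21-\x7e])*)~\}")

def raw_gb_from_hz(hz_bytes):
    """Return the concatenated raw 7-bit GB2312 byte pairs found in HZ text."""
    return b"".join(m.group(1) or b"" for m in HZ_RUN.finditer(hz_bytes))

sample = b"ASCII with ~~ tilde, then ~{VPND~} and more ASCII"
raw = raw_gb_from_hz(sample)

# Setting the high bit on every byte turns raw GB into EUC-CN.
as_euc = bytes(b | 0x80 for b in raw)
print(raw, as_euc.decode("euc_cn"))
```

Line continuations ("~" followed by a newline) are not handled here; a fuller
version would need to strip them first.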

Thanks,
/Autrijus/

