[Encode] Encoding vs. Charset

Encode hackers (especially Autrijus),

I am now fairly content with the feature set of Encode, so I decided to write some programs based upon it. And I have found that most Chinese (Continental; it seems the Taiwanese are much more technically correct) and Korean mails and webpages confuse "charset" and "encoding". That is, charset="gb2312" really means euc-cn and charset="ks_c_5601-1987" really means euc-kr. Sadly, this misconception is embedded in popular browsers.

  So when you try something like

  my ($encname) = /^Content-Type:.*charset=[\"\']?([0-9A-Za-z_-]+)/o;
  ....
  my $utf8 = decode($encname, $string);

you are in big trouble. Aliases are no salvation, because most webpages in *.cn happily include


  <META http-equiv="Content-Type" content="text/html; charset=gb2312">

It seems they take it for granted that an encoding is simply a charset encoded in EUC. Anton has wistfully stated this in Encode::Supported, but I didn't realize the depth of the problem until I took Encode from in vitro to in vivo (that is, out of the lab and into the real world).
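To make the mismatch concrete, here is a minimal sketch of an application-side workaround: translate the label found in the header into the encoding the sender almost certainly meant, and only then decode. The %label_fixup table and the actual_encoding() helper are hypothetical names made up for this illustration; they are not part of Encode.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Hypothetical fix-up table: charset label as seen in the wild =>
# the encoding the sender almost certainly meant.
my %label_fixup = (
    'gb2312'         => 'euc-cn',
    'ks_c_5601-1987' => 'euc-kr',
);

# Hypothetical helper: return the canonical name of the encoding to
# decode with, or nothing if Encode does not know the label at all.
sub actual_encoding {
    my ($label) = @_;
    my $name = $label_fixup{ lc $label } // $label;
    my $enc  = find_encoding($name) or return;
    return $enc->name;
}

my $header = 'Content-Type: text/html; charset="GB2312"';
my ($label) = $header =~ /^Content-Type:.*charset=["']?([0-9A-Za-z_-]+)/;

# Two Chinese characters (U+4E2D U+6587) as EUC-CN octets.
my $octets = "\xD6\xD0\xCE\xC4";
my $text   = decode( actual_encoding($label), $octets );
```

This keeps the workaround in the caller, which is exactly the burden the proposal is meant to remove.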

  So I propose to:

* rename gb2312 to gb2312-raw, ksc5601 to ksc5601-raw
* and alias gb2312 and ksc5601 to euc-(cn|kr)
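The aliasing half of this proposal can be approximated locally with define_alias() from Encode::Alias. This is only a sketch: it assumes that the installed Encode does not already register "gb2312" or "ksc5601" as names of real encodings (real names are looked up before aliases, which is why the rename is part of the proposal), and it does not perform the rename itself.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);
use Encode::Alias qw(define_alias);

# Point the commonly misused labels at the EUC encodings.
# Assumption: neither label is the name of a real encoding here.
define_alias( 'gb2312'  => 'euc-cn' );
define_alias( 'ksc5601' => 'euc-kr' );

# Report what each label resolves to after aliasing.
for my $label (qw(gb2312 ksc5601)) {
    my $enc = find_encoding($label);
    printf "%s resolves to %s\n", $label, $enc ? $enc->name : '(nothing)';
}
```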

I know it's technically wrong, but perl opts more for the practical than the technical....


Dan the Man with Too Many SPAMs from CN and KR
