perl-unicode

Re: Encoding vs Charset

2002-03-27 10:00:59
On Wed, 27 Mar 2002, Dan Kogai wrote:

On Wednesday, March 27, 2002, at 11:22 , Jungshik Shin wrote:
  IMHO, you're also misusing the term 'charset' here. MIME charset
can be used synonymously with 'encoding' (or
'character set encoding scheme': see CJKV Information Processing,
IETF RFC 2130 and RFC 2278). What has to be distinguished
is 'coded character set' on the one hand (JIS X 0208, JIS X 0212,
KS X 1001, KS X 1003, GB 2312, CNS 11xxx, ISO 10646, ISO 646, US-ASCII,
ISO-8859-x) and 'encoding/character
set encoding scheme/MIME charset' on the other hand (EUC-JP,
EUC-KR, EUC-TW, EUC-CN, ISO-2022-JP, ISO-2022-KR, ISO-2022-CN,
ISO-8859-x, UTF-8, UTF-32, UTF-7, UTF-16, Big5, UHC).

   I do not think so.  This time I can confidently say it is IANA that 
has goofed.  To make my point clear, let me define Charset and Encoding 
once again.

Character Set:

   A collection of characters in which each character is distinguished 
by a unique ID (in most cases, the ID is a number).

Character Encoding:

   A way to represent characters in a byte stream.  A given character 
encoding may contain a single character set (e.g. US-ASCII) or multiple 
character sets (e.g. EUC-JP, which contains US-ASCII, JIS X 0201 Kana, 
JIS X 0208 and JIS X 0212).  A given character encoding may also encode 
a character set as-is (raw; US-ASCII) or processed (in EUC-JP, US-ASCII 
is as-is, JIS X 0201 Kana is prefixed with \x8E, JIS X 0208 has 0x8080 
added, and JIS X 0212 has 0x8080 added and is then prefixed with \x8F).
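
The EUC-JP rules just described can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how any particular library implements it: the function name euc_jp_bytes and the character set labels are made up for this example, and it assumes the JIS X 0201 Kana code is given in its 8-bit form (0xA1..0xDF).

```python
def euc_jp_bytes(charset: str, code: int) -> bytes:
    """Map one code from a coded character set to its EUC-JP byte sequence.

    Hypothetical helper for illustration; `charset` labels are assumptions.
    """
    if charset == "US-ASCII":
        # Encoded as-is: one byte, unchanged.
        if not 0x00 <= code <= 0x7F:
            raise ValueError("US-ASCII code out of range")
        return bytes([code])

    if charset == "JIS X 0201 Kana":
        # Assumes the 8-bit kana code; prefixed with 0x8E.
        if not 0xA1 <= code <= 0xDF:
            raise ValueError("JIS X 0201 Kana code out of range")
        return bytes([0x8E, code])

    if charset in ("JIS X 0208", "JIS X 0212"):
        hi, lo = code >> 8, code & 0xFF
        if not (0x21 <= hi <= 0x7E and 0x21 <= lo <= 0x7E):
            raise ValueError("two-byte code out of range")
        # Add 0x8080, i.e. set the high bit of both bytes.
        body = (code + 0x8080).to_bytes(2, "big")
        # JIS X 0212 is additionally prefixed with 0x8F.
        return body if charset == "JIS X 0208" else b"\x8f" + body

    raise ValueError(f"unknown character set: {charset}")


# HIRAGANA LETTER A is 0x2422 in JIS X 0208.
print(euc_jp_bytes("JIS X 0208", 0x2422).hex())
# Cross-check against Python's own EUC-JP codec.
print("\u3042".encode("euc_jp").hex())
```

Both print statements should show the same two bytes, which is the point: the character set supplies the number, and the encoding decides how that number becomes bytes.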

  You got me wrong. I don't have any objection to 'coded character set'
and 'encoding' defined this way. The problem is that you're using '(coded)
character set' and 'charset' interchangeably.  They're two different
things depending on where you come from. My point is that because
'charset' is already overloaded with two or more different meanings (as a
MIME Content-Type header parameter, it means 'encoding' as you defined
above), you'd better not use it when comparing coded character set on the
one hand and encoding/character set encoding scheme on the other hand.
Simply put, it'd be much better for you to say '(coded) character set vs
encoding' instead of 'charset vs encoding'.
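
To make the distinction concrete, here is a small Python sketch that takes one character from a single coded character set (JIS X 0208) and serializes it with three different encodings. The codec names are the ones Python's standard library uses; the choice of character is just an example.

```python
# HIRAGANA LETTER A: one character, code 0x2422 in the coded
# character set JIS X 0208.
ch = "\u3042"

# Three encodings (MIME charsets) that can all carry JIS X 0208.
encodings = ["euc_jp", "iso2022_jp", "shift_jis"]
encoded = {name: ch.encode(name) for name in encodings}

for name, raw in encoded.items():
    # Same character, same coded character set, different byte streams.
    print(f"{name:12} {raw.hex(' ')}")
```

EUC-JP carries the 0x2422 code with 0x8080 added, ISO-2022-JP carries it unchanged between escape sequences, and Shift_JIS transforms it arithmetically. What goes in a MIME 'charset' parameter is one of these encodings, not JIS X 0208 itself.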

  Jungshik Shin

P.S. I'm wondering why you posted this to the Unicode list (where it's not
very relevant) without posting to perl-unicode.  I was forced to
post my response to the Unicode list, but I'd rather keep this thread (if
there's a need to continue) where it began (perl-unicode).
