perl-unicode

Re: Are GB 18030 and CNS 11643-1992 the best spellings?

2002-03-28 19:47:29
On Fri, 29 Mar 2002, Anton Tagunov wrote:

  Hi Anton,

Writing a bit of an article, putting in there all I have learnt
about CJK encodings on the Internet and at 
perl-unicode(_at_)perl(_dot_)org.
Has already taken me a week :-)

 I strongly recommend you get CJKV Information Processing by Ken Lunde.
It has a lot of gory details. As Dan wrote in his dropped document on
encodings, the ISO-2022 standard (ECMA 35 at http://www.ecma.ch) is itself
a great(?!) read :-)

Is GB 18030 the best spelling for this encoding?
Isn't it GB18030 or GB_18030 or GB_18030-2000?

  I guess the official designation is GB 18030-2000.


Is CNS 11643-1992 the best spelling?
Isn't it CNS11643, CNS11643-1992, CNS_11643-1992?

  I believe they're CNS 11643-1992 or CNS 11643-1986.


My suspicions arise from IANA registration names
without spaces like

   You have to be careful with IANA registration. In a sense,
it's like a sink that accepts everything thrown into it :-)

JIS_C6226-1983
KS_C_5601-1987
GB2312
KSC5636

  KSC5636 is for ISO 646-KR (the Korean version of ISO 646 or US-ASCII).
Its official name is KS C 5636-1993 (now KS X 1003:1993).
The official name for KS_C_5601-1987 is KS C 5601-1987,
which was revised in 1989 and 1992 and reissued in 1997 as KS X 1001:1997,
which was in turn revised in 1998 (with two characters added, one of
which was the euro sign).

  The official designation of JIS_C6226-1983 should be JIS C 6226-1983
(a revision of JIS C 6226-1978), which was renamed JIS X 0208:1983 and
was later revised and 'renamed' JIS X 0208:1997.

  You may have noticed that JIS changed the designation of their
character set standards (from JIS C -> JIS X) in the early 1980s,
which KS closely followed in 1997 (KS C -> KS X). Basically, JIS C and
KS C (perhaps for electrical/electronics related standards) 'ran out
of space' (well, they could have used more digits.....) and both JIS and KS
created a new section 'X' for IT-related standards. Moreover, the year
a standard is issued used to be preceded by '-', but is now preceded by
':' as in ISO standards (e.g. ISO 10646-1:2000, ISO 10646-2:2001).
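
  To make the '-' versus ':' convention concrete, here is a minimal
Python sketch that splits a designation into its body, the year
separator and the year. The helper name split_designation and the
regular expression are illustrative assumptions, not an official grammar
from JIS, KS or ISO.

```python
import re

# Hypothetical pattern: "<body><sep><4-digit year>", where <sep> is '-'
# (older style) or ':' (newer, ISO-like style). An assumption for this
# sketch only, not a grammar published by any standards body.
_DESIGNATION = re.compile(r"^(?P<body>.+?)(?P<sep>[-:])(?P<year>\d{4})$")


def split_designation(designation):
    """Return (body, separator, year); separator and year are None if absent."""
    text = designation.strip()
    match = _DESIGNATION.match(text)
    if match is None:
        return text, None, None
    return match.group("body"), match.group("sep"), int(match.group("year"))


for name in ("KS C 5601-1987", "KS X 1001:1997", "ISO 10646-1:2000", "GB 18030"):
    print(split_designation(name))
```

  Note that a part number such as the '-1' in ISO 10646-1:2000 stays in
the body, because only a trailing four-digit year is split off.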


but did IANA invent these cryptic names above on its own?
Or did it take the names from existing standards?

  I guess they replaced space with '_' or got rid of
space to make them a single word/identifier.
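
  As a rough illustration of that guess, the two transformations can be
sketched in Python as follows. The function name identifier_variants is
hypothetical, and this is not a description of how the registered names
were actually produced.

```python
def identifier_variants(designation):
    """Return single-token spellings derived from an official designation."""
    words = designation.split()
    underscored = "_".join(words)  # spaces replaced with '_'
    squeezed = "".join(words)      # spaces removed entirely
    # Keep the order and drop duplicates (a name without spaces yields one form).
    return list(dict.fromkeys([underscored, squeezed]))


for name in ("KS C 5601-1987", "GB 18030-2000", "CNS 11643-1992"):
    print(name, "->", identifier_variants(name))
```

  Some registered names, JIS_C6226-1983 for example, mix both forms, so
neither rule alone reproduces every entry.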

If from existing standards, then a similar name for the encodings in
question probably already exists. Does it?

  As I wrote, IANA doesn't scrutinize much what is submitted to them.
They just add almost whatever they are given to the list. If they
had not done that, we could have had EUC-CN in place of 'GB2312'.
Maybe that's my biased impression/opinion, but some others have expressed
more or less similar views in other forums. You should not give
too much weight to the IANA registry.
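
  To see why EUC-CN would have been the more descriptive label, here is
a small Python check. It assumes CPython's bundled CJK codecs are
available and that "euc-cn" is accepted as an alias of "gb2312" there.
It says nothing about Perl's Encode or any other library.

```python
import codecs

# Assumption: CPython's codec registry resolves both labels to one codec.
same_codec = codecs.lookup("gb2312").name == codecs.lookup("euc-cn").name

# U+4E2D is at row 54, cell 48 of the GB 2312 character set.
# The EUC form adds 0xA0 to the row and to the cell to get the two bytes.
encoded = "\u4e2d".encode("gb2312")
row, cell = (byte - 0xA0 for byte in encoded)

print(same_codec, encoded.hex(), row, cell)
```

  In other words, the charset label 'GB2312' names the EUC encoding of
the GB 2312 character set rather than the character set itself.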

   Jungshik 

