perl-unicode

Re[2]: Are GB 18030 and CNS 11643-1992 the best spellings?

2002-03-30 02:26:14
Hello, Jungshik!

1) GB 18030, CNS 11643-1992
JS> I guess the official designation is  GB 18030-2000.
JS> I believe they're                    CNS 11643-1992 or CNS 11643-1986.
Ken's online version has the later one, CNS 11643-1992.

(The reason I asked you, Jungshik, was that I was not sure whether
Ken's online version has the official names or some slang.)

Still wonder whether EUC-TW uses -1992 or -1986?

JS> You have to be careful with IANA registration. In a sense,
JS> it's like a sink that accepts everything thrown into it :-)
JS>
JS> ... IANA didn't consider much when something is given to them.
JS> They just add  to the list almost whatever is given to them.
JS> ...
JS> You should not give too much weight to IANA registry.

Okay, let us try a statistical approach. It should work
well on a junk-yard :-)

94

 KS C 5636-1993  -> KSC5636
 KS C 5636-1989  -> KSC5636

94x94
 
 JIS C 6226-1983 -> JIS_C6226-1983
 JIS X 0208:1983 -> JIS_X0208-1983

 JIS X 0208:1990
 JIS X 0208:1997
 
 KS C 5601-1987  -> KS_C_5601-1987
 
 KS C 5601-1989 (revised)
 KS C 5601-1992 (revised)
 KS X 1001:1997 (reissued)
 KS X 1001:1998 (two chars added including euro)

In the majority of cases the first ' ' becomes '_'.
  This is in line with RFC 2047's "encoded-word" syntax, whose ?Q mode
  allows a space to be written as =20 or _ :-)
The treatment of the second space is inconsistent, but fortunately
GB 18030-2000 and CNS 11643-1992 do not have one, so, by
interpolation, if IANA ever registers these, an "educated" guess is

 GB 18030-2000  -> GB_18030-2000
 CNS 11643-1992 -> CNS_11643-1992
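
The guess above can be written down as a tiny Python sketch. This is only the heuristic from my survey, not an IANA rule, and the function name guess_registry_name is made up for illustration. It refuses names with a second space, since the registry treats that one inconsistently.

```python
def guess_registry_name(official_name: str) -> str:
    """Guess an IANA-style spelling for a standard's official name.

    Heuristic from the survey above, not an IANA rule: the single space
    becomes '_'. Names that do not contain exactly one space are rejected,
    because the registry treats a second space inconsistently.
    """
    if official_name.count(" ") != 1:
        raise ValueError(f"no consistent pattern for {official_name!r}")
    return official_name.replace(" ", "_", 1)


for name in ("GB 18030-2000", "CNS 11643-1992"):
    print(f"{name} -> {guess_registry_name(name)}")
```

A name such as "KS C 5601-1987" raises ValueError here rather than producing a guess.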

Thank you, Jungshik, for helping me with this!
And I'm now putting the official standards' names into my
survey for the JIS * and KS * series.

2)

JS> JIS C -> JIS X   early 1980's,
JS> KS C -> KS X     1997
BTW, Ken's online version also has a JIS X 0201-1976 entry. Wonder if
it should really have been JIS X 0201:1976

3)

JS> Moreover, the year a standard is issued used to be preceded
JS> by '-', but now is preceded by ':' as in ISO standards

Oh, my!!! Thanks a lot! Would have never caught that on my own!
Don't know if the up-to-date printed book by Ken has it, but
his old cjk.inf, available online, uses only '-' everywhere!!

I guess ":" is not allowed in the charset parameter of a MIME
name, so all IANA registrations are presumably going to have a '-'
in that position.
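
A quick way to check this guess is to test a name against the set of characters permitted in a MIME charset name. The sketch below assumes the mime-charset-chars grammar of RFC 2978 (letters, digits and a short list of punctuation), as far as I remember it. The helper name invalid_charset_chars is hypothetical.

```python
import string

# Assumed from RFC 2978 (mime-charset-chars): ALPHA / DIGIT plus
# ! # $ % & ' + - ^ _ ` { } ~
MIME_CHARSET_CHARS = set(string.ascii_letters + string.digits + "!#$%&'+-^_`{}~")


def invalid_charset_chars(name: str) -> list[str]:
    """Return the sorted, de-duplicated characters of `name` that fall
    outside the assumed mime-charset-chars set."""
    return sorted({ch for ch in name if ch not in MIME_CHARSET_CHARS})


print(invalid_charset_chars("JIS_X0208:1983"))  # the ':' is reported
print(invalid_charset_chars("JIS_X0208-1983"))  # nothing is reported
```

Under that assumption both ':' and ' ' are rejected, which would explain why the registry spellings use '-' and '_' instead.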

BTW, I have seen many times that people write JIS X0208..



P.S.

AT> Writing a bit of an article...
JS>  I strongly recommend you get CJKV Information Processing by Ken Lunde.

Have read all of his on-line version
http://www.oreilly.com/people/authors/lunde/cjk_inf.html
It is several years old, but very useful for getting on track!
My first reading on CJK :-)

JS> As Dan wrote in his dropped document on encodings, ISO-2022 standard
JS> (ECMA 35 at http://www.ecma.ch) itself is a great(?!) read :-)
Supported.pod still recommends that, I hope :-)

