
Internet Draft -- Korean Character Encoding for Internet Messages

1993-05-11 01:30:09
Please distribute this document as an Internet Draft.
Comments should be sent to ietf-822@dimacs.rutgers.edu.
Thanks in advance.

Uhhyung Choi
Korea Network Information Center
----------------------------[ cut here ]---------------------------


Network Working Group                                        Kilnam Chon
Internet Draft                                              Hyun Je Park
                                                            Uhhyung Choi
                                                            May 11, 1993


            Korean Character Encoding for Internet Messages


Status of this Memo

   This document is an Internet Draft.  Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time.  It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a "working
   draft" or "work in progress." 

   Please check the 1id-abstracts.txt listing contained in the
   internet-drafts Shadow Directories on nic.ddn.mil, nnsc.nsf.net, 
   nic.nordu.net, ftp.nisc.sri.com, or munnari.oz.au to learn the
   current status of any Internet Draft.

   This draft document will be submitted to the RFC editor as an
   informational document.  This document will expire on November 11,
   1993.  Distribution of this memo is unlimited.  Comments are
   solicited and should be sent to ietf-822@dimacs.rutgers.edu.


Introduction

   This document describes the encoding method used to represent
   Hangul, the Korean script, in both the header and the body of
   Internet electronic mail messages. This encoding method was specified
   in the System Development Network (SDN) in 1991 and has since spread
   widely from SDN to other Korean IP networks.

   This document also specifies the name and the encoding method to be
   used for Hangul so that it fits the message body format of MIME
   [MIME] and the header format of RFC1342 [RFC1342].

   This document describes only the encoding method for plain text.
   Other text subtypes, such as rich text and similar forms of text,
   are beyond the scope of this document.



Chon et al              Expires November 11, 1993               [Page 1]

Internet Draft                                              May 11, 1993


Description

   It is assumed that a message starts in ASCII. ASCII and Hangul are
   distinguished by the shift functions. The SO code signals that the
   bytes that follow are either Hangul characters of two bytes each or
   ASCII space characters of a single byte each. The SI code returns
   to ASCII.

   Therefore, the escape sequence, shift function and character set used
   in a Hangul message are as follows:

           SO KSC 5601
           SI ASCII
           ESC $ ) C    Appears in the first line of the message

   The KSC 5601 [KSC5601] character set, which includes Hangul, Chinese
   ideographic characters, graphic and foreign characters, and others,
   uses two bytes for each character.

   For more information about Korean character codes, please refer to
   the KSC 5601-1989 document. For more detailed information about the
   escape sequence and the shift functions, see the ISO 2022 [ISO2022]
   document.
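
   As an illustration only, the shift mechanism described in this
   section can be sketched in Python as follows. The names encode_line
   and encode_message are hypothetical and are not defined by this
   document. The sketch assumes that Python's "euc_kr" codec returns the
   KSC 5601 code of a character with the high bit of each byte set, so
   that clearing that bit gives the 7-bit form. For simplicity it always
   returns to ASCII before any ASCII character, including a space, even
   though spaces are also permitted between SO and SI.

```python
SO, SI = b"\x0e", b"\x0f"
CRLF = b"\r\n"
DESIGNATOR = b"\x1b$)C"          # ESC $ ) C


def encode_line(line: str) -> bytes:
    """Encode one line (without its line ending) using SO/SI shifts."""
    out = bytearray()
    shifted = False                          # every line starts in ASCII
    for ch in line:
        if ord(ch) < 0x80:                   # ASCII character
            if shifted:
                out += SI                    # return to ASCII first
                shifted = False
            out += ch.encode("ascii")
        else:                                # candidate KSC 5601 character
            euc = ch.encode("euc_kr")        # UnicodeEncodeError if unmapped
            if len(euc) != 2:
                raise ValueError(f"{ch!r} is not a two-byte KSC 5601 character")
            if not shifted:
                out += SO
                shifted = True
            out += bytes(b & 0x7F for b in euc)   # clear the high bits
    if shifted:
        out += SI                            # end every line in ASCII
    return bytes(out)


def encode_message(text: str) -> bytes:
    """Put the designator first, then each encoded line followed by CRLF."""
    body = b"".join(encode_line(line) + CRLF for line in text.splitlines())
    return DESIGNATOR + body
```

   For example, encode_message("Hi 한글\n") yields the designator, the
   ASCII text "Hi ", SO, the four bytes "GQ1[", SI, and CRLF. Every byte
   of the result is below 128.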


Formal Syntax

   Where the formal syntax in this section does not agree with the
   Description section, priority should be given to the formal syntax.

   The notation used in this section follows that of RFC822 [RFC822],
   with the same meaning.

        * (asterisk) has the following meaning :
             l*m "anything"

   The above means that "anything" has to be used at least l times and
   at most m times. The default values for l and m are 0 and infinity,
   respectively.

   body            = *e-line *1( designator *( e-line / h-line ))

   designator      = ESC "$" ")" "C"

   e-line          = *text CRLF

   h-line          = *text 1*( segment *text ) CRLF

   segment         = SO one-of-94 one-of-94
                         *( *SP 1*(one-of-94 one-of-94)) SI

                                               ; ( Octal, Decimal.)

   ESC             = <ISO 2022 ESC, escape>    ; ( 33, 27.)

   SO              = <ASCII SO, shift out>     ; ( 16, 14.)

   SI              = <ASCII SI, shift in>      ; ( 17, 15.)

   SP              = <ASCII SP, space>         ; ( 40, 32.)

   one-of-94       = <any char in 94-char set> ; (41-176, 33.-126.)

   CHAR            = <any ASCII character>     ; ( 0-177, 0.-127.)

   text            = <any CHAR, including bare
                      CR & bare LF, but NOT
                      including CRLF>
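
   The grammar above can be checked mechanically. The following Python
   sketch is one possible line-oriented validator, not a normative
   implementation. The name is_valid_body is hypothetical. Two
   simplifications are assumptions of the sketch and not statements of
   the grammar: "text" is taken to exclude SO, SI and ESC so that each
   line parses unambiguously, and every line, including the last one,
   must end with CRLF.

```python
import re

CRLF = b"\r\n"
DESIGNATOR = b"\x1b$)C"                      # ESC "$" ")" "C"

# one-of-94 one-of-94: a two-byte character, each byte in 0x21-0x7E.
PAIR = rb"[\x21-\x7e]{2}"
# segment = SO pair *( *SP 1*pair ) SI, written in an equivalent flat form.
SEGMENT = rb"\x0e" + PAIR + rb"(?:\x20*" + PAIR + rb")*\x0f"
# Assumption of this sketch: text excludes SO (0x0E), SI (0x0F), ESC (0x1B).
TEXT = rb"[\x00-\x0d\x10-\x1a\x1c-\x7f]"

E_LINE = re.compile(TEXT + rb"*")
H_LINE = re.compile(TEXT + rb"*(?:" + SEGMENT + TEXT + rb"*)+")


def is_valid_body(body: bytes) -> bool:
    """Return True if body matches *e-line *1( designator *( e-line / h-line ))."""
    if body and not body.endswith(CRLF):
        return False
    lines = body.split(CRLF)[:-1]            # drop the empty tail after the last CRLF
    designated = False
    for line in lines:
        if not designated and line.startswith(DESIGNATOR):
            designated = True                # the designator may appear only once
            line = line[len(DESIGNATOR):]
        if E_LINE.fullmatch(line):
            continue                         # plain ASCII line
        if designated and H_LINE.fullmatch(line):
            continue                         # Hangul line, allowed after the designator
        return False
    return True
```

   A body such as ESC $ ) C SO "GQ1[" SI CRLF is accepted, while the
   same line without the designator, without the closing SI, or with an
   odd number of bytes between SO and SI is rejected.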


MIME and RFC1342 Considerations

   The name to be used for the Hangul encoding scheme in message
   contents is "ISO-2022-KR". In a MIME message this name is used as
   follows:

                Content-Type: text/plain; charset=iso-2022-kr

   Since the Hangul encoding is 7-bit by nature, the
   Content-Transfer-Encoding header does not need to be used. Note,
   however, that current Hangul message software does not support
   Base64 or Quoted-Printable encoding applied to already encoded
   Hangul messages.
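
   A minimal Python sketch of this labeling follows. The function name
   build_message and the choice of header fields are assumptions made
   for illustration. The body is expected to be already encoded as
   described in this document. The sketch only checks that the body is
   7-bit clean, which is the property that makes a
   Content-Transfer-Encoding header unnecessary.

```python
CRLF = b"\r\n"


def build_message(body: bytes) -> bytes:
    """Wrap an already encoded ISO-2022-KR body in minimal MIME headers."""
    if any(b > 0x7F for b in body):
        # An 8-bit body (for example raw EUC) is not ISO-2022-KR.
        raise ValueError("ISO-2022-KR bodies must contain only 7-bit bytes")
    headers = [
        b"MIME-Version: 1.0",
        b"Content-Type: text/plain; charset=iso-2022-kr",
    ]
    # A blank line separates the header block from the body.
    return CRLF.join(headers) + CRLF + CRLF + body
```

   No transfer encoding is added, so the body bytes appear in the
   message exactly as they were passed in.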

   Hangul in the header part of the message is encoded in 8-bit EUC.
   To use Hangul in the header part, the EUC-encoded Hangul is then
   "B" or "Q" encoded according to the method proposed in RFC1342. The
   name to be used in this case is EUC-KR [EUC-KR].
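
   The header rule can be sketched in Python for the "B" encoding as
   follows. The function names are hypothetical, and the sketch assumes
   that Python's "euc_kr" codec produces the 8-bit EUC form referred to
   here. It handles a single encoded word only and does not deal with
   "Q" encoding or with the length limits that RFC1342 places on
   encoded words.

```python
import base64

PREFIX = "=?EUC-KR?B?"
SUFFIX = "?="


def encode_header_word(text: str) -> str:
    """Return text as one RFC1342-style encoded word in EUC-KR, "B" encoded."""
    euc = text.encode("euc_kr")                       # 8-bit EUC bytes
    payload = base64.b64encode(euc).decode("ascii")   # "B" encoding is Base64
    return PREFIX + payload + SUFFIX


def decode_header_word(word: str) -> str:
    """Inverse of encode_header_word for a single EUC-KR "B" encoded word."""
    if not (word.startswith(PREFIX) and word.endswith(SUFFIX)):
        raise ValueError("not an EUC-KR B-encoded word")
    payload = word[len(PREFIX):-len(SUFFIX)]
    return base64.b64decode(payload).decode("euc_kr")
```

   For example, encode_header_word("한글") returns
   "=?EUC-KR?B?x9Gx2w==?=", which could be used in a header field such
   as Subject.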


Background Information

   The Hangul encoding system is based on the ISO 2022 [ISO2022]
   environment described by its 4/4 announcement. However, the Hangul
   encoding does not include the escape sequence of that announcement.

   The KSC 5601 character set used in this document is, by definition,
   identical to the 94x94 octet definition of KSC 5601-1987, KSC
   5601-1989 and KSC 5601-1992.  Therefore, any revision that refers to
   KSC 5601 after 1992 is to be considered as having the same meaning.

   The present Hangul encoding system is based on experience gained
   with "N-Byte Hangul", which was formerly in wide use among UNIX
   users. "N-Byte Hangul", which also uses SO and SI, was the encoding
   method used in SDN before KSC 5601 was made a national standard.

   This code is intended for the information interchange of Hangul
   messages; any other use of the code is not considered appropriate.


References

   [ASCII] American National Standards Institute, "Coded character set
   -- 7-bit American national standard code for information
   interchange", ANSI X3.4-1968

   [ISO2022] International Organization for Standardization (ISO),
   "Information processing -- ISO 7-bit and 8-bit coded character sets
   -- Code extension techniques", International Standard, 1986,
   Ref. No. ISO 2022-1986 (E).

   [KSC5601] Korea Industrial Standards Association, "Code for
   Information Interchange (Hangul and Hanja)," Korean Industrial
   Standard, 1987, Ref. No. KS C 5601-1989.

   [EUC-KR] Korea Industrial Standards Association, "Hangul Unix
   Environment," Korean Industrial Standard, 1992, Ref. No.
   KS C 5861-1992.

   [RFC822] David H. Crocker, "Standard for the Format of ARPA Internet
   Text Messages", Internet standard, August 1982, RFC822.

   [MIME] Nathaniel Borenstein and Ned Freed, "MIME (Multipurpose
   Internet Mail Extensions): Mechanisms for Specifying and Describing
   the Format of Internet Message Bodies", Proposed Internet standard,
   June 1992, RFC1341.

   [RFC1342] K. Moore, "Representation of Non-ASCII Text in Internet
   Message Headers", Proposed Internet standard, June 1992, RFC1342.


Security Considerations

   This document does not include security considerations.




Acknowledgments

   The authors wish to thank all the people who assisted in drafting
   this document. In particular, we thank Erik von der Poel, Felix M.
   Villarreal, Ienup Sung, Kyoung Namgoong, and Kyuho Kim.


Authors' Addresses

   Kilnam Chon
   Korea Advanced Institute of Science and Technology
   Department of Computer Science
   Taejon, 305-701, Republic of Korea
   
   Tel: +82-42-869-3514
   Fax: +82-42-869-3510

   Email: chon@cosmos.kaist.ac.kr


   Hyun Je Park
   Solvit Chosun Media, Inc.
   748-16 Yeoksam-Dong, Kangnam-Gu
   Seoul, 135-080, Republic of Korea

   Tel: +82-2-561-0361
   Fax: +82-2-569-4847

   Email: hjpark@dino.media.co.kr


   Uhhyung Choi
   Korea Advanced Institute of Science and Technology
   Department of Computer Science
   Taejon, 305-701, Republic of Korea

   Tel: +82-42-869-3554
   Fax: +82-42-869-3510

   Email: uhhyung@kaist.ac.kr










