new version of ISO-2022-JP document

1992-12-01 19:01:17
IETF-822 people,

Here is the newest version of the ISO-2022-JP document.  I have
included "+" and "-" signs to indicate additions and deletions
respectively, similar to GNU diff's "-u" option.

If there are no comments, I will have this draft converted to an RFC
and register "ISO-2022-JP" with IANA.


Thanks,
Erik


 Network Working Group                                          Jun Murai
 Internet Draft                                              Mark Crispin
                                                        Erik van der Poel
                                                        1st December 1992


-        Japanese Character Encoding for Internet Messages
+        Japanese Character Encoding for Internet Message Bodies


 Status of this Memo

    This document is an Internet Draft.  Internet Drafts are working
    documents of the Internet Engineering Task Force (IETF), its Areas,
    and its Working Groups. Note that other groups may also distribute
    working documents as Internet Drafts.

    Internet Drafts are draft documents valid for a maximum of six
    months. Internet Drafts may be updated, replaced, or obsoleted by
    other documents at any time.  It is not appropriate to use Internet
    Drafts as reference material or to cite them other than as a "working
    draft" or "work in progress."

    Please check the I-D abstract listing contained in each Internet
    Draft directory to learn the current status of this or any other
    Internet Draft.

    This draft document will be submitted to the RFC editor as an
    informational document.  This document will expire before 1st June
    1993.  Distribution of this memo is unlimited.  Please send comments
    to ietf-822@dimacs.rutgers.edu.


 Introduction

    This document describes the encoding used in electronic mail [RFC822]
    and network news [RFC1036] message bodies in several Japanese
    networks. It was first specified by and used in JUNET [JUNET]. The
    encoding is now also widely used in Japanese IP communities.

    This document names the encoding "ISO-2022-JP", which is intended to
    be used in the "charset" parameter field of MIME [MIME]
-   messages and RFC 1342 [RFC1342] headers.
+   messages. The use of ISO-2022-JP in RFC 1342 [RFC1342] headers
+   is expected to be the subject of a separate document.

    This document only describes the encoding of plain text. The encoding
    of other subtypes of text, such as richtext, is not discussed here.





 Murai et al              Expires 1st June 1993                  [Page 1]

 Internet Draft                                 Updated 1st December 1992


 Description

    The message body starts in ASCII [ASCII], and switches to Japanese
    characters through an escape sequence. For example, the escape
    sequence ESC $ B (three bytes, hexadecimal values: 1B 24 42)
    indicates that the bytes following this escape sequence are Japanese
    characters, which are encoded in two bytes each.  To switch back to
    ASCII, the escape sequence ESC ( B is used.
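
    As an illustration only, the switch into and out of two-byte mode
    can be observed with Python's standard "iso2022_jp" codec. The codec
    and the sample string are choices made for this sketch; the draft
    itself does not reference any implementation.

```python
# Sketch: locate the escape sequences in an encoded string.
# Python's built-in "iso2022_jp" codec is an assumption of this example.
ESC_TO_JIS = b"\x1b$B"    # ESC $ B  (hex 1B 24 42)
ESC_TO_ASCII = b"\x1b(B"  # ESC ( B  (hex 1B 28 42)

# "abc", three Kanji (written as escapes to keep this text 7-bit), "xyz"
sample = "abc \u65e5\u672c\u8a9e xyz"
encoded = sample.encode("iso2022_jp")

# The bytes between the two escape sequences are the Japanese characters.
start = encoded.index(ESC_TO_JIS) + len(ESC_TO_JIS)
end = encoded.index(ESC_TO_ASCII)
kanji_bytes = encoded[start:end]

print(encoded)
print(len(kanji_bytes), "bytes carry 3 Japanese characters")
```

    Every byte in the output is below 80 hexadecimal, and the three
    Japanese characters occupy six bytes between the two escape
    sequences.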

    The following table gives the escape sequences and the character sets
    used in ISO-2022-JP messages.
+   The ISOREG number is the registration number in ISO's registry [ISOREG].

-       ESC ( B    ASCII
-       ESC ( J    JIS X 0201-1976 ("Roman" set)
-       ESC $ @    JIS X 0208-1978
-       ESC $ B    JIS X 0208-1983
+       Esc Seq    Character Set                  ISOREG
+
+       ESC ( B    ASCII                             6
+       ESC ( J    JIS X 0201-1976 ("Roman" set)    14
+       ESC $ @    JIS X 0208-1978                  42
+       ESC $ B    JIS X 0208-1983                  87

+   Note that JIS X 0208-1983 was called JIS C 6226-1983 until the name
+   was changed in March 1987. Likewise, JIS C 6220 was renamed JIS X
+   0201.
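
    The table above can be held as a small lookup structure. The
    following is a minimal sketch; the names Designation,
    ESCAPE_SEQUENCES and lookup are hypothetical and chosen only for
    this example.

```python
from typing import NamedTuple, Optional

class Designation(NamedTuple):
    charset: str          # character set name as given in the table
    bytes_per_char: int   # 1 for the single-byte sets, 2 for JIS X 0208
    isoreg: int           # ISOREG registration number

# Keys are the three-byte escape sequences; ESC is hex 1B.
ESCAPE_SEQUENCES = {
    b"\x1b(B": Designation("ASCII", 1, 6),
    b"\x1b(J": Designation('JIS X 0201-1976 ("Roman" set)', 1, 14),
    b"\x1b$@": Designation("JIS X 0208-1978", 2, 42),
    b"\x1b$B": Designation("JIS X 0208-1983", 2, 87),
}

def lookup(sequence: bytes) -> Optional[Designation]:
    """Return the table entry for an escape sequence, or None if unlisted."""
    return ESCAPE_SEQUENCES.get(sequence)

for sequence, entry in ESCAPE_SEQUENCES.items():
    hex_form = " ".join(f"{byte:02X}" for byte in sequence)
    print(hex_form, entry.charset, entry.isoreg)
```

    A sequence that is not in the table, such as ESC ( H, yields None
    from lookup().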
+ 
    The "Roman" character set of JIS X 0201 [JISX0201] is identical to
    ASCII except for backslash (\) and tilde (~). The backslash is
    replaced by the Yen sign, and the tilde is replaced by macron
    (overline). This set is Japan's national variant of ISO 646 [ISO646].
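
    The two differences from ASCII can be sketched as a byte-to-character
    mapping. The function name roman_to_unicode is hypothetical, and the
    choice of U+203E OVERLINE for the byte 7E is an assumption of this
    example (some mapping tables use U+00AF MACRON instead).

```python
YEN_SIGN = "\u00a5"
OVERLINE = "\u203e"   # assumption: U+00AF MACRON is used by some tables

def roman_to_unicode(data: bytes) -> str:
    """Interpret bytes as the "Roman" set of JIS X 0201."""
    out = []
    for byte in data:
        if byte > 0x7F:
            raise ValueError("the Roman set is a 7-bit set")
        if byte == 0x5C:        # backslash position holds the Yen sign
            out.append(YEN_SIGN)
        elif byte == 0x7E:      # tilde position holds the overline
            out.append(OVERLINE)
        else:                   # every other position matches ASCII
            out.append(chr(byte))
    return "".join(out)

print(roman_to_unicode(b"price: \\100"))
```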

    The JIS X 0208 [JISX0208] character sets consist of Kanji, Hiragana,
    Katakana and some other symbols and characters. Each character takes
    up two bytes.

    For further details about the JIS Japanese national character set
    standards, refer to [JISX0201] and [JISX0208].  For further
    information about the escape sequences, see [ISO2022] and [ISOREG].

    If there are JIS X 0208 characters on a line, there must be a switch
    to ASCII or to the "Roman" set of JIS X 0201 before the end of the
    line (i.e. before the CRLF). This means that the next line starts in
    the character set that was switched to before the end of the previous
    line.

+   Also, the message body must end with CRLF, and there must be a switch
+   to ASCII before the last CRLF (if there are any non-ASCII characters
+   in the message body).
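
    The end-of-line and end-of-body rules above can be checked
    mechanically. The following is a sketch under simplifying
    assumptions: the function name check_body is hypothetical, only the
    four escape sequences from the table are recognised, and the final
    ASCII check is applied whenever the body ends in another set.

```python
import re

# Only the four escape sequences listed in the table are recognised.
ESCAPE = re.compile(rb"\x1b(\([BJ]|\$[@B])")
STATE = {b"(B": "ASCII", b"(J": "Roman",
         b"$@": "JIS X 0208", b"$B": "JIS X 0208"}

def check_body(body: bytes) -> list:
    """Return a list of problems found; an empty list means none."""
    problems = []
    if not body.endswith(b"\r\n"):
        problems.append("body does not end with CRLF")
    lines = body.split(b"\r\n")
    if lines[-1] == b"":
        lines.pop()       # split() leaves an empty tail after the final CRLF
    state = "ASCII"       # the message body starts in ASCII
    for number, line in enumerate(lines, start=1):
        # The state carries over from one line to the next.
        for match in ESCAPE.finditer(line):
            state = STATE[match.group(1)]
        if state == "JIS X 0208":
            problems.append(
                f"line {number}: no switch to a single-byte set before CRLF")
    if lines and state != "ASCII":
        problems.append("no switch to ASCII before the last CRLF")
    return problems

print(check_body(b"\x1b$B$3$s\x1b(J\r\nabc\x1b(B\r\n"))
```

    In the usage line, the first line ends in the "Roman" set and the
    second line switches to ASCII before the last CRLF, so no problems
    are reported.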
+ 
    Other restrictions are given in the Formal Syntax below.



 Formal Syntax

    The notational conventions used here are identical to those used in
    RFC 822 [RFC822].

    The * (asterisk) convention is as follows:

            l*m something

    meaning at least l and at most m somethings, with l and m taking
    default values of 0 and infinity, respectively.


    line                = *text *1( *segment single-byte-seq *text ) CRLF

    segment             = single-byte-segment / double-byte-segment

    single-byte-segment = single-byte-seq 1*text

    double-byte-segment = double-byte-seq 1*( one-of-94 one-of-94 )

    single-byte-seq     = ESC "(" ( "B" / "J" )

    double-byte-seq     = ESC "$" ( "@" / "B" )

                                                     ; ( Octal, Decimal.)

    ESC                 = <ISO 2022 ESC, escape>     ; (    33,      27.)

    SI                  = <ISO 2022 SI, shift-in>    ; (    17,      15.)

    SO                  = <ISO 2022 SO, shift-out>   ; (    16,      14.)

    one-of-94           = <any char in 94-char set>  ; (41-176, 33.-126.)

    CHAR                = <any ASCII character>      ; ( 0-177,  0.-127.)

    text                = <any CHAR, including bare CR & bare LF, but NOT
                           including CRLF, and not including ESC, SI, SO>
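
    The grammar above translates fairly directly into a regular
    expression. The following sketch is one possible transcription, not
    a reference implementation; the name is_valid_line and the constant
    names are hypothetical. It checks a single line in isolation and
    does not track state carried over from a previous line.

```python
import re

# text: any CHAR except SO (0E), SI (0F) and ESC (1B); bare CR and LF allowed
TEXT = rb"[\x00-\x0d\x10-\x1a\x1c-\x7f]"
SINGLE_SEQ = rb"\x1b\([BJ]"      # single-byte-seq = ESC "(" ( "B" / "J" )
DOUBLE_SEQ = rb"\x1b\$[@B]"      # double-byte-seq = ESC "$" ( "@" / "B" )
ONE_OF_94 = rb"[\x21-\x7e]"

# segment = single-byte-seq 1*text / double-byte-seq 1*( one-of-94 one-of-94 )
SEGMENT = (b"(?:" + SINGLE_SEQ + TEXT + b"+|"
           + DOUBLE_SEQ + b"(?:" + ONE_OF_94 + ONE_OF_94 + b")+)")

# line = *text *1( *segment single-byte-seq *text ) CRLF
# (the CRLF itself is checked separately in is_valid_line)
LINE = re.compile(TEXT + b"*(?:" + SEGMENT + b"*" + SINGLE_SEQ + TEXT + b"*)?")

def is_valid_line(line: bytes) -> bool:
    """Check one line, including its terminating CRLF, against the grammar."""
    if not line.endswith(b"\r\n") or b"\r\n" in line[:-2]:
        return False
    return LINE.fullmatch(line[:-2]) is not None

print(is_valid_line(b"\x1b$B$3$s$K$A$O\x1b(B\r\n"))
```

    Under this transcription, a line is rejected if it ends while still
    in a two-byte set, if a two-byte segment has an odd number of bytes,
    or if it uses an escape sequence other than the four listed.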


-MIME and RFC 1342 Considerations
+MIME Considerations

    The name given to the JUNET character encoding is "ISO-2022-JP". This
    name is intended to be used in MIME messages as follows:

            Content-Type: text/plain; charset=iso-2022-jp




    The ISO-2022-JP encoding is already in 7-bit form, so it is not
    necessary to use a Content-Transfer-Encoding header. It should be
    noted that applying the Base64 or Quoted-Printable encoding will
    render the message unreadable in current JUNET software.
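
    As a sketch of how such a message might be produced today, the
    example below uses Python's standard email package, which is an
    assumption of this example and is not referenced by the draft. Note
    that this library emits an explicit "Content-Transfer-Encoding: 7bit"
    header; the body itself is neither Base64 nor Quoted-Printable
    encoded.

```python
from email.mime.text import MIMEText

# "konnichiwa" in Hiragana, written as escapes to keep this text 7-bit
body = "\u3053\u3093\u306b\u3061\u306f\r\n"

# The third argument is the value of the "charset" parameter.
msg = MIMEText(body, "plain", "iso-2022-jp")

print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])

# The stored payload is the ISO-2022-JP byte sequence, already 7-bit.
payload = msg.get_payload(decode=True)
print(payload)
```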
-
-   The name ISO-2022-JP may also be used in RFC 1342 headers, though in
-   this case, the text should be encoded using either the "B" or "Q"
-   encoding, to avoid getting damaged by header-processing software. As
-   ISO-2022-JP text often contains many bytes that have a special
-   meaning in headers, it is probably easier to use the "B" encoding,
-   rather than trying to determine which particular byte values need "Q"
-   encoding.


 Background Information

    The JUNET encoding was described in the JUNET User's Guide [JUNET]
    (JUNET Riyou No Tebiki Dai Ippan).

    The encoding is based on the particular usage of ISO 2022 announced
    by 4/1 (see [ISO2022] for details). However, the escape sequence
    normally used for this announcement is not included in ISO-2022-JP
    messages.

    The so-called half-width (hankaku) Katakana, that is, the Kana set of
    JIS X 0201, are not used in ISO-2022-JP messages.

    In the past, some systems erroneously used the escape sequence ESC (
    H in JUNET messages. This escape sequence is officially registered
    for a Swedish character set [ISOREG], and should not be used in ISO-
    2022-JP messages.

    Some systems do not distinguish between ESC ( B and ESC ( J or
    between ESC $ @ and ESC $ B for display. However, when relaying a
    message to another system, the escape sequences must not be altered
    in any way.

    The human user (not implementor) should try to keep lines within 80
    display columns, or, preferably, within 75 (or so) columns, to allow
    insertion of ">" at the beginning of each line in excerpts. Each JIS
    X 0208 character takes up two columns, and the escape sequences do
    not take up any columns. The implementor is reminded that JIS X 0208
    characters take up two bytes and should not be split in the middle to
    break lines for displaying, etc.
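
    Because each two-byte character takes two columns and the escape
    sequences take none, the column count of a line equals its byte
    count once the escape sequences are removed. The following sketch
    relies on that observation; the function name display_columns is
    hypothetical, and control characters such as TAB are assumed not to
    occur.

```python
import re

# The four escape sequences of ISO-2022-JP; they occupy no display columns.
ESCAPE = re.compile(rb"\x1b(?:\([BJ]|\$[@B])")

def display_columns(line: bytes) -> int:
    """Columns used by one ISO-2022-JP line, given without its CRLF."""
    # One column per single-byte character, two per two-byte character,
    # so the remaining byte count is the column count.
    return len(ESCAPE.sub(b"", line))

# "abc ", then three Kanji in JIS X 0208-1983, then back to ASCII
line = b"abc \x1b$BF|K\\8l\x1b(B"
print(display_columns(line), "columns;",
      "fits in 75" if display_columns(line) <= 75 else "too long")
```

    The sample line holds four ASCII characters and three Kanji, which
    gives 4 + 3 * 2 = 10 columns.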

    The JIS X 0208 standard was revised in 1990, to add two characters at
    the end of the table. Although ISO 2022 specifies special additional
    escape sequences to indicate the use of revised character sets, this
    document suggests that these escape sequences not be used in
    ISO-2022-JP text, even if the two characters added to JIS X 0208 in
    1990 are used.








 References

+   [ASCII] American National Standards Institute, "Coded character set
+   -- 7-bit American national standard code for information
+   interchange", ANSI X3.4-1968
+ 
+   [ISO646] International Organization for Standardization (ISO),
+   "Information processing -- ISO 7-bit coded character set for
+   information interchange", International Standard, Ref. No. ISO 646-
+   1983 (E)
+ 
    [ISO2022] International Organization for Standardization (ISO),
    "Information processing -- ISO 7-bit and 8-bit coded character sets
    -- Code extension techniques", International Standard, Ref. No. ISO
    2022-1986 (E)

+   [ISOREG] International Organization for Standardization (ISO),
+   "International Register of Coded Character Sets To Be Used With
+   Escape Sequences"
+ 
+   [JISX0201] Japanese Standards Association, "Code for Information
+   Interchange", JIS X 0201-1976
+ 
+   [JISX0208] Japanese Standards Association, "Code of the Japanese
+   graphic character set for information interchange", JIS X 0208-1978,
+   -1983 and -1990
+ 
    [JUNET] JUNET Riyou No Tebiki Sakusei Iin Kai (JUNET User's Guide
    Drafting Committee), "JUNET Riyou No Tebiki (Dai Ippan)" ("JUNET
    User's Guide (First Edition)"), February 1988

    [MIME] Nathaniel Borenstein and Ned Freed, "MIME (Multipurpose
    Internet Mail Extensions): Mechanisms for Specifying and Describing
    the Format of Internet Message Bodies", Proposed (Internet) standard,
    June 1992, rfc1341

    [RFC822] David H. Crocker, "Standard for the Format of ARPA Internet
    Text Messages", Internet standard, August 1982, rfc822

+   [RFC1036] M. Horton and R. Adams, "Standard for Interchange of USENET
+   Messages", December 1987, rfc1036
+ 
    [RFC1342] Keith Moore, "Representation of Non-ASCII Text in Internet
    Message Headers", Proposed (Internet) standard, June 1992, rfc1342


 Security Considerations

    Security considerations are not discussed in this memo.


 Acknowledgements

    Many people assisted in drafting this document. The authors wish to
    thank in particular Akira Kato, Masahiro Sekiguchi and Ken'ichi
    Handa.


 Authors' Addresses


    Jun Murai
    Keio University
    5322 Endo, Fujisawa
    Kanagawa 252 Japan

    Fax: +81 (466) 49-1101

    EMail: jun@wide.ad.jp


    Mark Crispin
    Panda Programming
    6158 Lariat Loop NE
    Bainbridge Island, WA 98110-2098
    USA

    Phone: +1 (206) 842-2385

    EMail: MRC@PANDA.COM


    Erik M. van der Poel
    A-105 Park Avenue
    4-4-10 Ohta, Kisarazu
    Chiba 292 Japan

    Phone: +81 (438) 22-5836
    Fax:   +81 (438) 22-5837

    EMail: erik@poel.juice.or.jp







