2022-jp

John Klensin writes:

Assuming that copyright regulations, etc., permit, an informational
document should be submitted immediately to the RFC editor for
publication that contains the "real" specification of what we describe
as 2022-jp but which, if I understand things correctly, is actually a
Japanese (JIS) National Standard.


Actually, it is a JUNET "standard", not a JIS standard, though the
former refers to the latter, of course.

Presumably that document is in Kanji


Yes, the JUNET document is in Kanji. I have a copy of the first
edition in front of me. The cover says (in romanized Japanese):

        JUNET Riyou No Tebiki
        (Dai Ippan)
        1988 Nen 2 Gatsu
        JUNET Riyou No Tebiki Sakusei Iinkai

English translation of the above:

        JUNET User's Guide
        (First Edition)
        February 1988
        JUNET User's Guide Drafting Committee

One of my colleagues says that the Second Edition seems to be in the
works, but I would guess that this isn't really relevant, since the
current usage of 2022 is based on what was printed in the first
edition anyway.

So, here is a translation of part of a section of that document. I
submit this purely "for your information". The original doesn't have
any copyright notice (that I can find), but I doubt that they would mind
anyway. (Famous last words.)

START HERE:

6.3.1  Agreement on the Use of Kanji in JUNET

Regulations (1) through (11) are given here. Some of these must be
observed, while some are recommendations that should be observed.

(1) The "Standard JUNET Kanji Code" is the code used in communicating mail
    and news on the network, and is not related to the code used locally
    on each system or between parties with other prior agreements.

    Messages that use the "Standard JUNET Kanji Code" are guaranteed, and
    should be guaranteed, to be communicated properly across the network.
    This document does not attempt to regulate messages sent internally
    within an organization, but the rules given here should be followed
    for communication between organizations.

    In other words, it is possible to use a different code internally, but
    the internal code must be converted to the standard code before
    sending messages out onto the network, and, conversely, incoming
    standard code should then be converted to the local internal code as
    appropriate.

    Similarly, consenting parties may use a private code. This document
    does not attempt to regulate the use of a non-standard code between
    parties that have reached prior agreements and also know that their
    link will allow such codes. However, if the receiver is not known and
    correct communication of the private code has not been checked, the
    standard code must be used.
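
As an illustration of rule (1), and not part of the JUNET text, here is a
minimal Python sketch of converting between a local code and the network
code. It assumes that Python's built-in iso2022_jp codec is an acceptable
stand-in for the "Standard JUNET Kanji Code" (it emits the ESC $ B,
ESC ( B pair) and that the local code is Shift-JIS. The function names
to_network and from_network are made up for this example.

```python
def to_network(text: str) -> bytes:
    """Encode text for sending out onto the network.

    Assumption: Python's "iso2022_jp" codec stands in for the standard
    network code described in rule (1).
    """
    return text.encode("iso2022_jp")


def from_network(data: bytes, local_codec: str = "shift_jis") -> bytes:
    """Convert incoming network bytes to a local internal code.

    Assumption: the local code is Shift-JIS unless the caller says otherwise.
    """
    return data.decode("iso2022_jp").encode(local_codec)


if __name__ == "__main__":
    aiueo = "\u3042\u3044\u3046\u3048\u304a"  # the 5 Hiragana "aiueo"
    print(to_network(aiueo))
    print(from_network(to_network(aiueo)))
```

The point of the design is that only the boundary functions know about the
local code; everything that leaves the site goes through to_network.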

(2) The code used for communication is not only a domestic matter, so an
    ISO-conformant code is used. (Strictly speaking, a subset of the
    character sets known as graphic character sets is used. For more
    details, see the section at the end of this chapter.)

    The following are used:
        JIS X0201 (C6220) 7-bit codes
        JIS X0208 (C6226) Kanji codes
        JIS X0202 (C6228) code extension techniques

(3) Announcers are omitted entirely. Messages are handled as though the
    announcers were omitted.

    Currently, either of the following Kanji escape sequence pairs may be
    used:
        ESC $ @, ESC ( J
        ESC $ B, ESC ( B

    However, the following is not used:
        ESC ( H
    This should be changed to a different sequence.

        Note:

        In order to determine the escape sequences you are using, create a
        small Kanji file and look at it with a tool such as od (octal dump).

        For example, say you created a file called temp that contains the 5
        Hiragana characters "aiueo".

        Unix: od -c temp

        0000000  033  $  @  $  "  $  $  $  &  $  (  $  *  033  (  J
                 ---------                                ---------

        If you obtain the above, your system uses JIS for the Kanji code, and
        the escape sequences ESC-$-@ and ESC-(-J are used.

        Unix: od -c temp

        0000000  202  240  202  242  202  244  202  246  202  250  \n  \n

        If you obtain the above, your system is using Shift-JIS. You may not
        send this as is to the outside world. Please talk to your system
        administrator.

    All 4 of the escape sequences must be communicated intact.

    For example, you are free to treat ESC $ B as though it were ESC $ @
    locally, but you may not change any ESC $ B sequences to ESC $ @ when
    relaying mail or news.

    (However, the escape sequence does not have to be preserved between
    consenting hosts. Also, it is noted here that some have expressed the
    opinion that changing the escape sequence is not a pressing problem.)
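
As an illustration of rule (3), and not part of the JUNET text, the od
check in the note can be sketched in Python. The sketch lists the escape
sequences found in a message, reports any that are not among the 4 allowed
ones (such as ESC ( H), and flags 8-bit bytes of the kind seen in the
Shift-JIS dump. The function names are hypothetical, and the sketch assumes
that only 3-byte sequences of the form ESC $ x or ESC ( x occur.

```python
import re
from typing import List

# The 4 sequences that rule (3) allows.
ALLOWED = (b"\x1b$@", b"\x1b$B", b"\x1b(B", b"\x1b(J")

# Assumption: every escape sequence of interest is ESC, then "$" or "(",
# then one final byte in the range 0x40-0x7E.
_ESCAPE = re.compile(rb"\x1b[$(][\x40-\x7e]")


def escape_sequences(data: bytes) -> List[bytes]:
    """Return the escape sequences in the order they appear."""
    return _ESCAPE.findall(data)


def disallowed_sequences(data: bytes) -> List[bytes]:
    """Return the escape sequences that are not among the 4 allowed ones."""
    return [seq for seq in escape_sequences(data) if seq not in ALLOWED]


def is_seven_bit(data: bytes) -> bool:
    """True if no byte has the high bit set (Shift-JIS text fails this)."""
    return all(byte < 0x80 for byte in data)


if __name__ == "__main__":
    jis = b'\x1b$@$"$$$&$($*\x1b(J'  # the first od dump in the note
    sjis = b"\x82\xa0\x82\xa2\x82\xa4\x82\xa6\x82\xa8\n\n"  # the second dump
    print(escape_sequences(jis), disallowed_sequences(jis), is_seven_bit(jis))
    print(escape_sequences(sjis), is_seven_bit(sjis))
```

A relay that only inspects messages this way, and forwards the bytes
unchanged, keeps all escape sequences intact as the rule requires.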

(4) At the end of a file, ASCII / JIS Roman is selected and then a newline
    is added. In other words, at the end of a message one must select
    English (i.e. quit Kanji), and one must add a newline (to finish the
    line).
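
As an illustration of rule (4), and not part of the JUNET text, the
end-of-message requirement can be sketched as a check plus a repair step.
The function names are hypothetical. The sketch assumes that the last ESC
byte in the message starts a 3-byte escape sequence, and that a message
left in Kanji should be closed with the partner sequence from the pairs in
rule (3).

```python
from typing import Optional

# Kanji-in sequence -> the sequence it is paired with in rule (3).
KANJI_IN_TO_OUT = {b"\x1b$@": b"\x1b(J", b"\x1b$B": b"\x1b(B"}


def last_escape(data: bytes) -> Optional[bytes]:
    """Return the last 3-byte escape sequence, or None if there is none."""
    index = data.rfind(b"\x1b")
    return data[index:index + 3] if index >= 0 else None


def is_properly_ended(data: bytes) -> bool:
    """True if the message ends outside Kanji and with a newline."""
    return last_escape(data) not in KANJI_IN_TO_OUT and data.endswith(b"\n")


def finish(data: bytes) -> bytes:
    """Append whatever is missing: a Kanji-out sequence, then a newline."""
    kanji_out = KANJI_IN_TO_OUT.get(last_escape(data))
    if kanji_out is not None:
        data += kanji_out
    if not data.endswith(b"\n"):
        data += b"\n"
    return data


if __name__ == "__main__":
    unfinished = b'\x1b$B$"'  # still in Kanji, no newline
    print(is_properly_ended(unfinished), finish(unfinished))
```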

END HERE.

Sorry about the abrupt end (there's more but no more time today).

By the way, JIS X0202 is a Japanese translation of (part of?) ISO
2022. The left-hand part of JIS X0201 is the Japanese national variant
of ISO 646 and is identical to ASCII except for backslash (replaced by
the yen mark) and tilde (replaced by a macron).
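
To make the two differences concrete, here is a minimal Python sketch that
decodes bytes from the left-hand part of JIS X0201. The function name is
made up for this example, and the choice of U+00A5 (YEN SIGN) and U+203E
(OVERLINE) as the Unicode equivalents of the yen mark and the macron is my
assumption.

```python
# The only two positions where the left-hand part of JIS X0201 differs
# from ASCII. The Unicode code points chosen here are an assumption.
JIS_ROMAN_DIFFERENCES = {
    0x5C: "\u00a5",  # backslash position -> yen mark
    0x7E: "\u203e",  # tilde position -> macron (overline)
}


def decode_jis_roman(data: bytes) -> str:
    """Decode 7-bit JIS X0201 left-half bytes into a Python string."""
    chars = []
    for byte in data:
        if byte >= 0x80:
            raise ValueError("not a 7-bit code: 0x%02x" % byte)
        chars.append(JIS_ROMAN_DIFFERENCES.get(byte, chr(byte)))
    return "".join(chars)


if __name__ == "__main__":
    print(decode_jis_roman(b"a\\b~"))
```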

Do whatever you like with this, but my personal opinion is that it
should not go into RFC-XXXX, as I have stated before. I believe that a
separate RFC (or whatever) should be written.


Regards,

Erik M. van der Poel                                      
erik(_at_)sra(_dot_)co(_dot_)jp
Software Research Associates, Inc., Tokyo, Japan     TEL +81-3-3234-2692