John Klensin writes:
> Assuming that copyright
> regulations, etc., permit, an informational document should be submitted
> immediately to the RFC editor for publication that contains the "real"
> specification of what we describe as 2022-jp but which, if I understand
> things correctly, is actually a Japanese (JIS) National Standard.
Actually, it is a JUNET "standard", not a JIS standard, though the
former refers to the latter, of course.
> Presumably that document is in Kanji
Yes, the JUNET document is in Kanji. I have a copy of the first
edition in front of me. The cover says (in romanized Japanese):
    JUNET Riyou No Tebiki
    (Dai Ippan)
    1988 Nen 2 Gatsu
    JUNET Riyou No Tebiki Sakusei Iinkai
English translation of the above:
    JUNET User's Guide
    (First Edition)
    February 1988
    JUNET User's Guide Drafting Committee
One of my colleagues says that the Second Edition seems to be in the
works, but I would guess that this isn't really relevant, since the
current usage of 2022 is based on what was printed in the first
edition anyway.
So, here is a translation of part of a section of that document. I
submit this purely "for your information". The original doesn't have
any copyright (that I can find), but I doubt that they would mind
anyway. (Famous last words.)
START HERE:
6.3.1 Agreement on the Use of Kanji in JUNET
Regulations (1) through (11) are given here. Some of these must be
observed, while some are recommendations that should be observed.
(1) The "Standard JUNET Kanji Code" is the code used in communicating mail
and news on the network, and is not related to the code used locally
on each system or between parties with other prior agreements.
Messages that use the "Standard JUNET Kanji Code" are, and should be,
guaranteed to be communicated properly across the network.
This document does not attempt to regulate messages sent internally
within an organization, but the rules given here should be followed
for communication between organizations.
In other words, it is possible to use a different code internally, but
the internal code must be converted to the standard code before
sending messages out onto the network, and, conversely, incoming
standard code should then be converted to the local internal code as
appropriate.
Similarly, consenting parties may use a private code. This document
does not attempt to regulate the use of a non-standard code between
parties that have reached prior agreements and also know that their
link will allow such codes. However, if the receiver is not known and
correct communication of the private code has not been checked, the
standard code must be used.
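As an illustration (not part of the JUNET text), the gateway rule in (1)
could be sketched in Python roughly as follows. This is a minimal sketch
under two assumptions: that the local internal code is Shift-JIS, and that
Python's built-in "iso2022_jp" codec is an acceptable stand-in for the
Standard JUNET Kanji Code. The function names to_network and from_network
are hypothetical.

```python
# Sketch only. Assumptions: the local code is Shift-JIS, and Python's
# "iso2022_jp" codec approximates the Standard JUNET Kanji Code.


def to_network(local_bytes: bytes, local_encoding: str = "shift_jis") -> bytes:
    """Convert locally encoded text to the 7-bit network form before sending."""
    return local_bytes.decode(local_encoding).encode("iso2022_jp")


def from_network(wire_bytes: bytes, local_encoding: str = "shift_jis") -> bytes:
    """Convert incoming network text to the local internal code."""
    return wire_bytes.decode("iso2022_jp").encode(local_encoding)
```

Note that this codec always writes ESC $ B and ESC ( B when encoding, so a
conversion like this belongs at the sending or receiving edge only. A host
that merely relays mail or news should pass the bytes through unchanged,
as required in (3).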
(2) The code used for communication is not only a domestic matter, so an
ISO-conformant code is used. (Strictly speaking, a subset of the
character sets known as graphic character sets is used. For more
details, see the section at the end of this chapter.)
The following are used:
JIS X0201 (C6220) 7-bit codes
JIS X0208 (C6226) Kanji codes
JIS X0202 (C6228) code extension techniques
(3) Announcers are omitted entirely. Messages are handled as though the
announcers were omitted.
Currently, it is allowed to use either of the following Kanji escape
sequence pairs:
    ESC $ @, ESC ( J
    ESC $ B, ESC ( B
However, the following is not used:
    ESC ( H
This should be changed to a different sequence.
Note:
In order to determine the escape sequences you are using, create a
small Kanji file and look at it with a tool such as od (octal dump).
For example, say you created a file called temp that contains the 5
Hiragana characters "aiueo".
Unix: od -c temp
0000000 033 $ @ $ " $ $ $ & $ ( $ * 033 ( J
        -------                     -------
If you obtain the above, your system uses JIS for the Kanji code, and
the escape sequences ESC-$-@ and ESC-(-J are used.
Unix: od -c temp
0000000 202 240 202 242 202 244 202 246 202 250 \n \n
If you obtain the above, your system is using Shift-JIS. You may not
send this as is to the outside world. Please talk to your system
administrator.
All four of the escape sequences must be communicated intact.
For example, you are free to treat ESC $ B as though it were ESC $ @
locally, but you may not change any ESC $ B sequences to ESC $ @ when
relaying mail or news.
(However, the escape sequence does not have to be preserved between
consenting hosts. Also, some have expressed the opinion that changing
the escape sequence is not a pressing problem.)
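As an illustration (not part of the JUNET text), the od check described in
the note above could be automated along these lines. This is only a sketch:
the function name inspect and the keys of the returned dictionary are
hypothetical, and the presence of 8-bit bytes is treated as a hint that the
file may be in Shift-JIS or another local code, not as proof.

```python
import re

# The four escape sequences that item (3) allows.
ALLOWED = {b"\x1b$@", b"\x1b$B", b"\x1b(J", b"\x1b(B"}

# ESC, then "$" or "(", then one final byte in the range "@" through "~".
ESC_RE = re.compile(rb"\x1b[$(][\x40-\x7e]")


def inspect(data: bytes) -> dict:
    """Report which escape sequences occur and whether 8-bit bytes appear."""
    found = ESC_RE.findall(data)
    return {
        "sequences": found,  # in order of appearance
        "not_allowed": [s for s in found if s not in ALLOWED],
        "has_8bit": any(b >= 0x80 for b in data),  # hint of a local 8-bit code
    }
```

For the JIS example above, "sequences" would list ESC $ @ followed by
ESC ( J and "has_8bit" would be false. For the Shift-JIS example no escape
sequences are found and "has_8bit" is true. A file containing ESC ( H would
show up under "not_allowed".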
(4) At the end of a file, ASCII / JIS Roman is selected and then a newline
is added. In other words, at the end of a message one must select
English (i.e. quit Kanji), and one must add a newline (to finish the
line).
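As an illustration (not part of the JUNET text), rule (4) could be enforced
on an outgoing message roughly as follows. This is a sketch under stated
assumptions: the function name finish_message is hypothetical, ESC ( B is
chosen arbitrarily as the sequence used to leave Kanji, and a single
trailing newline is set aside first so that the escape sequence lands
before it.

```python
import re

KANJI_IN = {b"\x1b$@", b"\x1b$B"}  # sequences that select Kanji
ESC_RE = re.compile(rb"\x1b[$(][\x40-\x7e]")


def finish_message(data: bytes) -> bytes:
    """Make sure the message ends in ASCII / JIS Roman followed by a newline."""
    # Set aside one trailing newline so the escape goes in front of it.
    body = data[:-1] if data.endswith(b"\n") else data
    sequences = ESC_RE.findall(body)
    # The last escape sequence seen decides which set is currently selected.
    if sequences and sequences[-1] in KANJI_IN:
        body += b"\x1b(B"  # quit Kanji (assumption: ASCII rather than JIS Roman)
    return body + b"\n"
```

A message that already ends correctly comes back unchanged, so the function
can be applied to every outgoing message.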
END HERE.
Sorry about the abrupt end (there's more but no more time today).
By the way, JIS X0202 is a Japanese translation of (part of?) ISO
2022. The left-hand part of JIS X0201 is the Japanese national variant
of ISO 646 and is identical to ASCII except for the backslash (which is
replaced by the yen sign) and the tilde (which is replaced by a macron).
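To make the difference concrete, here is a small Python sketch that reads
7-bit bytes as the left-hand part of JIS X0201 instead of ASCII. The
function name decode_jis_roman is hypothetical, and the choice of Unicode
code points (U+00A5 for the yen sign, U+203E OVERLINE for the macron) is an
assumption based on the commonly used mapping.

```python
# Only two positions differ from ASCII in the left-hand part of JIS X0201.
# Assumption: U+00A5 and U+203E are the Unicode equivalents.
JIS_ROMAN_DIFF = {
    0x5C: "\u00a5",  # backslash position holds the yen sign
    0x7E: "\u203e",  # tilde position holds the macron (overline)
}


def decode_jis_roman(data: bytes) -> str:
    """Interpret 7-bit bytes as JIS X0201 (left-hand part) rather than ASCII."""
    return data.decode("ascii").translate(JIS_ROMAN_DIFF)
```

Every other byte decodes exactly as it would in ASCII, which is why text
sent under ESC ( J and ESC ( B usually looks the same apart from these two
characters.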
Do whatever you like with this, but my personal opinion is that it
should not go into RFC-XXXX, as I have stated before. I believe that a
separate RFC (or whatever) should be written.
Regards,
Erik M. van der Poel
erik(_at_)sra(_dot_)co(_dot_)jp
Software Research Associates, Inc., Tokyo, Japan TEL +81-3-3234-2692