Ella Gardner was kind enough to send a troff file containing
the following information, which I have edited for readability:
Printable String:
Capital letters A,B,...Z
Small letters a,b,...z
Digits 0,1,...9
Space (space)
Apostrophe '
Left parenthesis (
Right parenthesis )
Plus sign +
Comma ,
Hyphen -
Full stop .
Solidus /
Colon :
Equal sign =
Question mark ?
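
The list above translates directly into a membership test. A minimal
Python sketch, assuming the list as given here is complete; the names
PRINTABLE_STRING_CHARS, is_printable_string and first_disallowed are
made up for illustration and do not come from any standard or library:

```python
import string

# Characters permitted in Printable String, copied from the list above:
# capital letters, small letters, digits, space and eleven punctuation marks.
PRINTABLE_STRING_CHARS = frozenset(
    string.ascii_uppercase
    + string.ascii_lowercase
    + string.digits
    + " '()+,-./:=?"
)


def is_printable_string(value: str) -> bool:
    """Return True if every character of value appears in the list above."""
    return all(ch in PRINTABLE_STRING_CHARS for ch in value)


def first_disallowed(value: str):
    """Return (index, character) of the first disallowed character, or None."""
    for index, ch in enumerate(value):
        if ch not in PRINTABLE_STRING_CHARS:
            return index, ch
    return None
```

For example, is_printable_string("Acme Widgets, Inc.") is true, while
"AT&T" fails because the ampersand is not in the list.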
She also informs me that:
"I don't have 10646, but Universal String is a 32-bit character set and can
handle Kanji, etc. Hoyt Kesterson said that the Base Multinational
Plane (BMP) set, which is a 16-bit character set, satisfies all but the
Taiwanese. I don't know of anyone who has implemented Universal String.
I think Unicode is similar or the same as BMP, and maybe some folks have
implemented that."
"Did you receive my mail saying that a defect will be submitted correcting
Universal String to BMP String in the Directory standard?"
Richard Ankney was kind enough to send me the tables from T.61 (1984).
The notes clarify some of the illegible blocks on the table I received
from ANSI regarding alphanumeric strings. I'll therefore correct a couple of
errors on my previous note:
The instructions for how to register a name with ANSI
specify that "the characters used for alphanumeric values must be
taken from the set defined in registration 102, the Teletex Set of Primary
Graphic Characters of the ISO International Register of Coded
Character Sets to be used with Escape Sequences plus space. The Escape
Sequences follow:
G0: ESC 2/8 7/5
G1: ESC 2/9 7/5
G2: ESC 2/10 7/5
G3: ESC 2/11 7/5
C0: -
C1: -
A copy of the allowable characters as found in the Teletex
Primary Set of Graphic Character Sets is attached. Please note,
the international currency symbol (position 02/04) is not supported."
The attached figure includes the following characters:
[space] ! " [note 4] [note 4] % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9 : ; < = > ?
@ A B C D E F G H I J K L M N O
P Q R S T U V W X Y Z [ [nochar] ] [nochar] [note 1 _]
[nochar] a b c d e f g h i j k l m n o
p q r s t u v w x y z [nochar] | [nochar] [nochar] [nochar]
Note 1: "When interworking with Videotex, this code shall have
the meaning _delimiter_."
Note 4: "Teletex terminals should only send the codes 10/6
and 10/8 for graphic characters [not equal] and [lozenge].
When receiving codes 2/3 and 2/4 terminals should interpret
them as # and [lozenge]." [Position 2/4 is the international currency
symbol - RRJ]
(At this time I do not know what the escape sequences are, or how
they are supposed to be used.)
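
One thing that can be said with some confidence is how to read the
notation itself: x/y is the usual column/row position in a 16-by-16 code
table, so each pair stands for the single byte 16*x + y, and ESC is
position 1/11. A minimal Python sketch under that reading, which only
spells out the bytes of the four sequences quoted above and says nothing
about when a terminal would send them (position_to_byte,
escape_sequence and DESIGNATIONS are illustrative names, not part of any
standard or library):

```python
ESC = 0x1B  # position 1/11 in column/row notation


def position_to_byte(position: str) -> int:
    """Convert a 'column/row' code position such as '2/8' to a byte value."""
    column, row = (int(part) for part in position.split("/"))
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError(f"bad code position: {position!r}")
    return column * 16 + row


def escape_sequence(*positions: str) -> bytes:
    """Build ESC followed by the bytes for the given code positions."""
    return bytes([ESC] + [position_to_byte(p) for p in positions])


# The four sequences from the ANSI instructions quoted above.
DESIGNATIONS = {
    "G0": escape_sequence("2/8", "7/5"),
    "G1": escape_sequence("2/9", "7/5"),
    "G2": escape_sequence("2/10", "7/5"),
    "G3": escape_sequence("2/11", "7/5"),
}

for name, sequence in DESIGNATIONS.items():
    print(name, sequence.hex(" "))
```

The G0 entry comes out as the three bytes 1b 28 75, that is, ESC followed
by the ASCII characters "(" and "u".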
The T.61 text goes on to say,
"The supplementary set contains 13 diacritical marks that are used
in combination with the letters of the basic Latin alphabet in the primary
set to constitute the coded representations of accented letters and umlauts.
These diacritical marks, and their coded representations, are:
Acute accent 12/2
Grave accent 12/1
Circumflex accent 12/3
Diaeresis or umlaut mark 12/8
Tilde 12/4
Caron 12/15
Breve 12/6
Double acute accent 12/13
Ring 12/10
Dot 12/7
Macron 12/5
Cedilla 12/11
Ogonek 12/14"
(Richard says that the diacritical marks are nonspacing characters which
precede the spacing characters.)
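
To make the ordering concrete: the mark comes first and the letter
second, which is the reverse of Unicode, where a combining mark follows
its base letter. A rough Python sketch of decoding such a byte string,
using the thirteen code positions listed above (12/1 is byte 0xC1, and
so on). The pairing with Unicode combining characters is my own
assumption, not something taken from T.61, and for simplicity every byte
below 0x80 is treated as ASCII, which is only right for the letters and
not for the handful of primary-set positions that differ. The names
T61_DIACRITICS and decode_accented are hypothetical:

```python
import unicodedata

# T.61 diacritical mark byte -> Unicode combining character (assumed pairing).
T61_DIACRITICS = {
    0xC1: "\u0300",  # 12/1  grave accent
    0xC2: "\u0301",  # 12/2  acute accent
    0xC3: "\u0302",  # 12/3  circumflex accent
    0xC4: "\u0303",  # 12/4  tilde
    0xC5: "\u0304",  # 12/5  macron
    0xC6: "\u0306",  # 12/6  breve
    0xC7: "\u0307",  # 12/7  dot
    0xC8: "\u0308",  # 12/8  diaeresis or umlaut mark
    0xCA: "\u030A",  # 12/10 ring
    0xCB: "\u0327",  # 12/11 cedilla
    0xCD: "\u030B",  # 12/13 double acute
    0xCE: "\u0328",  # 12/14 ogonek
    0xCF: "\u030C",  # 12/15 caron
}


def decode_accented(data: bytes) -> str:
    """Decode bytes in which each diacritical mark precedes its base letter."""
    out = []
    pending = None  # combining character waiting for its base letter
    for byte in data:
        if byte in T61_DIACRITICS:
            if pending is not None:
                raise ValueError("two diacritical marks in a row")
            pending = T61_DIACRITICS[byte]
        elif byte < 0x80:
            out.append(chr(byte))  # simplification: treated as ASCII
            if pending is not None:
                out.append(pending)  # Unicode wants the mark after the base
                pending = None
        else:
            raise ValueError(f"byte 0x{byte:02X} not handled by this sketch")
    if pending is not None:
        raise ValueError("diacritical mark with no following letter")
    # Compose base + mark pairs into single characters where they exist.
    return unicodedata.normalize("NFC", "".join(out))
```

With this, the bytes 5A C8 75 72 69 63 68 (Z, umlaut mark, u, r, i, c, h)
decode to "Zürich".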
Finally, in addition to these characters there are characters for inverted
exclamation point, cent, pound sterling, dollar, yen, pound, section symbol,
lozenge, <<, degree, plus-minus, superscript-2, superscript-3, times, micron,
paragraph symbol, middle-dot, divide sign, >>, 1/4, 1/2, 3/4, inverted question
mark, and some 32 miscellaneous characters and diphthongs for languages like
Icelandic, German, and French.
Rich also informs me that Kanji, Chinese, Cyrillic, Korean, and Greek are
supposed to be defined in T.61 (1988). These alternate character sets are
invoked using escape and shift sequences (documented elsewhere in T.61,
presumably), but he says that "X.208 doesn't allow any additional registration
numbers to be invoked..."
So at present, it appears that ANSI will only register names containing
characters from the primary set, without the diacritical marks or special
characters such as dollar sign, despite the fact that X.500 would presumably
allow the additional secondary characters and diacritical marks within a
DirectoryString.
I'm going to talk to someone at ANSI to confirm this -- this seems unfortunate,
if true.
Bob