pem-dev

Re: DirectoryString character set

1994-03-01 13:01:00
Ella Gardner was kind enough to send a troff file containing
the following information, which I have edited for readability:

Printable String:


Capital letters A,B,...Z
Small letters a,b,...z
Digits 0,1,...9
Space (space)
Apostrophe '
Left parenthesis (
Right parenthesis )
Plus sign +
Comma ,
Hyphen -
Full stop .
Solidus /
Colon :
Equal sign =
Question mark ?
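
As a rough illustration, the repertoire listed above could be checked
with a few lines of Python. This is only a sketch: the names
PRINTABLE_STRING_CHARS and is_printable_string are made up for the
example and do not come from any standard or library.

```python
import string

# Repertoire copied from the list above: capital and small letters,
# digits, space, and the eleven punctuation marks ' ( ) + , - . / : = ?
PRINTABLE_STRING_CHARS = frozenset(
    string.ascii_uppercase + string.ascii_lowercase + string.digits + " '()+,-./:=?"
)


def is_printable_string(value):
    """Return True if every character of value is in the repertoire."""
    return all(ch in PRINTABLE_STRING_CHARS for ch in value)


if __name__ == "__main__":
    for name in ["Bob's Widgets, Inc.", "AT&T", "user@example.com"]:
        print(repr(name), is_printable_string(name))
```

Note that common characters such as & and @ fall outside the set, so
names containing them would fail this check.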

She also informs me that:

"I don't have 10646, but Universal String is a 32-bit character set and can
handle Kanji, etc. Hoyt Kesterson said that the Base Multinational
Plane (BMP) set, which is a 16-bit character set, satisfies all but the
Taiwanese. I don't know of anyone who has implemented Universal String.
I think Unicode is similar or the same as BMP, and maybe some folks have
implemented that."

"Did you receive my mail saying that a defect will be submitted correcting
Universal String to BMP String in the Directory standard?"

Richard Ankney was kind enough to send me the tables from T.61 (1984).
The notes clarify some of the illegible blocks on the table I received
from ANSI regarding alphanumeric strings. I'll therefore correct a couple of
errors on my previous note:

The instructions for how to register a name with ANSI
specify that "the characters used for alphanumeric values must be
taken from the set defined in registration 102, the Teletex Set of Primary 
Graphic Characters of the ISO International Register of Coded
Character Sets to be used with Escape Sequences plus space. The Escape
Sequences follow:

G0:   ESC 2/8       7/5
G1:   ESC 2/9       7/5
G2:   ESC 2/10     7/5
G3:   ESC 2/11     7/5
C0:         -
C1:         -

A copy of the allowable characters as found in the Teletex
Primary Set of Graphic Characters is attached. Please note,
the international currency symbol (position 02/04) is not supported."

The attached figure includes the following characters:

[space] ! " [note 4] [note 4] % & ' ( ) * + , - . /

0 1 2 3 4 5 6 7 8 9 : ; < = > ?

@ A B C D E F G H I J K L M N O

P Q R S T U V W X Y Z [ [nochar] ] [nochar] [note 1 _]

[nochar] a b c d e f g h i j k l m n o

p q r s t u v w x y z [nochar] | [nochar] [nochar] [nochar]

Note 1: "When interworking with Videotex, this code shall have 
the meaning _delimiter_."

Note 4: "Teletex terminals should only send the codes 10/6 
and 10/8 for graphic characters [not equal] and [lozenge]. 
When receiving codes 2/3 and 2/4 terminals should interpret 
them as # and [lozenge]." [Position 2/4 is the international currency
symbol - RRJ]

(At this time I do not know what the escape sequences are, or how
they are supposed to be used.)
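
For what it is worth, the x/y figures in the escape sequences and in
the notes read as column/row positions in a 16-row code table, so a
position corresponds to the byte value 16*column + row. Below is a
small Python sketch of just that arithmetic. The function names are
invented for the example, and it says nothing about how a terminal is
supposed to act on the sequences.

```python
ESC = 0x1B  # the ESC control character, i.e. position 1/11


def position_to_byte(position):
    """Convert a 'column/row' position such as '7/5' to a byte value."""
    column, row = (int(part) for part in position.split("/"))
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError("column and row must each be in 0..15")
    return column * 16 + row


def escape_sequence(*positions):
    """Return the bytes for ESC followed by the given positions."""
    return bytes([ESC] + [position_to_byte(p) for p in positions])


if __name__ == "__main__":
    g0 = escape_sequence("2/8", "7/5")  # the G0 line quoted above
    print(" ".join("%02x" % b for b in g0))
```

By the same arithmetic, positions 2/3 and 2/4 in Note 4 are bytes 0x23
and 0x24, and 10/6 and 10/8 are 0xA6 and 0xA8.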

The T.61 text goes on to say,

"The supplementary set contains 13 diacritical marks that are used 
in combination with the letters of the basic Latin alphabet in the primary 
set to constitute the coded representations of accented letters and umlauts.
These diacritical marks, and their coded representations, are:

Acute accent 12/2
Grave accent 12/1
Circumflex accent 12/3
Diaeresis or umlaut mark 12/8
Tilde 12/4
Caron 12/15
Breve 12/6
Double acute accent 12/13
Ring 12/10
Dot 12/7
Macron 12/5
Cedilla 12/11
Ogonek 12/14"

(Richard says that the diacritical marks are nonspacing characters which
precede the spacing characters.)
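
To make the ordering concrete, here is a hedged Python sketch that
reads a byte string in which a diacritical mark precedes its letter and
produces the accented letter. Several things here are assumptions and
not taken from T.61: the pairing of each position with a Unicode
combining character, the treatment of bytes below 0x80 as plain ASCII,
and the names T61_DIACRITICS and decode_t61_letters.

```python
import unicodedata

# Positions 12/1 .. 12/15 from the list above, written as byte values
# (16 * 12 + row). The Unicode combining characters on the right are
# an assumed correspondence, not part of the T.61 text.
T61_DIACRITICS = {
    0xC1: "\u0300",  # grave accent
    0xC2: "\u0301",  # acute accent
    0xC3: "\u0302",  # circumflex accent
    0xC4: "\u0303",  # tilde
    0xC5: "\u0304",  # macron
    0xC6: "\u0306",  # breve
    0xC7: "\u0307",  # dot
    0xC8: "\u0308",  # diaeresis or umlaut mark
    0xCA: "\u030a",  # ring
    0xCB: "\u0327",  # cedilla
    0xCD: "\u030b",  # double acute accent
    0xCE: "\u0328",  # ogonek
    0xCF: "\u030c",  # caron
}


def decode_t61_letters(data):
    """Decode bytes where a nonspacing mark precedes its base letter."""
    out = []
    pending = None  # combining character waiting for its base letter
    for byte in data:
        if byte in T61_DIACRITICS:
            if pending is not None:
                raise ValueError("two diacritical marks in a row")
            pending = T61_DIACRITICS[byte]
        elif byte < 0x80:
            out.append(chr(byte))  # assumption: treat as ASCII
            if pending is not None:
                out.append(pending)  # Unicode puts the mark after the letter
                pending = None
        else:
            raise ValueError("byte 0x%02x not handled by this sketch" % byte)
    if pending is not None:
        raise ValueError("diacritical mark with no following letter")
    # Compose letter + mark into a single precomposed character where one exists.
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    print(decode_t61_letters(b"M\xc8uller"))
```

The swap in the loop is the whole point: the T.61 mark comes first,
whereas a Unicode combining character follows its base letter.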

Finally, in addition to these characters there are characters for inverted
exclamation point, cent, pound sterling, dollar, yen, pound (number sign),
section symbol, lozenge, <<, degree, plus-minus, superscript-2, superscript-3,
times, micron, paragraph symbol, middle-dot, divide sign, >>, 1/4, 1/2, 3/4,
inverted question mark, and some 32 miscellaneous characters and diphthongs
for languages like Icelandic, German, and French.

Rich also informs me that Kanji, Chinese, Cyrillic, Korean, and Greek are
supposed to be defined in T.61 (1988). These alternate character sets are
invoked using escape and shift sequences (documented elsewhere in T.61,
presumably), but he says that "X.208 doesn't allow any additional
registration numbers to be invoked..."

So at present, it appears that ANSI will only register names containing
characters from the primary set, without the diacritical marks or special 
characters such as dollar sign, despite the fact that X.500 would presumably 
allow the additional secondary characters and diacritical marks within a 
DirectoryString.

I'm going to talk to someone at ANSI to confirm this -- this seems unfortunate, 
if true.

Bob
