Blake Ramsdell wrote:
> On Monday, February 16, 1998 1:08 PM, Phillip H. Griffin
> [SMTP:asn1(_at_)mindspring(_dot_)com] wrote:
>> Type UTF8String has UNIVERSAL 12 as its tag. X.680 states
>> that "In the value notation all BMPString values are valid
>> UniversalString and UTF8String values", and notes that the
>> notation for defining individual character values is the
>> same for these types.
> If this is the case, and UTF8String has UNIVERSAL 12 as its tag, I
> recommend that we use that tag instead of OCTET STRING to package our
> UTF8 strings. I don't know what PKIX is doing regarding this, but I
> suspect that they should do the same thing.
> As far as profiling UTF8String (should we constrain this to ISO 10646-1
> UCS-2 == BMPString == UNICODE), suggestions are welcome.
The following definition:
BMP ::= UTF8String (FROM({0, 0, 0, 0}..replacementCharacter))
would do the trick.
I might mention, for those folks who might find it odd that
ASN.1 *is* sometimes simple (:-0), that individual national
language characters are easy to define, say for test purposes.
The following ASN.1 value notations each produce a distinct
encoding of exactly the same value:
aval BMPString ::= {0, 0, 4, 56}
aval UTF8String ::= {0, 0, 4, 56}
aval UniversalString ::= {0, 0, 4, 56}
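To make "distinct encodings of exactly the same value" concrete, here is a minimal Python sketch. It assumes DER with short-form lengths, and that the contents octets are UTF-16BE for BMPString, UTF-8 for UTF8String and UTF-32BE for UniversalString. The helper names quad_to_char and der_encode are hypothetical, made up for this illustration.

```python
# Illustrative sketch only: helper names are hypothetical, and only
# short-form (< 128 octet) DER lengths are handled.
UNIVERSAL_TAGS = {"UTF8String": 0x0C, "UniversalString": 0x1C, "BMPString": 0x1E}
CODECS = {"UTF8String": "utf-8", "UniversalString": "utf-32-be", "BMPString": "utf-16-be"}


def quad_to_char(group, plane, row, cell):
    """Map an ASN.1 Quadruple {group, plane, row, cell} to one character."""
    return chr((group << 24) | (plane << 16) | (row << 8) | cell)


def der_encode(type_name, text):
    """Return tag, length and contents octets for a short string value."""
    contents = text.encode(CODECS[type_name])
    if len(contents) > 127:
        raise ValueError("this sketch only handles short-form lengths")
    return bytes([UNIVERSAL_TAGS[type_name], len(contents)]) + contents


aval = quad_to_char(0, 0, 4, 56)  # U+0438
for name in ("BMPString", "UTF8String", "UniversalString"):
    print(name, der_encode(name, aval).hex())
```

The abstract value is the same single character in all three cases; only the tag and the contents octets differ.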
If you consider, say, characters from a recent informative
RFC on the KOI8-U Ukrainian character set, you could define
such values by coding:
cyrillicSmallLetterEn UTF8String ::= {0, 0, 4, 61} -- 206 CE U043D
cyrillicSmallLetterO UTF8String ::= {0, 0, 4, 62} -- 207 CF U043E
cyrillicSmallLetterPe UTF8String ::= {0, 0, 4, 63} -- 208 D0 U043F
or
cyrillicSmallLetterEn BMPString ::= {0, 0, 4, 61} -- 206 CE U043D
cyrillicSmallLetterO BMPString ::= {0, 0, 4, 62} -- 207 CF U043E
cyrillicSmallLetterPe BMPString ::= {0, 0, 4, 63} -- 208 D0 U043F
Note here that the comments relate to the tables in the appendix
of draft-rfced-info-koi8-u-03.txt, and that the "U" numbers point
to the Unicode character numbers referenced in that draft. So it
is possible to use such value notation to generate test data for
all of the world's languages.
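The comments in those definitions can be cross-checked mechanically. The sketch below is an assumption-laden illustration: it relies on Python's built-in koi8_u codec matching the tables in the draft, and the names quad_to_char and describe are hypothetical.

```python
# Illustrative sketch: cross-check each Quadruple against its KOI8-U
# octet and Unicode number. Helper names are hypothetical.
def quad_to_char(group, plane, row, cell):
    """Map an ASN.1 Quadruple {group, plane, row, cell} to one character."""
    return chr((group << 24) | (plane << 16) | (row << 8) | cell)


def describe(quad):
    """Return (KOI8-U octet, 'U+XXXX' label) for a Quadruple."""
    ch = quad_to_char(*quad)
    koi8u_octet = ch.encode("koi8_u")[0]
    return koi8u_octet, "U+%04X" % ord(ch)


test_values = {
    "cyrillicSmallLetterEn": (0, 0, 4, 61),
    "cyrillicSmallLetterO": (0, 0, 4, 62),
    "cyrillicSmallLetterPe": (0, 0, 4, 63),
}

for name, quad in test_values.items():
    octet, label = describe(quad)
    print(name, octet, "%02X" % octet, label)
```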
For your favorite visible ASCII (which I note is an acronym that
never appears in the ASN.1 standards) character, the following
example is provided in X.680:
space BMPString ::= {0, 0, 0, 32}
exclamationMark BMPString ::= {0, 0, 0, 33}
quotationMark BMPString ::= {0, 0, 0, 34}
... -- and so on
tilde BMPString ::= {0, 0, 0, 126}
Note again that you can substitute UTF8String for BMPString
(Unicode) in all of these definitions.
It is also quite easy to create your own language types, each
constrained to a specific permitted alphabet that some coder
tools will enforce. An old copy of the ASN.1 CHARACTER SET
MODULE lists the following, for example:
BasicLatin ::= BMPString (FROM(space..tilde))
Hangul ::= BMPString
(FROM(hangulSyllableKiyeokA..hangulSyllableHieuhIIeung))
Katakana ::= BMPString (FROM({0, 0, 48, 160}..{0, 0, 48, 255}))
Bopomofo ::= BMPString (FROM({0, 0, 49, 0}..{0, 0, 49, 47}))
Bmp ::= BMPString (FROM({0, 0, 0, 0}..replacementCharacter))
Notice that you can use defined character names like "space",
if you've defined such a character, or you can simply use
numbers.
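For anyone wondering what enforcing such a permitted alphabet amounts to, here is a rough Python sketch of the check a tool might perform for a single FROM(lower..upper) range. The names quad, ALPHABETS and conforms are hypothetical, and replacementCharacter is assumed to be U+FFFD, i.e. {0, 0, 255, 253}. Hangul is left out because its bounds are given above only by name.

```python
# Illustrative sketch of a permitted-alphabet check for one
# FROM(lower..upper) range. All names here are hypothetical.
def quad(group, plane, row, cell):
    """Map an ASN.1 Quadruple {group, plane, row, cell} to a code point."""
    return (group << 24) | (plane << 16) | (row << 8) | cell


# Assumption: replacementCharacter is U+FFFD.
REPLACEMENT_CHARACTER = quad(0, 0, 255, 253)

ALPHABETS = {
    "BasicLatin": (quad(0, 0, 0, 32), quad(0, 0, 0, 126)),  # space..tilde
    "Katakana": (quad(0, 0, 48, 160), quad(0, 0, 48, 255)),
    "Bopomofo": (quad(0, 0, 49, 0), quad(0, 0, 49, 47)),
    "Bmp": (quad(0, 0, 0, 0), REPLACEMENT_CHARACTER),
}


def conforms(type_name, text):
    """True if every character of text lies in the type's permitted range."""
    lower, upper = ALPHABETS[type_name]
    return all(lower <= ord(ch) <= upper for ch in text)


print(conforms("BasicLatin", "Hello, world!"))
print(conforms("Katakana", "\u30ab\u30bf\u30ab\u30ca"))
print(conforms("Bmp", "\U0001F600"))
```

A real tool would do this check at encode or decode time and report a constraint violation rather than return a boolean.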
Phil
Blake
--
Blake C. Ramsdell
Worldtalk Corporation
For current info, check http://www.deming.com/users/blaker
Voice +1 425 882 8861 x103 Fax +1 425 882 8060
--
Phillip H. Griffin Griffin Consulting
asn1(_at_)mindspring(_dot_)com ASN.1-SET-Java-Security
919.828.7114 1625 Glenwood Avenue
919.832.7008 [mail] Raleigh, North Carolina 27608 USA