Re: How to do UTF-8

David P. Kemp wrote:

From: Paul Hoffman / IMC <phoffman(_at_)imc(_dot_)org>

And while we're UTF8'ing, we should also replace the DirectoryString in
the ContentHints attribute:

ContentHints ::= SEQUENCE {
 contentDescription UTF8String SIZE(1..MAX) OPTIONAL,
 contentType OBJECT IDENTIFIER }


I don't see any reason to do the SIZE(1..MAX). It made sense for the old
strings, but I don't think we need it here. Can we elide it?


I fail to see why the usefulness of a size constraint to
implementors should depend on the type of string. If size
constraints are good, then they are good regardless of
string type.

I agree that an upper bound on the string size may not be useful (and


I believe knowing that the worst-case string length is some
reasonable size is far better than having to plan for a worst
case of a very large value. This is particularly true for
strings to be displayed to a user. Do we really want GUI
designers to make sure that they can handle strings that
are, say, 32767 characters in length?

I believe that if we leave this value relatively open ended,
implementors will each pick a more reasonable value of their
own. This may lead to interoperability problems, since each
may pick a different value.

may be difficult to translate into a buffer size for variable-length


Nothing to it. The upper bound is simply the maximum
number of characters to expect. For any given string there
may be some slack in an implementation's buffer, though,
since each UTF8String character must be encoded in the
smallest number of octets possible for that character.
Escape sequences and announcers are not allowed.
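
A minimal sketch of the "smallest number of octets" rule, using
Python's built-in UTF-8 codec as a stand-in for an ASN.1 tool.
The helper names utf8_len and is_shortest_form are made up for
this illustration; the only assumption is that the codec's
strict decoder rejects overlong forms, which it does.

```python
def utf8_len(ch: str) -> int:
    """Number of octets the UTF-8 encoding of one character needs."""
    return len(ch.encode("utf-8"))


def is_shortest_form(data: bytes) -> bool:
    """True if data is well-formed UTF-8.

    Python's strict decoder rejects overlong encodings, so a
    character padded out to more octets than it needs fails here.
    """
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 'A' (U+0041) takes one octet; C1 81 is an overlong two-octet
# spelling of the same character and is not accepted.
print(utf8_len("A"), is_shortest_form(b"A"), is_shortest_form(b"\xc1\x81"))
```

Because the encoding of each character is unique, the octet count
of a string depends only on which characters it contains.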

Of course implementors are free to choose any buffer size 
they wish, but it would be arguably prudent to choose 3*max 
to handle the worst case. In a mostly ASCII environment you
waste some space, but you'd not likely crash and burn.

   NOTE: I'm 'fairly' certain that three is the max character
   length (2 for BMPString, 4 for Universal) but I was unable
   to find this absolutely specified in X.690 at the location
   in the standard where I expected it to be. If it's decided
   that a constraint should be used, this should be checked.
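
As a rough check of the 3*max rule, the sketch below measures the
octets needed by the largest BMP character and by the first
character beyond the BMP. The function name worst_case_octets and
the limit of 64 characters are illustrative assumptions, not
values from any draft. Note that a character outside the BMP
takes four octets in UTF-8, so 3*max is only a safe bound if the
string is restricted to the BMP.

```python
def worst_case_octets(max_chars: int, bmp_only: bool = True) -> int:
    """Buffer size in octets for a string of at most max_chars characters.

    Assumes 3 octets per character for BMP-only text and 4 octets
    per character when characters beyond U+FFFF may appear.
    """
    per_char = 3 if bmp_only else 4
    return max_chars * per_char


# Measure the extremes directly rather than trusting the constants.
largest_bmp = len("\uffff".encode("utf-8"))          # last BMP code point
first_non_bmp = len("\U00010000".encode("utf-8"))    # first one past the BMP

print(largest_bmp, first_non_bmp)
print(worst_case_octets(64), worst_case_octets(64, bmp_only=False))
```

In a mostly ASCII environment either bound wastes space, which is
the trade-off described above.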

UTF8 characters), but this SIZE clause has a different purpose:
to force the string to have at least one character.
An omitted OPTIONAL variable length item (SEQUENCE OF, xxxString, etc)
can have two possible encodings - absent, or zero length.  By forcing


I like DavidK's argument here. The zero does make life a 
little more difficult for hand coders. A minimum of one 
character allows them to switch solely on present or absent.

the item in question to have at least one element, the encoding
ambiguity is eliminated.  I've gotten in the habit of including the
SIZE clause as boilerplate.
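
The ambiguity can be made concrete with a hand-rolled DER sketch
of the ContentHints SEQUENCE quoted earlier. Everything here is
an illustration under stated assumptions: the helper names are
invented, only short-form lengths (under 128 octets) are handled,
and id-data (1.2.840.113549.1.7.1) is used as an arbitrary
contentType. It is not taken from any real ASN.1 toolkit.

```python
SEQUENCE_TAG = 0x30
UTF8STRING_TAG = 0x0C
OID_TAG = 0x06

# Content octets of the OID 1.2.840.113549.1.7.1 (id-data).
ID_DATA = bytes.fromhex("2a864886f70d010701")


def tlv(tag: int, value: bytes) -> bytes:
    """Tag-length-value with a short-form length only (sketch limitation)."""
    if len(value) >= 128:
        raise ValueError("sketch handles short-form lengths only")
    return bytes([tag, len(value)]) + value


def encode_content_hints(description=None) -> bytes:
    """Encode ContentHints; description=None omits the OPTIONAL field."""
    body = b""
    if description is not None:
        body += tlv(UTF8STRING_TAG, description.encode("utf-8"))
    body += tlv(OID_TAG, ID_DATA)
    return tlv(SEQUENCE_TAG, body)


def decode_description(der: bytes):
    """Return contentDescription, or None if absent.

    Enforces SIZE(1..MAX): a present but zero-length string is an error,
    so callers can switch solely on present versus absent.
    """
    if der[0] != SEQUENCE_TAG or der[1] != len(der) - 2:
        raise ValueError("not a short-form SEQUENCE")
    body = der[2:]
    if body[0] != UTF8STRING_TAG:
        return None
    length = body[1]
    if length == 0:
        raise ValueError("contentDescription violates SIZE(1..MAX)")
    return body[2:2 + length].decode("utf-8")


# Without the constraint, these two encodings both mean "no description".
print(encode_content_hints(None).hex())  # field absent
print(encode_content_hints("").hex())    # field present, zero length
```

With SIZE(1..MAX) in the module, only the first of the two
encodings is legal, and the decoder needs just one code path for
"no description".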


A very good habit.


I believe someone mentioned some time ago that this was a problem that
should be addressed in X.680/690, instead of in every application protocol.
But AFAIK it has not been addressed yet.


There is a method for handling this, which is made clearer in
ASN.1:1997. But it requires a much more formal process than is usual
in the IETF, with registered proformas and PICS, etc., and I do not
see this as possible in this environment. (I see it as significant
that we're actually using formal modules in this work. This has got
to be the nicest looking ASN.1 code I've ever seen come out of an
IETF effort.)

PS. AFAIK really threw me. At first I thought it was some
standards body I'd never heard of. But I'm better now. :-)

Phil
-- 
Phillip H. Griffin         Griffin Consulting
asn1(_at_)mindspring(_dot_)com        ASN.1-SET-Java-Security
919.828.7114               1625 Glenwood Avenue
919.832.7008 [mail]        Raleigh, North Carolina 27608 USA