Re: i18n requirements (was: Re: NF* (Re: PKCS#11 URI slot attributes & last call))

On Thu, 8 Jan 2015, John C Klensin wrote:

On Sat, 3 Jan 2015, Jan Pechanec wrote:

     hi, I haven't received any other comments on the draft 
recently (I know the LC already ended on Dec 29 though) so I
think I  can file changes discussed and drafted in this thread
as draft 18 on  Friday.  Thank you all for feedback, I really
appreciate it.

     one more change for draft 18 (v2 attached) is to spell out
"NFC" and reference the Unicode Annex on normalization, based
on comments from Jaroslav and Christian.
...


Jan,

I don't have a lot of time to spend on this and am not an expert
on either X.509 or PKCS (#11 or otherwise).  At least the first
may be unfortunate, but it is what it is.

        
        hi John, I very much appreciate the time you have already 
spent on this.  Please see my comments inline.

While I think the changes you have made are definitely
improvements, this i18n stuff is complicated.  As with Security,
there is a completely inadequate supply of magic pixie dust that
can be thrown at problems to make them go away.  "Normalize to
NFC" (with spelling-out and references) is a vast improvement over
"use [valid] UTF-8" but there are many other issues.  You have
noted some and omitted others.  For example, case-independent
matching is a very simple and completely deterministic issue for
ASCII (one essentially just masks off one bit within a certain
range), it can get very messy if one tries to be sensitive to
different locales that have different conventions about what to
do with diacritical marks when lower-case characters are
converted to upper case.  There are Unicode "CaseFold" rules


        I understand from the previous discussion that the topic is a 
very complex one and that my draft needed to acknowledge that.
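
To make the case-matching point in the quoted text concrete, here is a small Python sketch. It is only an illustration and is not taken from the draft or from PKCS#11; the helper name ascii_upper is hypothetical, and everything else uses built-in str methods. It shows the ASCII bit-mask trick next to two non-ASCII cases where the default (locale-independent) Unicode case mappings change the length of the string.

```python
def ascii_upper(ch: str) -> str:
    """Upper-case one ASCII letter by clearing bit 0x20 (hypothetical helper)."""
    if "a" <= ch <= "z":
        return chr(ord(ch) & ~0x20)
    return ch  # anything outside a-z is passed through unchanged


# ASCII: deterministic, one bit within a fixed range
ascii_result = "".join(ascii_upper(c) for c in "pkcs11-token")

# Outside ASCII the default mappings are no longer one-to-one
sharp_s_upper = "stra\u00dfe".upper()     # U+00DF maps to two letters
sharp_s_fold = "stra\u00dfe".casefold()   # case folding also expands it
dotted_i_lower = "\u0130".lower()         # becomes 'i' plus combining U+0307
```

None of these built-in mappings take a locale into account, so conventions that differ between languages (Turkish dotted and dotless i, for example) would need separate handling.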

<...>

I don't know how far in explaining this your document should go.
I would urge, as I think I did before, some fairly strong
warnings that, at least until the issues are clarified in
PKCS#11 itself, one should be very certain one knows what one is
doing (and what the consequences of one's choices will be) if
one decides to move beyond the safety and general understanding
of the ASCII/ ISO 646/ IA5 letter and digit repertoire.  That
sort of warning should supplement your NFC language, not replace
it-- neither is a substitute for the other.   Whether you
incorporate it or not, your I-D should not assume that, by
saying "NFC", you have somehow resolved the full range of issues
in this area, any more than saying "UTF-8" did.


        I understand that.  The note about spelling out NFC was on top 
of the first changes I incorporated.  I don't know if you saw those; I 
know there were many emails and the time you can spend on this is 
very limited.  So, in the section on URI matching, I tried to be very 
explicit and based the warning I added on one of your comments from 
the previous discussion:

+   As noted in Section 6, the PKCS#11 specification is not clear about
+   how to normalize UTF-8 encoded Unicode characters [RFC3629].  Those
+   who discover a need to use characters outside the ASCII repertoire
+   should be cautious, conservative, and expend extra effort to be sure
+   they know what they are doing and that failure to do so may create
+   both operational and security risks.  It means that when matching
+   UTF-8 string based attributes (see Table 1) with such characters,
+   normalizing all UTF-8 strings before string comparison may be the
+   only safe approach.  For example, for objects (keys) it means that
+   PKCS#11 attribute search template would only contain attributes that
+   are not UTF-8 strings and another pass through returned objects is
+   then needed for UTF-8 string comparison after the normalization is
+   applied.

        do you suggest a stronger warning than that?
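
The two-pass matching described in the quoted paragraph above could be sketched roughly as follows. This is a minimal Python illustration over plain dictionaries, not a real PKCS#11 binding: the names nfc and find_objects, the dictionary keys, and the sample objects are all assumptions made for the example. In a real consumer the first pass would correspond to a C_FindObjectsInit/C_FindObjects search with a template that has no UTF-8 string attributes, and the second pass to reading CKA_LABEL from each returned object.

```python
import unicodedata


def nfc(s: str) -> str:
    """Normalize a string to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", s)


def find_objects(objects, template, label):
    """Two-pass search over hypothetical in-memory objects."""
    # Pass 1: match only on attributes that are not UTF-8 strings.
    candidates = [
        obj for obj in objects
        if all(obj.get(key) == value for key, value in template.items())
    ]
    # Pass 2: compare labels only after both sides are normalized.
    wanted = nfc(label)
    return [obj for obj in candidates if nfc(obj.get("label", "")) == wanted]


objects = [
    {"class": "private-key", "id": b"\x01", "label": "cl\u00e9"},   # precomposed e-acute
    {"class": "private-key", "id": b"\x02", "label": "cle\u0301"},  # e + combining acute
    {"class": "certificate", "id": b"\x03", "label": "cl\u00e9"},
]

matches = find_objects(objects, {"class": "private-key"}, "cl\u00e9")
exact_only = [o for o in objects if o["label"] == "cl\u00e9" and o["class"] == "private-key"]
```

With an exact string comparison only the first key would be found; after normalization both private keys match, which is the false-negative case the warning is about.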

        more on that was incorporated into a new Internationalization 
Considerations section, based on new text drafted by Nico:

+6.  Internationalization Considerations
 
+   The PKCS#11 specification does not specify a canonical form for
+   strings of characters of the CK_UTF8CHAR type.  This presents the
+   usual false negative and false positive (aliasing) concerns that
+   arise when dealing with unnormalized strings.  Because all PKCS#11
+   items are local and local security is assumed, these concerns are
+   mainly about usability.
+
+   In order to improve the user experience, applications that create
+   PKCS#11 objects or label tokens SHOULD normalize labels to
+   Normalization Form C (NFC) [UAX15].  For the same reason PKCS#11
+   libraries, slots (token readers), and tokens SHOULD normalize their
+   names to NFC.  When listing PKCS#11 libraries, slots, tokens, and/or
+   objects, an application SHOULD normalize their names to NFC.  When
+   matching PKCS#11 URIs to libraries, slots, tokens, and/or objects,
+   applications MAY use form-insensitive Unicode string comparison for
+   matching, as those might pre-date these recommendations.  See also
+   Section 3.5.
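
As a rough illustration of why the section recommends NFC, the Python sketch below builds a PKCS#11 URI from attribute values. The helpers pk11_attr and pk11_uri are hypothetical, the "token" and "object" attribute names are used as in the draft, and percent-encoding every character outside the unreserved set is a simplifying assumption that encodes more than the draft strictly requires. The same visible label yields two different percent-encoded forms unless it is normalized first.

```python
import unicodedata
from urllib.parse import quote


def pk11_attr(name: str, value: str) -> str:
    """Format one URI attribute, normalizing the value to NFC first."""
    normalized = unicodedata.normalize("NFC", value)
    # safe="" percent-encodes everything except unreserved characters
    return f"{name}={quote(normalized, safe='')}"


def pk11_uri(**attrs: str) -> str:
    """Join attributes into a 'pkcs11:' URI path (hypothetical helper)."""
    return "pkcs11:" + ";".join(pk11_attr(k, v) for k, v in attrs.items())


# Without normalization the two forms of the label encode differently
raw_nfc = quote("cl\u00e9", safe="")    # precomposed e-acute
raw_nfd = quote("cle\u0301", safe="")   # e + combining acute accent

# With normalization both inputs give the same URI
uri_a = pk11_uri(token="My token", object="cl\u00e9")
uri_b = pk11_uri(token="My token", object="cle\u0301")
```

A consumer comparing such URIs byte for byte would treat the unnormalized forms as different objects, which is the usability problem the SHOULDs are meant to reduce.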

        and a new paragraph was also added to the existing Security 
Considerations section:

+   The PKCS#11 specification does not provide means to authenticate
+   devices to users; it only allows to authenticate users to tokens.
+   Instead, local and physical security are demanded: the user must be
+   in possession of their tokens, and system into whose slots the users'
+   tokens are inserted must be secure.  As a result, the usual security
+   considerations regarding normalization do not arise.  For the same
+   reason, confusable script issues also do not arise.  Nonetheless, it
+   is best to normalize to NFC all strings appearing in PKCS#11 API
+   elements.  See also Section 6.

        I think these new paragraphs convey the message that users 
should be very careful when using characters outside ASCII, and what 
to do to mitigate problems that can arise from such use.  Do you think 
more should be said in the draft itself?

For more information, you might have a look at some of the
PRECIS work, notably draft-ietf-precis-framework.

I also remain convinced that the best place to fix this is in
the PKCS#11 spec itself.  One is always at a disadvantage when
trying to work around an inadequate specification in a different
specification that has to depend on it and your work is no
exception.  I wish there were whatever liaison arrangements
between the IETF and others (presumably notably RSA) to be sure
that happened or at least there was clear awareness on the PKCS
side of the deficiencies.


        last week I did contact the OASIS PKCS 11 TC, which is where 
PKCS#11 has been maintained since 2013.  However, even if the issue is 
going to be fixed there, I don't think it will be in the new version 
2.40, which is close to being published.

        Happy New Year to you, too.

        best regards, Jan.

-- 
Jan Pechanec <jan(_dot_)pechanec(_at_)oracle(_dot_)com>
