Re: [saag] i18n requirements (was: Re: NF* (Re: PKCS#11 URI slot attributes & last call))

2015-01-13 16:37:09
On Mon, Jan 12, 2015 at 10:41:01PM -0500, John C Klensin wrote:
--On Monday, January 12, 2015 18:08 -0600 Nico Williams
<nico(_at_)cryptonector(_dot_)com> wrote:
Well alright.  I'd love to see a set of guidelines for I18N
activities.

So would we all.  RFC 2277 was supposed to provide some guidance
but is now badly obsolete in many different ways, including
exhibiting how little we knew about some things at the time.  We
have, I hope, learned a lot, but see below.

When should we try to support Unicode, and when should we not?
Is it one of those "I know it when I see it" kinds of
guidelines?  That wouldn't be useful enough :(

Let me suggest a general way of thinking about things -- maybe
not quite a "guideline".  Especially for security-type
protocols, make sure there is a substantive reason, presumably
connected to users and user experience, for it to be necessary
to go beyond ASCII.  I really do mean "necessary": if it is just
a good idea in principle or a maybe-nice-to-have or "maybe
someone will want this some day", skip it because adding i18n
capabilities _will_ make correct and predictable implementations
more difficult and _will_ increase the number and range of
attack opportunities.   

Yes, I18N is all about UIs and the UX.

Clearly, if a character string isn't a UI element, and is never a
visible aspect of the UX, then it is a great candidate for being made
US-ASCII only.  Indeed, we *should* make all such strings US-ASCII only.

That much is obvious, and whether or not something is part of the UI is
an objective measure with relatively little room for doubt.

But there are UI elements that could reasonably be constrained to
US-ASCII (because the world over, people manage to deal with US-ASCII
character strings in various parts of their UIs).  The tricky part is
deciding what UI elements (or things leaking into them) qualify.

For example, a "manufacturer" name in PKCS#11 could reasonably be
constrained to US-ASCII only.  Right?  Well, a French manufacturer,
say, might object.

An interesting distinction here might be: name or identifier?
Identifiers (appearing in UIs) -> US-ASCII.  Names -> Unicode.
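
To make the identifier/name split concrete, here is a minimal Python sketch.
The function name and the "printable US-ASCII, non-empty" rule are
assumptions chosen for illustration; nothing in PKCS#11 or in this thread
defines them.

```python
def is_ascii_identifier(s: str) -> bool:
    """Hypothetical rule: an identifier is non-empty, printable US-ASCII."""
    # str.isascii() is true only when every code point is below U+0080;
    # str.isprintable() rejects control characters such as NUL or newline.
    return bool(s) and s.isascii() and s.isprintable()


# An identifier-like label passes; a name with accents does not and would
# need the Unicode treatment instead.
print(is_ascii_identifier("token-01"))
print(is_ascii_identifier("Soci\u00e9t\u00e9 G\u00e9n\u00e9rale"))
```

A string that fails the check is not invalid; under the distinction above it
is simply a name rather than an identifier.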

Token and object labels seem a lot like identifiers in the use cases I
expect.  But I can't be certain that they would never be expected to
contain names.

Manufacturer names really are names, no?

These are decisions that we can make that can anger people who are not
participating here today.

Mind you, IIRC PKCS#11 didn't even say anything about ASCII
before. Token labels and such used to be fixed-sized octet
strings containing character data.  Jan can correct me if I'm
wrong.  I'm not sure even saying "ASCII-only" would
necessarily be safe in that case...

And that reinforces my view that the real, underlying, problem
here has to be fixed in PKCS#11, not in anything the IETF puts
on top of it.  Only they can fix the problems; we can, at best,
mitigate the damage.

Yes.

But look, PKCS#11 has few character strings.  Mostly they will be
matched for equivalence, and normalization-form-insensitive Unicode
string comparison will do for that (at the expense of having the code
for it), as will plain old octet string comparison (because we can
expect input methods to happen to agree on the output form).
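
The two comparison strategies can be sketched in Python as follows.  This is
only an illustration: the helper names are made up, and picking NFC as the
common form is an assumption (any single normalization form applied to both
sides would serve).

```python
import unicodedata


def octets_equal(a: str, b: str) -> bool:
    """Plain octet-string comparison of the UTF-8 encodings."""
    return a.encode("utf-8") == b.encode("utf-8")


def form_insensitive_equal(a: str, b: str) -> bool:
    """Compare after normalizing both sides to NFC (assumed choice of form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The same visible string in two forms: precomposed e-acute (U+00E9) versus
# "e" followed by a combining acute accent (U+0301).
precomposed = "Soci\u00e9t\u00e9"
decomposed = "Socie\u0301te\u0301"

print(octets_equal(precomposed, decomposed))            # octets differ
print(form_insensitive_equal(precomposed, decomposed))  # equal once normalized
```

Octet comparison only works when both sides happen to use the same form,
which is the "accident" being relied on; the normalizing comparison removes
that dependency at the cost of carrying the normalization code.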

I think Jan's text is fine.  I don't mean to belabor this thread.
I'm now only commenting on the more general matter of when we should be
happy to settle for less than the full I18N treatment.

Fortunately the OASIS PKCS11 TC has clarified that these are
UTF-8; unfortunately they left other I18N details out.

It appears to me that what they have said puts their level of
understanding of the various issues somewhat behind where we
were when RFC 2277 was written in 1997.  

Yes, but it's also fair to note the above, that this is the sort of case
where a low-effort I18N ("say UTF-8; say nothing about anything else")
seems likely to be good enough for most implementors and users.
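
As a rough picture of what "say UTF-8; say nothing about anything else" means
for a fixed-size label field, here is a Python sketch.  The 32-octet width
and the blank padding are assumptions made for the example, and the helper
names are hypothetical; no normalization or case handling is attempted.

```python
LABEL_WIDTH = 32  # assumed field width in octets, for illustration only


def encode_label(text: str, width: int = LABEL_WIDTH) -> bytes:
    """Encode as UTF-8 and pad with blanks to a fixed width (assumed layout)."""
    data = text.encode("utf-8")
    if len(data) > width:
        # The limit is in octets, not characters, so non-ASCII text
        # runs out of room sooner than ASCII text does.
        raise ValueError("label does not fit in the fixed-size field")
    return data.ljust(width, b" ")


def decode_label(raw: bytes) -> str:
    """Decode UTF-8 and drop trailing blank padding; nothing else is done."""
    return raw.decode("utf-8").rstrip(" ")


field = encode_label("Cl\u00e9 priv\u00e9e")
print(len(field))
print(decode_label(field))
```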

Nico