Re: Last Call: <draft-faltstrom-5892bis-04.txt> (The Unicode code points and IDNA - Unicode 6.0) to Proposed Standard



--On Sunday, May 29, 2011 08:58 +0200 Simon Josefsson
<simon(_at_)josefsson(_dot_)org> wrote:

in a Unicode 6.0 environment, evaluate U+19DA as PVALID and
therefore not raise that error, then it is not "compliant"
with RFC 5892, regardless of the "Updates" status of the
present document.


I don't see how.

My code uses the tables from RFC 5892, which were generated in
a Unicode 5.2 environment.  My IDNA2008 code may eventually
run in a Unicode 6.0 environment, or under any future version
of Unicode.  I can't control the Unicode version used, and
from what I understand this is one of the features of
IDNA2008: implementations need not be locked to a single
Unicode version, as they had to be for IDNA2003.


It seems to me that this is exactly where we are having a
misunderstanding.   In terms of determining conformance, those
tables are not normative, so it is not possible to say "I
implemented the tables in RFC 5892 and therefore I conform to
the standard".  The closest you can get would be to say "I
implemented the rules and tested against the tables when those
rules were applied to Unicode 5.2 and therefore have great
confidence in my implementation", but conformance statements stop
with "implemented the rules correctly".
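To make the distinction concrete, here is a minimal Python sketch of the rule-based approach. It computes a derived property value directly from the Unicode data the interpreter ships with, rather than looking the code point up in the Appendix B table. It is a simplification under stated assumptions, not a conformant implementation: only three entries of the Section 2.6 Exceptions list are reproduced, and the BackwardCompatible, IgnorableProperties, IgnorableBlocks, and OldHangulJamo steps are omitted because Python's `unicodedata` module does not expose the properties they need. `derived_property`, `is_unstable`, and the other names are hypothetical.

```python
import unicodedata

# Partial excerpt of the RFC 5892 Section 2.6 Exceptions list (NOT complete).
EXCEPTIONS = {
    0x00DF: "PVALID",    # LATIN SMALL LETTER SHARP S
    0x03C2: "PVALID",    # GREEK SMALL LETTER FINAL SIGMA
    0x00B7: "CONTEXTO",  # MIDDLE DOT
}

# General_Category values of the LetterDigits category (Section 2.1).
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def is_noncharacter(cp: int) -> bool:
    """Noncharacter_Code_Point: U+FDD0..U+FDEF and U+xxFFFE / U+xxFFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_unstable(ch: str) -> bool:
    """Unstable (Section 2.2): toNFKC(toCaseFold(toNFKC(cp))) != cp."""
    folded = unicodedata.normalize("NFKC", ch).casefold()
    return unicodedata.normalize("NFKC", folded) != ch


def derived_property(cp: int) -> str:
    """Simplified Section 3 derivation against the interpreter's Unicode data.

    Skips BackwardCompatible, IgnorableProperties, IgnorableBlocks and
    OldHangulJamo, which need properties unicodedata does not expose.
    """
    ch = chr(cp)
    category = unicodedata.category(ch)
    if cp in EXCEPTIONS:
        return EXCEPTIONS[cp]
    if category == "Cn" and not is_noncharacter(cp):
        return "UNASSIGNED"
    if cp == 0x2D or 0x30 <= cp <= 0x39 or 0x61 <= cp <= 0x7A:  # LDH
        return "PVALID"
    if cp in (0x200C, 0x200D):  # JoinControl
        return "CONTEXTJ"
    if is_unstable(ch):
        return "DISALLOWED"
    if category in LETTER_DIGITS:
        return "PVALID"
    return "DISALLOWED"


if __name__ == "__main__":
    # The answer for U+19DA depends on the Unicode version Python was built with.
    print(unicodedata.unidata_version, derived_property(0x19DA))
```

Under any Python whose Unicode data is 6.0 or later, `derived_property(0x19DA)` returns DISALLOWED, because U+19DA's General_Category is No there. The same rules applied to Unicode 5.2 data, where it was Nd, yield PVALID, which is the value shown in the RFC 5892 table.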

For practical reasons, we expect to see production
implementations using tables or other abstractions of the rules
that are somewhat pre-compiled, not applying the rule set each
time.   One consequence of this is that a given table-based
implementation is inevitably dependent on versions of Unicode
even if the Standard (and its conformance requirements) is not.
That would be true even if the type of change (correction) that
occurred with version 6.0 of Unicode had not occurred. It would
still be necessary to construct version-dependent tables to deal
with newly-assigned code points.
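A pre-compiled, table-based implementation can be sketched as running a rule once over every code point and storing the result as sorted ranges. The Python below does this for the LetterDigits category (RFC 5892 Section 2.1) only, and records the Unicode version of the data it was compiled from, since the resulting table is valid only for that version. `RangeTable` and `compile_letter_digits` are illustrative names, not part of any library.

```python
import bisect
import sys
import unicodedata
from dataclasses import dataclass

# General_Category values of the LetterDigits category (RFC 5892 Section 2.1).
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


@dataclass(frozen=True)
class RangeTable:
    """Sorted, non-overlapping [start, end] ranges plus the source Unicode version."""
    unicode_version: str
    starts: tuple
    ends: tuple

    def contains(self, cp: int) -> bool:
        # Find the last range starting at or before cp, then check its end.
        i = bisect.bisect_right(self.starts, cp) - 1
        return i >= 0 and cp <= self.ends[i]


def compile_letter_digits() -> RangeTable:
    """Apply the LetterDigits rule once to every code point and store ranges."""
    starts, ends = [], []
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) in LETTER_DIGITS:
            if ends and ends[-1] == cp - 1:
                ends[-1] = cp          # extend the current range
            else:
                starts.append(cp)      # open a new range
                ends.append(cp)
    return RangeTable(unicodedata.unidata_version, tuple(starts), tuple(ends))
```

With Unicode 6.0 or later data, `compile_letter_digits().contains(0x19DA)` is False; a table compiled from Unicode 5.2 data would answer True for the same code point. The rule did not change, only its input did, which is why the table has to be regenerated along with the Unicode data.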

From the perspective of those who argued that the document
titled "...5892bis..." should not be produced and published
because it is unnecessary, the point is that we would not have
generated the document at all had the only changes been the
addition of new PROTOCOL-VALID and DISALLOWED code points by
virtue of new code points being added to Unicode.  But, in
practical terms, that is a much greater change to an
implementation than anything related to these few characters
with changed properties.
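The relative size of the two kinds of change can be checked roughly with Python, whose `unicodedata` module ships both the current Unicode database and a frozen Unicode 3.2.0 copy (`unicodedata.ucd_3_2_0`, kept for IDNA2003). Since those are the only two versions available, this sketch compares 3.2.0 with whatever version the interpreter carries, not 5.2 with 6.0. It counts newly assigned code points, and separately lists code points that were already assigned but moved into or out of the LetterDigits categories; it ignores every other rule, so it is only an approximation. `diff_versions` is a hypothetical helper.

```python
import sys
import unicodedata

# General_Category values of the LetterDigits category (RFC 5892 Section 2.1).
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def diff_versions(old, new):
    """Compare two unicodedata-like databases code point by code point."""
    newly_assigned = 0
    changed = []  # already assigned in `old`, LetterDigits membership differs
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        old_cat, new_cat = old.category(ch), new.category(ch)
        if old_cat == "Cn" and new_cat != "Cn":
            newly_assigned += 1
        elif old_cat != "Cn" and (old_cat in LETTER_DIGITS) != (new_cat in LETTER_DIGITS):
            changed.append(cp)
    return newly_assigned, changed


newly, changed = diff_versions(unicodedata.ucd_3_2_0, unicodedata)
print(f"3.2.0 -> {unicodedata.unidata_version}: {newly} newly assigned, "
      f"{len(changed)} with changed LetterDigits membership")
```

The exact figures depend on the Python release, but the first count should be far larger than the second.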

And, again, this situation would be true of virtually any
specification that depends on Unicode, regardless of whether the
definition is in terms of rules/properties or tables. There
would be an exception if the specification depended on code
point assignments alone and was okay with treating unassigned
code points as if they had been assigned if they turned up in
the data stream (IDNA2003 attempted to lay the foundation for
the latter but failed because not all of the properties that an
unassigned code point will have once it is assigned can be
known in advance).  For anything else, working properly with a given
version of Unicode requires updating of code point tables,
normalization tables, and assorted property tables.   As Mark
points out, defining things in terms of the tables, with the
rules providing only guidance, has some important advantages in
this regard.  However, it guarantees the need to talk about
conformance to a Unicode version, not just "Unicode".

If this model is not permitted, I believe there are bigger
problems.

To avoid doubt, and to back up your assertion that my
implementation is non-compliant, please point to the "MUST" or
"SHOULD" in RFC 5892 that forbids this (to me, logical)
implementation approach.


The key is the text in Section 4 that says:

        "The table in Appendix B shows, for illustrative
        purposes, the consequences of the categories and
        classification rules, and the resulting property values.
        
        "The list of code points that can be found in Appendix B
        is non-normative.  Sections 2 and 3 are normative."

It seems to me that this is very clear about the relationship between
the rules and the tables.   That relationship is reiterated in
Section 7.1.1 of RFC 5892.

You could reasonably say that your implementation is conformant
but current only to Unicode 5.2.   If you are willing to say
that, I guess you don't need to change anything.   While we
recognize that you have no control over the Unicode version in
use, good sense suggests that systems will update versions of
Unicode (including all of the associated tables and support
routines as applicable) and versions of your library together.
While that should be clear from the context of the discussions
in RFC 5891 and 5892, RFC 5894 is quite explicit about it in the
second bullet of Section 7.1.2:

 "o The Unicode tables (i.e., tables of code points,
        character classes, and properties) and IDNA tables
        (i.e., tables of contextual rules such as those
        that appear in the Tables document), must be
        consistent on the systems performing or validating
        labels to be registered.  Note that this does not
        require that tables reflect the latest version of
        Unicode, only that all tables used on a given
        system are consistent with each other."

Similarly, the first bullet of Section 7.1.3 reads:

 "o Maintain IDNA and Unicode tables that are consistent
        with regard to versions, i.e., unless the application
        actually executes the classification rules in the Tables
        document [RFC5892], its IDNA tables must be derived from
        the version of Unicode that is supported more generally on
        the system.  As with registration, the tables need not
        reflect the latest version of Unicode, but they must be
        consistent."
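The consistency requirement in both bullets amounts to a check that an implementation could make before using its tables. A minimal Python sketch, assuming the tables carry the Unicode version they were derived from (the `IdnaTables` type, its fields, `TableVersionMismatch`, and `check_consistent` are all hypothetical):

```python
import unicodedata
from dataclasses import dataclass


class TableVersionMismatch(Exception):
    """Raised when IDNA tables and the system's Unicode data disagree."""


@dataclass(frozen=True)
class IdnaTables:
    unicode_version: str   # Unicode version the IDNA tables were derived from
    pvalid: frozenset      # illustrative payload: code points classified PVALID


def check_consistent(tables: IdnaTables,
                     system_version: str = unicodedata.unidata_version) -> None:
    """Refuse to use IDNA tables derived from a different Unicode version."""
    if tables.unicode_version != system_version:
        raise TableVersionMismatch(
            f"IDNA tables were derived from Unicode {tables.unicode_version}, "
            f"but the system supports Unicode {system_version}; "
            "regenerate the tables or execute the rules directly")
```

Whether to fail, regenerate the tables, or fall back to executing the rules on a mismatch is a policy choice; the bullets only require that the tables in use agree with each other, not that they reflect the latest Unicode version.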

I hope that helps.

best,
     john






_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/ietf
