Re: [precis] Secdir last call review of draft-ietf-precis-7564bis-07

Hi John,

Thanks for your note, and my apologies for the slow reply. Comments inline.

On 6/13/17 7:02 PM, John C Klensin wrote:



--On Tuesday, June 13, 2017 17:02 -0600 Peter Saint-Andre
<stpeter(_at_)stpeter(_dot_)im> wrote:

Hi Matt, thanks for the review - it's much appreciated.

Just so you know: through discussion of Daniel Migault's
secdir review of 7700bis (we're progressing them all together
this time!), I realized that it might be helpful to add another
example of visually confusing characters to 7564bis, so I plan
to mention CYRILLIC SMALL LETTER A U+0430 vs. LATIN SMALL
LETTER A U+0061 (which will be more familiar to readers than
the Cherokee characters already in the document).


Peter,

I don't want to throw the proverbial spanner in the works, but,
just as things changed while the original PRECIS documents
were being published, I wonder if some other things that appear
to be in process now could do it to us again.

For example, consider draft-freytag-troublesome-characters.
Despite having contributed to it and expecting to continue to do
so, I've got some misgivings about the document and proposed
registry as IETF work but, if it were to be adopted, it seems to
me that it would be useful for the PRECIS documents to
normatively reference it, especially for Identifier Class.


Given that we're dealing with a seemingly tenuous hypothetical, the best
approach might be for that I-D (if eventually published as an RFC) to
update the relevant PRECIS and IDNA RFCs? We'll need to do that for the
IDNA RFCs anyway because they're not currently under revision, as the
PRECIS RFCs are.

 To
some extent, that draft is a remedy for some of the issues
raised in the long-stalled draft-klensin-idna-5892upd-unicode70,
but it doesn't make those issues, and the lack of
comprehensiveness of normalization, go away.


I'm not sure that anything could make those issues go away.

Probably less important, but it might be advantageous to
incorporate some of the "whatever decisions you make, people
will probably hold you accountable if there are problems" tone
of draft-klensin-idna-rfc5891bis into the PRECIS documents.  It
might even be that RFC 7940, possibly supplemented by
draft-freytag-lager-variant-rules, would be a better, or at
least a useful alternative, way to present some or all of the
PRECIS rule sets than the current approach.


One question in my mind is whether an approach such as that of RFC 7940
is so much better that it's worth scrapping / rewriting the PRECIS bis
I-Ds along those lines. Right now it's not even clear what criteria we'd
use to judge "better" or "useful" here - presumably specification
clarity and precision, algorithmic completeness, and reduced error rates
in code implementations might factor into the decision. But I don't
sense that we have a good handle on making these decisions yet. Another
tradeoff here is making the relatively small fixes to the PRECIS RFCs in
a relatively short amount of time (measured in IETF years) vs. making a
larger overhaul in a longer amount of time (and whether there is
sufficient energy to do so). Given our track record in
internationalization, I'd prefer to get these PRECIS fixes done now and
then look at a larger effort.

On a somewhat different topic, the Greek, Latin, and Cyrillic
scripts are so closely related that finding examples of pairs of
similar-looking characters is in the low-hanging fruit category
because the similarities are not coincidences but the result of
derivation and extensive borrowing (something of the same thing
can be said for the Latin-Cherokee relationship, at least in
printed, rather than cursive, forms).


Indeed.

The examples that may be
more scary, just because there is no evolutionary theory to
predict where to look, would be things like the resemblances
among the Latin U+006F, the Lao U+0ED0, the Ethiopic U+12D0, the
New Tai Lue U+19D0, and of course the ASCII/European digit
U+0030 and probably many more, with the group perhaps best
described as "open circle graphemes" or something like that.


Well, circles are common enough, so it's reasonable that they'd show up
in many different contexts as both letters and numbers (which is why we
have confusion between the letter "O" and the number zero even in the
basic Latin repertoire) and even as punctuation marks and symbols. But I
like the examples you've mentioned and will add them to 7564bis to
further illustrate the problem, all the while understanding full well
that a complete list of examples or an explanation of why such examples
are problematic is outside the scope of this specification (which is why
we point to UTR36 and UTS39).
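
To make the point concrete, here is a minimal sketch in Python, using only
the standard unicodedata module, that prints the name and general category
of each code point mentioned in this thread and checks whether NFKC
normalization leaves the members of each group distinct. The group labels
and helper names are my own illustrative choices, not anything defined in
7564bis, and this is in no way a confusables check of the kind UTS39
describes; it only shows that these look-alikes are separate code points
that compatibility normalization does not appear to unify.

```python
import unicodedata

# Code points taken from this thread; the group labels are illustrative only.
GROUPS = {
    "small letter a": [0x0061, 0x0430],
    "open circle": [0x006F, 0x0ED0, 0x12D0, 0x19D0, 0x0030],
}


def describe(cp):
    """Return a one-line description of a code point."""
    ch = chr(cp)
    return f"U+{cp:04X} {unicodedata.name(ch)} (category {unicodedata.category(ch)})"


def distinct_after_nfkc(code_points):
    """True if NFKC maps every code point in the list to a different string."""
    normalized = {unicodedata.normalize("NFKC", chr(cp)) for cp in code_points}
    return len(normalized) == len(code_points)


for label, cps in GROUPS.items():
    print(label)
    for cp in cps:
        print("  " + describe(cp))
    print("  still distinct after NFKC:", distinct_after_nfkc(cps))
```

Note that the "open circle" group mixes letters and decimal digits from
several scripts, so a check based on general category alone would not
flag it either.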

Peter