ietf

Re: APPSDIR review of draft-farrell-decade-ni-07, major design issue (one or two URI schemes)

2012-06-12 04:59:31

Hi Martin,

On 06/12/2012 10:13 AM, "Martin J. Dürst" wrote:
Hello Stephen,

This mail responds to your points on the main technical issue that I
have identified.

On 2012/06/05 20:11, Stephen Farrell wrote:

On 06/05/2012 10:42 AM, "Martin J. Dürst" wrote:
Hello everybody,

[For replies, please trim the cc list, thanks!]

Done, removed apps-discuss(_at_)ietf(_dot_)org for the moment.


Major design issue:

The draft defines two schemes, which differ only slightly, and mostly
just gratuitously (see also editorial issues).
These are the ni: and the nih: scheme. As far as I understand, they
differ as follows:
                                     ni:                nih:
authority:                          optional           disallowed
ascii-compatible encoding:          base64url          base16
check digit:                        disallowed         optional
query part:                         optional           disallowed
decimal presentation of algorithm:  disallowed         possible
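To make the "ascii-compatible encoding" row concrete: the same 256-bit hash value is carried as 43 base64url characters in ni: and as 64 hex digits in nih:. Below is a minimal Python sketch that decodes the ni: URI quoted later in this thread and re-renders its value in base16. The function names are hypothetical, not from the draft. The sketch ignores check digits, truncated algorithms and decimal suite IDs.

```python
import base64

def ni_value_bytes(ni_uri: str) -> bytes:
    """Decode the base64url hash value of an ni: URI (sketch, no validation)."""
    path = ni_uri.split("?", 1)[0]             # drop any query part
    value = path.split(";", 1)[1]              # text after "alg;" is the value
    padded = value + "=" * (-len(value) % 4)   # ni: omits '=' padding
    return base64.urlsafe_b64decode(padded)

def nih_value(digest: bytes) -> str:
    """The same bytes in base16 (lowercase hex), as used by nih:."""
    return digest.hex()

ni = "ni:///sha-256;UyaQV-Ev4rdLoHyJJWCi11OHfrYv9E1aGQAlMO2X_-Q"
digest = ni_value_bytes(ni)
print(len(digest))        # 32
print(nih_value(digest))  # 64 hex digits, starting 53269057e12fe2b7...
```

The base64url form mixes upper and lower case with '-' and '_'. The base16 form uses only 0-9 and a-f, which is what makes it easier to read aloud, at the cost of about 50% more characters.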

I'll note in passing that the two schemes differ in all those
respects. You may disagree with our design, but basically you're
showing that the two differ in pretty much all possible ways
other than that both include a hash value.


The usability of URIs is strongly influenced by the number of different
schemes: the smaller the number, the better. As a somewhat made-up
example, if the original URIs had been separated into httph: for HTML
pages and httpi: for images, or any other arbitrary subdivision that one
can envision, that would have hurt the growth and extensibility of the
Web. Creating new URI schemes is occasionally necessary, and the ideas
that lead to this draft definitely seem to warrant a new scheme (*), but
there's no reason for two schemes.
[(*) I know people who would claim that the .well-known http/https approach
is completely sufficient, with no new scheme needed at all.]

More specifically, if the original URIs had been separated into httpm:
(for machines) and httph: (for humans), the Web surely wouldn't have
grown as fast as it did (and does). In practice, there are huge
differences in human 'speakability' for URIs (and IRIs, for that
matter); compare e.g. http://google.com with
http://www.google.co.jp/#sclient=psy-ab&hl=en&site=&source=hp&q=hash&oq=hash&aq=f&aqi=g4&aql=


(which I have significantly shortened to hopefully eliminate potential
privacy issues), or compare the average mailto: URI with the average
data: URI. However, what's important is that there never has been a
strong dividing line between machine-only and human-only URIs or
schemes; the division has always been very gradual. Short and mainly
human-oriented URIs have of course been handled by machines, and on the
other hand, very long URIs have been spoken when really necessary.
"Speakability" has been maintained to some extent by scheme designers,
and to some extent by "survival of the fittest" (URIs that weren't very
speakable (or spellable/memorizable/guessable/...), and their Web sites,
might just die out slowly).

It should also be noted that the resistance against multiple URI schemes
may have been low because there are so many different ways to express
hashes in the draft anyway, and one more (the nih: section is the last
one before the examples section) didn't seem like a big deal
anymore. But when it comes to URI schemes, one fewer is a lot better than one
more.

In the above ni:/nih: distinction, nih: seems to have been added as an
afterthought after realizing that reading an ni: URI aloud over the
phone may be somewhat suboptimal because there is a need for repeated
"upper case" - "lower case" (surely very quickly shortened to "upper" -
"lower" and then to "up" - "low" or something similar). It is not a bad
idea to try to make sure that IETF technology, and URIs in particular,
are accessible to people with certain kinds of dyslexia. (There are
indeed people who have tremendous difficulties with distinguishing
upper- and lower-case letters, and this may or may not be connected with
other aspects of dyslexia.) It is however totally unclear to this
reviewer why this has to lead to two different URI schemes with other
gratuitous differences.

Finding a solution is rather easy (of course, other solutions may also
be possible): Merge the schemes, so that authority, check digit, and
query part are all optional (an authority part and/or a query part may
very well be very useful in human communication, and a check digit won't
hurt when transmitted electronically) and the decimal presentation of
the algorithm is always allowed, and use base32
(http://tools.ietf.org/html/rfc4648) as the encoding. This leads to a
16.6% less efficient encoding of the value part of the ni: URI, but
given that other URI-related encodings, e.g. the %-encoding resulting
when converting an IRI to a URI, are much less efficient, and that URI
infrastructure these days can handle URIs with more than 1000 bytes,
this should not be a serious problem. Also, there's a separate binary
format (section 6) that is more compact already.
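To put numbers on the efficiency point: base64url carries 6 bits per character and base32 carries 5. Each base32 character therefore holds one-sixth (about 16.7%) fewer bits. The Python sketch below uses the standard base64 module to show the unpadded lengths for a full 256-bit SHA-256 value. It also shows that a base32 value typed back in lower case still decodes when casefold=True is passed. The hashed input is arbitrary. Dropping '=' padding mirrors how ni: carries base64url; assuming a merged scheme would do the same for base32 is our assumption, not the draft's.

```python
import base64
import hashlib

digest = hashlib.sha256(b"example").digest()  # any 32-byte value will do

b64url = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")  # ni: today
b32 = base64.b32encode(digest).decode("ascii").rstrip("=")             # proposal
b16 = base64.b16encode(digest).decode("ascii").lower()                 # nih: today

for name, s in [("base64url", b64url), ("base32", b32), ("base16", b16)]:
    print(name, len(s))
# base64url 43
# base32 52
# base16 64

# base32 is single-case: a value read aloud and typed in lower case still
# decodes once padding is restored and casefold=True is passed.
typed = b32.lower() + "=" * (-len(b32) % 8)
assert base64.b32decode(typed, casefold=True) == digest
```

Rounding up to whole characters, a 256-bit value needs 52 base32 characters rather than 43 base64url ones.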

I strongly disagree with merging ni & nih. Though that clearly
could be done, it would be an error.

There was no such comment on the uri-review list and the designated
expert was happy. That review was IMO the time for such comments
and second-guessing the designated expert at this stage seems
contrary to the registration requirements. So process-wise I
think your main comment is late.

First, if IETF Last Call is too late to make serious technical comments
on drafts, then I think we have to rename it to IETF Too-Late Call.

Second, designated experts are there to check for minimum requirements
for a registration, and to give advice as they see fit (and have time).
I'm myself a designated expert on "Character Sets", and I have
definitely in the past approved, and would again in the future approve,
registrations for stuff on which I would complain strongly if the
question was "is this a good technical solution".

Graham Klyne, the designated expert for URI scheme registrations, has
confirmed offline that he does not see his role as "expert reviewer" as
judging the technical merit of a URI scheme proposal.

That's fair enough. It's also fair to note that there was
discussion of this document on the uri-review list, but this
aspect was not raised at all. That list is called "uri-review"
and from its archives it does seem to frequently do more than
just check the paperwork (including quite a few mails from you:-).

But in any case, I also think you're wrong technically in this case.

Let's see. I hope we agree that we should come to a conclusion on this
issue on technical merits, rather than on process details.

Sure.

nih *is* intended for a corner case, 

Let me emphasise the above. nih is not intended to be used
broadly, nor often. If you want a hash-based URI scheme for
users to speak that is for broad frequent use then I think
you are free to try to design one. But nih is not that and is
not intended to be that. (And I'm not sure such a beast
could really be done well.)

where humans need to speak these
URIs and was added as a direct result of requirements from the core
WG and not as an afterthought. ni URIs are not intended for that
and so there really are IMO different requirements (esp. the
check digit) that are best met with different schemes.

I agree that the value of a check digit is very limited for communication

s/very limited/useless/

among machines (and for communication among humans with the help of
machines, such as in the case of email).
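For reference, here is a sketch of the kind of check digit under discussion. It assumes the Luhn mod N algorithm with N=16 over the hex digits, with '-' treated as an ignored readability separator. That is our reading of the draft and should be verified against its text. The sample value is the first 120 bits of the SHA-256 value in the ni: URI quoted later in this thread, i.e. what a sha-256-120 nih: name would carry. The function names are hypothetical.

```python
def luhn16_sum(digits: str, double_rightmost: bool) -> int:
    """Luhn mod N sum with N=16, walking the hex digits right to left."""
    total = 0
    double = double_rightmost
    for ch in reversed(digits):
        addend = int(ch, 16) * (2 if double else 1)
        total += addend // 16 + addend % 16  # add the two base-16 "digits"
        double = not double
    return total

def check_digit(value: str) -> str:
    """Check digit for a hex value; '-' separators are ignored (assumption)."""
    digits = value.replace("-", "").lower()
    return format(-luhn16_sum(digits, double_rightmost=True) % 16, "x")

def is_valid(value: str, check: str) -> bool:
    """True if value followed by check passes the Luhn mod 16 test."""
    digits = (value + check).replace("-", "").lower()
    return luhn16_sum(digits, double_rightmost=False) % 16 == 0

v = "532690-57e12f-e2b74b-a07c89-2560a2"  # first 120 bits of the ni: example
print(check_digit(v))  # f
print(is_valid(v, "f"), is_valid("632690-57e12f-e2b74b-a07c89-2560a2", "f"))  # True False
```

Luhn mod 16 catches every single mistyped digit, but not every error. Swapping adjacent "0f" and "f0", for example, leaves the sum unchanged.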

On the other hand, I can't understand why (even assuming we needed a
separate scheme) there is no authority and no query part on nih.

The main intent of nih is to allow entry of something that
confirms something else (e.g. a public key) that is already
present. There is no need for an authority for that, for
the use-cases we have. We could speculate about other potential
use-cases but we'd rather not speculate like that when there's
no need to.

For the authority, I'd assume that it would be as useful when the URI is
transmitted e.g. over the phone as when it is transmitted e.g. over email.

We don't have a use for that that I know about. I agree
it could be done, but then I think it'd also impact on
usability, which will be pretty crap no matter what's
done. But making usability worse also seems wrong. Not
having an authority also seems to work fine for PGP keys
and the lack of an authority does get rid of some threats,
if the nih URI is used for something security-sensitive.

For the query part, there are already various ideas and proposals
floating around, 

Where? If you mean draft-hallambaker-decade-params then
we (the authors of that) don't think those are useful for
nih names.

and at least some of them would be of interest for when
the URI is transmitted e.g. over the phone. Also, even if we currently
didn't have any actual proposals for query parameters, I think it would
be a very bad idea to exclude them a priori for transmission e.g. over
the phone.
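The authority/query difference under discussion shows up when both forms are split with Python's urllib.parse.urlsplit. This is a minimal sketch, not a conforming parser. The function name, the example.com authority and the hint=x query are hypothetical placeholders; no query parameter names are taken from the draft. Nothing is validated.

```python
from urllib.parse import urlsplit

def parts(uri: str) -> dict:
    """Split an ni: or nih: URI into the fields discussed here (no validation)."""
    u = urlsplit(uri)
    if u.scheme == "ni":
        # ni: optional authority, then /alg;value, then optional ?query
        alg, _, value = u.path.lstrip("/").partition(";")
        return {"authority": u.netloc or None, "alg": alg,
                "value": value, "query": u.query or None}
    if u.scheme == "nih":
        # nih:alg;value[;checkdigit] with no authority and no query part
        if u.netloc or u.query:
            raise ValueError("nih: has neither an authority nor a query part")
        alg, value, *rest = u.path.split(";")
        return {"authority": None, "alg": alg, "value": value,
                "check": rest[0] if rest else None, "query": None}
    raise ValueError("not an ni: or nih: URI")

print(parts("ni://example.com/sha-256;UyaQV-Ev4rdLoHyJJWCi11OHfrYv9E1aGQAlMO2X_-Q?hint=x"))
print(parts("nih:sha-256-120;53269057e12fe2b74ba07c892560a2;f"))
```

A merged scheme, as proposed, would in effect accept the ni: branch's optional fields for both cases.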

I disagree that there is any "very bad idea" here.

Merging ni/nih would also add more complexity for no benefit,
which would be a bad idea.

Can you please explain what kind of complexity would have to be added?

I think it's obvious actually. In your table above you highlighted
5 ways in which ni and nih differ. Merging all those yields loads
of combinations, which makes for complexity.

In terms of specification, merging the two schemes doesn't seem to be
difficult or complex at all. Also, in terms of implementation, the only
additions to the ni: scheme that become necessary are the check digit
and the expression of the "suite id" as a decimal. It's very difficult
for me to imagine that this would add significant complexity to an
implementation; if code for nih: exists, that can mostly just be moved
over.
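To gauge what "expression of the suite id as a decimal" would add to ni:, the sketch below accepts either a decimal suite ID or an algorithm name. It then truncates a SHA-256 value to the suite's length. The suite table is hypothetical: the IDs are as we recall them from the draft's hash-algorithm registry and must be checked against the draft. Truncation to the leftmost bits is likewise an assumption.

```python
import hashlib
from typing import Dict, Tuple

# Hypothetical suite table: IDs as we recall them from the draft's
# hash-algorithm registry; verify against the draft before relying on them.
SUITES: Dict[int, Tuple[str, int]] = {
    1: ("sha-256", 256),
    2: ("sha-256-128", 128),
    3: ("sha-256-120", 120),
    4: ("sha-256-96", 96),
    5: ("sha-256-64", 64),
    6: ("sha-256-32", 32),
}
BITS_BY_NAME = {name: bits for name, bits in SUITES.values()}

def resolve_alg(alg: str) -> Tuple[str, int]:
    """Accept a decimal suite ID ("3") or an algorithm name ("sha-256-120")."""
    if alg.isdigit():
        return SUITES[int(alg)]
    return alg, BITS_BY_NAME[alg]

def hash_value(data: bytes, alg: str) -> bytes:
    """SHA-256 of data, truncated to the suite's length (assumed: leftmost bits)."""
    _, bits = resolve_alg(alg)
    return hashlib.sha256(data).digest()[: bits // 8]

print(resolve_alg("3"))                # ('sha-256-120', 120)
print(len(hash_value(b"hello", "3")))  # 15
```

This only shows the mechanics of accepting both spellings. Whether that is worth doing in ni: is the question being argued here.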

Feel free to look at our code. (With the caveat that I'm a crap
programmer so close your eyes a bit when you look at the 'C' code:-)

Your analogy about httpm/h may appear reasonable, but it is always
unreasonable to draw conclusions from analogies. It is also unwise
to reason from counterfactuals, which we'd also be doing if we
accepted your argument. So I find that speculation utterly useless
to be honest.

It is definitely unreasonable to draw conclusions from analogies *only*.

I only saw the analogy. What in your httpm/h argument is not
counterfactual analogy?

But if you think that the httpm/h analogy is wrong, and that ni/nih is
different, could you please explain *what* is different?

We have real use cases for ni and nih and we think they differ.
I'd be repeating myself to say why again.

In this case, we are dealing with different requirements so this
should stay as-is.

If "different requirements" is your main (or only) real argument, 

That would be a valid argument.

could
you at least explain exactly how they are different? 

I did that above. Asking for an "exact" explanation seems
like asking the same thing again.

Just that one
requirement came from the core WG and others from other WGs or other
parties doesn't help me to understand how the actual requirements
differ. (Please note that even if the requirements differ, that doesn't
mean that we need different technology to address them.)

Perhaps not. But that was the design choice we made and it's
a valid one.

Why do you say that ni: URIs are not intended for humans to speak? 

So phone me up and say this:

  ni:///sha-256;UyaQV-Ev4rdLoHyJJWCi11OHfrYv9E1aGQAlMO2X_-Q

What
am I supposed to do if I got an ni: URI in a mail message and call you
on the phone to tell you about that? 

Not my problem actually, but I guess most people might
say "remember that mail you sent me with all that gobbledygook
nonsense - what the hell was that about?" :-)

If I want to send somebody the
information in an ni: URI by mail, should I use only the ni: version or
only the nih: version, or both, if I can't exclude that the recipient
may want to relay this information via voice?

You can try either and let me know what works. This seems like
a very artificial use-case for ni.

Finally, we have (some, early,) running code that matches the
current draft and that ought also count for something

How much? 

Feel free to go look and see. [1] I've not counted lines of
code, but we have c, python, ruby and clojure library
implementations and some apps and other bits and pieces.

   [1] http://sourceforge.net/projects/netinf/

The boilerplate on every I-D is pretty clear that they are not
set in stone. Also, the changes needed to merge the two schemes are not
rocket science, quite the contrary. (I hereby volunteer to fix the
Ruby version, just to show.)

I didn't say our code is set in stone. I said that running code
counts.

I didn't say a merge would require rocket science. I said it'd be
a bad idea and would produce a worse result.

when compared
to a change that would be a gratuitous dis-improvement

In what sense would merging the two schemes be a dis-improvement? Can
you please explain?

I believe I did that above.

based it
seems upon dubious argument

If you think that my arguments are dubious, please explain exactly why.

I believe I did that above. (To be clear: not all your argument is
dubious but the httph/m part is IMO.)

that is also offered at the wrong
point in the process.

See above. If there's something wrong with IETF Last Call, or with the
fact that the Apps Area Directorate does reviews (which I don't think),
then that should be addressed separately. For this discussion, I hope we
can concentrate on technical issues.

Right. But let's not ignore the fact that the uri-review
list had sight of this at the end of April.

Bottom line - we have use-cases and a valid design and running
code that as far as we know works and I see no reason to make
the change you'd like, which would make things worse IMO.

Cheers,
S.




Regards,   Martin.