APPSDIR review of draft-farrell-decade-ni-07

Hello everybody,

[For replies, please trim the cc list, thanks!]

I have been selected as the Applications Area Directorate reviewer for this draft (for background on appsdir, please see http://trac.tools.ietf.org/area/app/trac/wiki/ApplicationsAreaDirectorate ).

Please resolve these comments along with any other Last Call comments you may receive. Please wait for direction from your document shepherd or AD before posting a new version of the draft.



Document: draft-farrell-decade-ni-07
Title: Naming Things with Hashes
Reviewer: Martin Dürst
Review Date: 2012-06-03 (written up 2012-06-04/05)
IETF Last Call Date: started 2012-06-04, ends 2012-07-02

Summary: This draft addresses a real generic need, but the current form of the draft is the result of adding more and more special cases without a clear overall view and a firm hand to separate the wheat from the chaff. This shows both in the technical issues as well as in many of the editorial issues below. This draft is not ready for publication without some serious additional work, but that work is mostly straightforward and should be easy to complete quickly.




Major design issue:

The draft defines two schemes, which differ only slightly, and mostly just gratuitously (see also editorial issues). These are the ni: and the nih: scheme. As far as I understand, they differ as follows:

                                    ni:                nih:
authority:                          optional           disallowed
ascii-compatible encoding:          base64url          base16
check digit:                        disallowed         optional
query part:                         optional           disallowed
decimal presentation of algorithm:  disallowed         possible

The usability of URIs is strongly influenced by the number of different schemes: the smaller the number, the better. As a somewhat made-up example, if the original URIs had been separated into httph: for HTML pages and httpi: for images, or any other arbitrary subdivision that one can envision, that would have hurt the growth and extensibility of the Web. Creating new URI schemes is occasionally necessary, and the ideas that lead to this draft definitely seem to warrant a new scheme (*), but there's no reason for two schemes.
[(*) I know people who would claim that the .well-known http/https thing is completely sufficient, no new scheme needed at all.]

More specifically, if the original URIs had been separated into httpm: (for machines) and httph: (for humans), the Web for sure wouldn't have grown at the speed it did (and does) grow. In practice, there are huge differences in human 'speakability' for URIs (and IRIs, for that matter); compare e.g. http://google.com with http://www.google.co.jp/#sclient=psy-ab&hl=en&site=&source=hp&q=hash&oq=hash&aq=f&aqi=g4&aql= (which I have significantly shortened to hopefully eliminate potential privacy issues), or compare the average mailto: URI with the average data: URI. However, what's important is that there never has been a strong dividing line between machine-only and human-only URIs or schemes; the division has always been very gradual. Short and mainly human-oriented URIs have of course been handled by machines, and on the other hand, very long URIs have been spoken when really necessary. "Speakability" has been maintained to some extent by scheme designers, and to some extent by "survival of the fittest" (URIs that weren't very speakable (or spellable/memorizable/guessable/...), and their Web sites, might just die out slowly).

It should also be noted that the resistance against multiple URI schemes may have been low because there are so many different ways to express hashes in the draft anyway, and one more (the nih: section is the last one before the examples section) didn't seem like much of a deal anymore. But when it comes to URIs, one less is a lot better than one more.

In the above ni:/nih: distinction, nih: seems to have been added as an afterthought after realizing that reading an ni: URI aloud over the phone may be somewhat suboptimal because there is a need for repeated "upper case" - "lower case" (sure very quickly shortened to "upper" - "lower" and then to "up" - "low" or something similar). It is not a bad idea to try to make sure that IETF technology, and URIs in particular, are accessible to people with certain kinds of dyslexia. (There are indeed people who have tremendous difficulties with distinguishing upper- and lower-case letters, and this may or may not be connected with other aspects of dyslexia.) It is however totally unclear to this reviewer why this has to lead to two different URI schemes with other gratuitous differences.

Finding a solution is rather easy (of course, other solutions may also be possible): Merge the schemes, so that authority, check digit, and query part are all optional (an authority part and/or a query part may very well be very useful in human communication, and a check digit won't hurt when transmitted electronically) and the decimal presentation of the algorithm is always allowed, and use base32 (http://tools.ietf.org/html/rfc4648) as the encoding. This leads to a 16.6% less efficient encoding of the value part of the ni: URI, but given that other URI-related encodings, e.g. the %-encoding resulting when converting an IRI to an URI, are much less efficient, and that URI infrastructure these days can handle URIs with more than 1000 bytes, this should not be a serious problem. Also, there's a separate binary format (section 6) that is more compact already.
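
To make the size trade-off concrete, here is a minimal sketch that counts the characters needed for the value part of a full SHA-256 hash under the three encodings discussed. It uses only the Python standard library; stripping the "=" padding is an assumption made for the comparison, not something taken from the draft.

```python
import base64
import hashlib

# A full, untruncated SHA-256 digest is 32 bytes.
digest = hashlib.sha256(b"Hello World!").digest()


def encoded_lengths(raw: bytes) -> dict:
    """Character counts per encoding, with '=' padding stripped (an assumption)."""
    return {
        "base16": len(base64.b16encode(raw)),
        "base32": len(base64.b32encode(raw).rstrip(b"=")),
        "base64url": len(base64.urlsafe_b64encode(raw).rstrip(b"=")),
    }


lengths = encoded_lengths(digest)
print(lengths)  # {'base16': 64, 'base32': 52, 'base64url': 43}
```

For this digest size, base32 costs 9 more characters than base64url, while base16 costs 21 more.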




(relatively) Minor technical issues:

Section 2, "When the input to the hash algorithm is a public key value": Is it absolutely clear that this will work for any and all public key values, existing and future, and not only for what's currently around? After all, as far as I understand, the concept of a public key is a fairly general one.

"Other than in the above special case where public keys are used, we do not specify the hash function input here. Other specifications are expected to define this.": Do you really expect that to happen? Wouldn't it be better to limit variability here as much as possible, and to use media types to identify different kinds of data? This would also work for public keys: If there's a MIME media type for a SubjectPublicKeyInfo, then the fact that this media type is the preferred way to transfer a public key becomes an application convention rather than a special case in the spec. If a better way (or just another way) to encode/transfer public keys became popular at a later date, there would be no need to change the spec.


Related, in Section 3:
   The "val" field MUST contain the output of base64url encoding the
   result of applying the hash function ("alg") to its defined input,
   which defaults to the object bytes that are expected to be returned
   when the URI is dereferenced.

How do I know whether the default applies or not? The URI doesn't tell you. Deducing from context is a bad idea.
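
As an illustration of the default case only, the following sketch builds an ni URI by hashing the object bytes directly. The function name ni_uri is hypothetical, and stripping the "=" padding from the base64url output is an assumption on my part. Note that nothing in the result records what the hash input was, which is exactly the problem raised above.

```python
import base64
import hashlib


def ni_uri(data: bytes, authority: str = "") -> str:
    """Hypothetical helper: sha-256 over the object bytes (the default input)."""
    digest = hashlib.sha256(data).digest()
    # Assumption: base64url output is used without '=' padding.
    val = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"ni://{authority}/sha-256;{val}"


uri = ni_uri(b"Hello World!")
print(uri)
```

A URI produced from, say, a SubjectPublicKeyInfo structure would have exactly the same shape, so a receiver cannot tell the two cases apart from the URI alone.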

Section 3: "Thus to ensure interoperability, implementations SHOULD NOT generate URIs that employ URI character escaping": This is wrong and needs to be fixed. Characters such as "&", "=", "#", and "%", as well as ASCII characters not allowed in URIs and non-ASCII characters MUST be %-encoded if they appear in query parameter values in URIs (or in query parameter tags, which is however less likely). It would be better if the spec here deferred to the URI spec rather than trying to come up with its own rules.
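
A small sketch of what deferring to the URI spec means in practice, using Python's standard urllib.parse. The parameter tag "label", the sample value, and the truncated hash placeholder are all made up for illustration.

```python
from urllib.parse import quote, unquote

# Hypothetical query parameter value containing characters that need escaping.
value = "a&b=c#d%e é"

# safe="" forces every reserved character to be %-encoded;
# non-ASCII characters are encoded as UTF-8 first.
encoded = quote(value, safe="")
print(encoded)  # a%26b%3Dc%23d%25e%20%C3%A9

# Truncated placeholder, not a complete hash value.
VAL_PLACEHOLDER = "f4OxZX_x_FO5"
uri = f"ni:///sha-256;{VAL_PLACEHOLDER}?label={encoded}"
print(uri)
```

Without the escaping, the "&" and "=" would be read as parameter delimiters and the "#" would start a fragment.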

Section 3: "The Named Information URI adapts the URI definition from the URI Generic Syntax [RFC3986].": This sounds as if this were a voluntary decision (and the text should be changed to avoid such an impression), but if you don't conform to RFC 3986 syntax, you're not an URI. This is the first time I have seen an URI scheme definition starting explicitly with the top ABNF rule from RFC 3986 (http://tools.ietf.org/html/rfc3986#appendix-A). This is completely unnecessary. Just make sure your production conforms to the generic URI syntax, and mention all the ABNF rules from RFC3986 that you use.

Also, using the "URI" production from RFC 3986, and then silently dropping the #fragment part, is technically wrong. Scheme definitions have nothing to do with the fragment (including the question of whether there's a fragment or not; the semantics of fragments are defined by the MIME media type that you get when you resolve). This may not be completely clear in RFC 4395, but the IRI WG is working on an update of RFC 4395 where this will be made clearer (see also http://trac.tools.ietf.org/wg/iri/trac/ticket/126; thanks for giving me a chance to remember that I had to create a new issue in the tracker for this :-).


Section 3, ABNF:
            ni-hier-part   = "//" authority path-algval
                             / path-algval

This gives you ni://example.com/sha-256;f4OxZX_x_FO5... (//authority/) and ni:/sha-256;f4OxZX_x_FO5... (one slash only), but the examples show ni:///sha-256;f4OxZX_x_FO5... (three slashes). It looks like the ABNF you want is:

            ni-hier-part   = "//" authority path-algval
                           / "//" path-algval
(aligning "=" and "/" helps!)
or more simply:
            ni-hier-part   = "//" [authority] path-algval
or even more simply:
            ni-hier-part   = "//" authority path-algval
because authority can be empty; let's show this:
   authority     = [ userinfo "@" ] host [ ":" port ]
If we can show that host can be empty, we're done:
   host          = IP-literal / IPv4address / reg-name

If we can show that any one of these can be empty, we're done, let's pick reg-name:

   reg-name      = *( unreserved / pct-encoded / sub-delims )
* means "zero or more", thus reg-name can be empty. QED.
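
The argument can be checked mechanically with a regular expression. This is only a sketch: the authority is reduced to reg-name (no userinfo, port, or IP literals), the patterns for alg and val are loose stand-ins rather than the draft's productions, and the hash value is the truncated placeholder from the examples above.

```python
import re

# reg-name = *( unreserved / pct-encoded / sub-delims ), per RFC 3986.
REG_NAME = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*"

# ni-hier-part = "//" authority "/" alg ";" val
# Simplification: authority is reg-name only; alg and val are loose stand-ins.
NI_HIER_PART = re.compile(
    r"//(?P<authority>" + REG_NAME + r")/(?P<alg>[^;/]+);(?P<val>[^/?#]+)"
)


def parse_ni_hier_part(text: str):
    """Return the matched components, or None if the text does not match."""
    match = NI_HIER_PART.fullmatch(text)
    return match.groupdict() if match else None


three_slashes = parse_ni_hier_part("///sha-256;f4OxZX_x_FO5")
with_authority = parse_ni_hier_part("//example.com/sha-256;f4OxZX_x_FO5")
one_slash = parse_ni_hier_part("/sha-256;f4OxZX_x_FO5")
```

The three-slash form matches with an empty authority, the form with example.com matches as well, and the one-slash form is rejected, so the single production covers what the examples need.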

Section 4:
   The HTTP(S) mapping MAY be used in any context where clients without
   support for ni URIs are needed without loss of interoperability or
   functionality.

What is meant by "support for ni"? There's nowhere in the spec where this is explained clearly. If I were a browser maker, or writing an URI library,..., what would I do to support the ni scheme? The only thing I have come up with is to convert ni to the .well-known format, then use HTTP(S). In that case, the above text seems wrong, as it says that .well-known is used when there's no support for ni, not in order to support ni.
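
To show how little such "support" might amount to, here is a sketch of the conversion. The path layout /.well-known/ni/alg/val is my assumption, based on the layout used in the later published RFC 6920; the function name is hypothetical and the hash value is a truncated placeholder.

```python
from urllib.parse import urlsplit


def ni_to_well_known(ni_uri: str, scheme: str = "http") -> str:
    """Hypothetical helper mapping an ni URI to an HTTP(S) URL."""
    parts = urlsplit(ni_uri)
    if parts.scheme != "ni" or not parts.netloc:
        raise ValueError("need an ni URI with a non-empty authority")
    alg, sep, val = parts.path.lstrip("/").partition(";")
    if not sep or not alg or not val:
        raise ValueError("path is not of the form /alg;val")
    # Assumed layout: /.well-known/ni/<alg>/<val>
    url = f"{scheme}://{parts.netloc}/.well-known/ni/{alg}/{val}"
    if parts.query:
        url += "?" + parts.query
    return url


mapped = ni_to_well_known("ni://example.com/sha-256;f4OxZX_x_FO5")
print(mapped)  # http://example.com/.well-known/ni/sha-256/f4OxZX_x_FO5
```

An ni URI without an authority cannot be mapped this way, which the sketch reports as an error.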

Section 5: This defines an "URL segment format". It seems to be limited to path components in HTTP URIs. What if I want to use this in a query part, or maybe even as a fragment identifier? What if I want to use this as a path component in an FTP URI? Or in some other scheme? It would be better to define the alg-val (see next point) part as such (before the other things), with an explanation along the following lines: "This is defined here both for use in other sections of this document as well as for use in other places where it may be helpful, such as HTTP URI path segments,..."

Section 5 (and Section 3): "To do this one simply uses the "alg;val" production": There is no "alg;val" production. Please change to "To do this one simply uses the <alg-val> production" and fix the ABNF in section 3 to

            path-algval = "/" alg-val
            alg-val     = alg ";" val

It's probably even better to fold this in with the changes to ni-hier-part, resulting e.g. in:

            ni-hier-part = "//" authority "/" alg-val
            alg-val      = alg ";" val

Section 9.4: Status can be 'empty' or 'deprecated'. I suggest to replace 'empty' with something positive, such as 'valid' or 'active'. This will help people who go to the IANA page and start to ask "well, it doesn't have a status, what does that mean". Also, I strongly suggest to add an additional status 'reserved', and remove the current "Reserved" hash name string from the entries with IDs 0 and 32.

Section 9.4: "The Suite ID value 32 is reserved for compatibility with ORCHIDs [RFC4843].": How will compatibility be kept for future changes/additions in ORCHID?




Major editorial issues:

Title and abstract (and the spec itself) use the wording "Naming Things". While in a security context, it may be that there is an implicit assumption that there are only digital things, in a wider context, this is of course not true. Research on the Internet of Things and efforts such as the Semantic Web/Linked Data try to deal with things in the real world. People in these areas will be confused by the title, abstract, and text, unless you can show (me and) them an ni: hash for a person, an apple, a building, or an elephant. Therefore, while it may be possible to keep the catchy title, the abstract has to be fixed to avoid such misunderstandings, e.g. by changing "to identify a thing" to "to identify a digital object" or some such in the abstract, and likewise in the main text of the spec.

"Human-speakable" (e.g. ), "human-readable" (e.g. section title of section 7), and "for humans" (e.g. section title of section 9.2): These terms are used throughout the spec, but are imprecise and confusing. First, there's the problem of interpreting "for humans" in the sense of the previous paragraph, which of course has to be fixed. But the main problem is that none of the "ni:" URIs are "non-human-readable" or "non-human-speakable". Reading them aloud is only somewhat more tedious, but not at all impossible. And because the value part of the nih: form is 50% longer, and people quickly develop conventions for shortening things such as "upper case" and "lower case", it's not even clear that reading aloud the nih: form will necessarily take that much less time. Therefore, I strongly recommend to change all occurrences of "Human-speakable", "human-readable", "for humans", and the like, to the more precise "more easily read out aloud by humans" or something equivalent.

Abstract and further on: "specifying URI, URL": By all URx theories (see e.g. http://www.w3.org/TR/uri-clarification/), URLs are a subset of URIs, and therefore saying that the spec specifies an URI and an URL is somewhat confusing. I'd propose using wording along the following lines: "specifying an URI scheme and a way to map these URIs to http".

Section 2, "When the input to the hash algorithm is a public key value", and example section: It took me a while to understand that the "public key" stuff was not yet another way to present a hash, and also not a way to mix in a public key to the hash in order to obtain some specific security property (I wasn't able to figure out how that would work, but draft-hallambaker-decade-ni-params contains something similar involving digital signatures and a public key). The document would be much easier to understand if there was a section e.g. entitled "Forms of input to hash", with subsections e.g. "general data", "public keys", "other stuff (not defined in this document)". As it is written, the relevant paragraphs in section 2 look like an afterthought, and it's not clear to what.
Also, the example section should be fixed as follows: 1) Say upfront that there will be two examples, one for a short string and another for a public key. 2) Make sure both examples exercise all forms (the public key example seems to be pretty complete, but the "Hello World!" example seems to be incomplete). 3) Use the same form of presentation (either a table in both cases or short paragraphs in both cases).

The caption on Figure 7 is also way too unspecific.

Section 9.4: "Hash Name Algorithm Registry", and later "a new registry for hash algorithms as used in the name formats specified here": IANA will be helped tremendously if your draft comes with an easy-to-understand and unambiguous name for the new registry. "Hash Name Algorithm Registry" may be okay, but is probably not specific enough. The circumscription at the start of the section is definitely not good enough because you're not registering hash algorithms, but names of hash algorithms and their truncations.




Minor editorial issues:

Introduction: It would be good to have a general reference to hashing (for security purposes) for people not utterly familiar with the subject.

Intro: After reading the whole document, the structure of the Intro seems to make some sense, but it didn't on first reading (where it's actually more important). The main problem I was able to identify was that after a general outlook in paragraph 1, the Intro drops into a list of examples without saying what they are good for. I suggest to, after the sentence "This document specifies standard ways to do that to aid interoperability.", add a sentence along the lines: "The next few paragraphs give usage examples for the various ways to include a hash in a name or identifier as they are defined later in this document.". It may also make sense to further streamline the following paragraphs, so that it is clearer which pieces of text refer each to one of the "standard ways".

There are two instances of the term "binary presentation". Looking around, it seems that they are supposed to mean the same as "binary format". Please replace all instances of "binary presentation" with "binary format" to avoid misunderstandings and useless search time.

Section 3: "A Named Information (ni) URI consists of the following components:": It would be good to know exactly where the list ended. One way to do this would be to say "consists of the following nine components".

Section 3: "Note that while the ni names with and without an authority differ syntactically, both names refer to the same object if the digest algorithm and value are the same.": What about cases with different authority? The text seems to apply by transitivity, but this may be easy to miss for an implementer. I suggest changing to: "Note that while ni names with and without an authority, and ni names with different authorities, differ syntactically, they all refer to the same object if the digest algorithm and value are the same.".

Section 3: "Consequently no special escaping mechanism is required for the query parameter portion of ni URIs.": Does this mean "no escaping mechanism at all"? Or "nothing besides %-encoding"? Or something else? Please clarify.

Figure 3: the "=" characters of the various rules should be aligned as much as possible to make it easier to scan the productions (see http://tools.ietf.org/html/rfc3986#appendix-A for an example).


Section 3:
            unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
                ;  directly from RFC 3986, section 2.3
                ; "authority" and "pct-encoded" are also from RFC 3986

Please don't copy productions. Please don't copy half (or one-third, actually) of the productions you use, and reference the rest. Please don't say what productions you copy from where in a comment, and even less in a comment for an unrelated production. Please, before the ABNF, say which productions are used from another spec.


Section 4:
   The HTTP(S) mapping MAY be used in any context where clients without
   support for ni URIs are needed without loss of interoperability or
   functionality.

This is difficult to understand. If some new functionality is proposed, it's usually a client *with* the new functionality that's needed, not one without. Also, the "without loss of interoperability or functionality" is unclear: Sure if ni isn't supported, there's a loss in interoperability. So I suggest to rewrite this as:

   The HTTP(S) mapping MAY be used in any context where clients with
   support for ni URIs are not available.
(but see also the comment in minor technical issues)

Section 6: "binary format name": Why 'name'? Why not just "binary format"? The latter is completely clear in the context of the document or together with an indication of the document; for something that can be used independently, even "binary format name" isn't enough.

Section 6: "suite ID": The word "suite" seems out of place here. In the general use of the term, it refers to "a group of things forming a unit or constituting a collection" (see http://www.merriam-webster.com/dictionary/suite). A good definition that works for the uses I'm familiar with in digital security would be "An algorithm suite is a coherent collection of cryptographic algorithms for performing operations such as signing, encryption, generating message digests, and so on." (http://fusesource.com/docs/framework/2.4/security/MsgProtect-SOAP-SpecifyAlgorithmSuite.html; disclaimer: I'm in no way a SOAP fan). The use here is not for a collection, but for a single truncated-length variant of a single hash algorithm. I seriously hope you can find a better name.

Section 6: "Note that a hash value that is truncated to 120 bits will result in the overall name being a 128-bit value which may be useful with certain use-cases.": This left me really wondering: Is there something magic to 128 bits in computer/internet security? What are the "certain use cases"? Or is this just an example to make sure the reader got the relationships, and it could have been as well "Note that a hash value that is truncated to 64 bits will result in the overall name being a 72-bit value which may be useful with certain use-cases." (or whatever other value that's registered in section 9)?
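
The arithmetic behind both variants of that note can be sketched as follows. The sketch assumes a one-byte header carrying the suite ID in front of the truncated hash (inferred from 120 + 8 = 128), takes the left-most bytes of the digest as the truncation, and uses a made-up suite ID value; none of these details are taken verbatim from the draft.

```python
import hashlib

# Made-up placeholder, not a value from the registry in section 9.
HYPOTHETICAL_SUITE_ID = 3


def binary_name(data: bytes, suite_id: int, bits: int) -> bytes:
    """One header byte (assumed) followed by the left-most `bits` of sha-256."""
    if bits % 8 != 0 or not 0 < bits <= 256:
        raise ValueError("bits must be a multiple of 8 between 8 and 256")
    if not 0 <= suite_id <= 255:
        raise ValueError("suite_id must fit in one byte")
    truncated = hashlib.sha256(data).digest()[: bits // 8]
    return bytes([suite_id]) + truncated


name_120 = binary_name(b"Hello World!", HYPOTHETICAL_SUITE_ID, 120)
name_64 = binary_name(b"Hello World!", HYPOTHETICAL_SUITE_ID, 64)
print(len(name_120) * 8, len(name_64) * 8)  # 128 72
```

Nothing in the computation singles out 128 bits; any registered truncation length gives a name that is 8 bits longer than the truncated hash.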

Section 7: Just for the highly unfortunate case that this doesn't disappear, it would be very helpful if the presentation of this section paralleled section 3.

Section 7: "contain the ID value as a UTF-8 encoded decimal number": I'm an internationalization expert with a strong affection for UTF-8, but even for me, this should be "contain the ID value as an ASCII encoded decimal number".

Section 9: The registration templates refer to sections. This is fine for readers of the draft, but not if the template is standalone. I suggest using a format such as that at http://tools.ietf.org/html/rfc6068#section-8.1, which in draft stage may look e.g. like http://tools.ietf.org/html/draft-duerst-eai-mailto-03#section-8.1.

Section 9.3: "Assignment of Well Known URI prefix ni" and later (and elsewhere in the draft) "URI suffix": Are we dealing with a prefix or a suffix here?


Section 9.4: "This registry has five fields, the binary suite ID,...":
Better to remove the word "binary", because the actual number is decimal.

Section 9.4: "The expert SHOULD seek IETF review before approving a request to mark an entry as "deprecated." Such requests may simply take the form of a mail to the designated expert (an RFC is not required). IETF review can be achieved if the designated expert sends a mail to the IETF discussion list. At least two weeks for comments MUST be allowed thereafter before the request is approved and actioned.": I'm at a loss to see why asking the IETF at large is a SHOULD, but if it's done, then the two weeks period is a MUST.

Section 9.4: The registry initialization in Fig. 8 refers to RFC4055 many times. But RFC 4055 does in no way define SHA-256. It looks like the actual spec is http://tools.ietf.org/html/rfc4055#ref-SHA2 (National Institute of Standards and Technology (NIST), FIPS 180-2: Secure Hash Standard, 1 August 2002.) I think this should be cited, in particular because there is a "Specification Required" requirement, and this sure should mean that there is a Specification for the actual algorithm, and not just a specification that mentions some labels. So using RFC4055 as a reference could be taken as creating bad precedent.

Section 9.4: "The designated expert is responsible for ensuring that the document referenced for the hash algorithm is such that it would be acceptable were the "specification required" rule applied.": Why all this circumscription? Why not just say something like: "The designated expert is responsible for ensuring that the document referenced for the hash algorithm meets the "specification required" rule."




Nits:

Author's list: Last time I heard about this, there was a general limit of 5 authors per RFC. I'm not sure whether this still exists, and what'd be needed to get around it, but I just wanted to point out that this may be a potential problem or additional work (hoops to get through).


Intro: "Since, there is no standard" -> "Since there is no standard"

Intro: "for these various purposes" -> "for these purposes" or "for various purposes" (the indefinite 'various' is incompatible with the definite 'these').

"2. Hashes are what Count" -> "2. Hashes are what Counts" (the former may look logically correct, but 'what' requires a singular verb form).

Section 2: "the left-most or most significant in network byte order N bits from the binary representation of the hash value" -> "the left-most (or most significant in network byte order) N bits from the binary representation of the hash value" or "the left-most N bits, or the N most significant bits in network byte order, from the binary representation of the hash value" (the current text is virtually unparsable).

Figure 1: The 0x notation is never explained. A short clause or phrase is all that would be needed, but it would be better if this were spelled out.

Section 3, Query Parameter separator: "The query parameter separator acts a separator between" -> "The query parameter separator acts *as* a separator between".

Section 3, Query Parameters: "A tag=value list of optional query parameters as are used with HTTP URLs" -> "A tag=value list of optional query parameters as used with HTTP URLs" (or "A tag=value list of optional query parameters as they are used with HTTP URLs").

Section 4: "the object named by the ni URI will be available at the corresponding HTTP(S) URL" -> "the object named by the ni URI will be available via the corresponding HTTP(S) URL" (via stresses the point that this should be done via (sic) redirection)

Section 4: "so there may still be reasons to use" -> "so there can still be reasons to use" (better to use can because non-normative; the document otherwise does a good job on this)

Section 10: "Note that fact that" -> "Note the fact that", or much better: "Note that".



Regards,     Martin.