
Last Call: <draft-ietf-appsawg-uri-scheme-reg-04.txt> (Guidelines and Registration Procedures for URI Schemes) to Best Current Practice

2015-03-12 05:06:40
Here are some last call comments on draft-ietf-appsawg-uri-scheme-reg-04. The review was started a while ago, and completed, but the writeup took a lot of time and is still not completed, sorry. I may be able to complete it tomorrow, but please don't hold your breath.

[Just in case this is necessary, as a process point, I have seen various tracker messages (such as the draft being placed on telechat, or that the last call has ended), but I'd like to note that the Last Call mentions an end date of 2015-03-12, and it's still 2015-03-12 here in Japan, which means that this date has barely started in some other locations around the world.]



My overall impression is that the direction of the draft is just fine, but that the presentation and wording are quite rough in many places and would benefit tremendously from more careful editing.


Introduction: Overall, this felt too long, and it would benefit from better structuring, and/or moving some of the points out to their own sections/subsections. For example, adding subsection titles such as "URIs and IRIs" and "Generic Syntax and Scheme Specific Syntax" or some such would help quite a lot.


"  o  provide a central point of discovery for established URI scheme
      names, and easy location of defining documents for standard
      schemes;"
The use of the word "standard" in "standard schemes" is unclear. This use doesn't appear anywhere else in the document. Do you mean permanently registered schemes? If there's some specific point to be made, please make it more clearly. Otherwise, I suggest just dropping the word "standard".


"o  discourage multiple separate uses of the same scheme name;"
I'd personally be happier if this said "strongly discourage", because I hope we all agree that it's really not a good idea. If the consensus is that it's obvious anyway that it's really not a good idea, and we don't need to be overly clear about that, then I'll keep quiet.


"o  encourage registration by setting a low barrier for registration."
What about making this "encourage early registration"?


"A URI scheme name is the same as the corresponding IRI scheme name."
At a minimum, I'd turn this around and say "An IRI scheme name is the same as the corresponding URI scheme name." But because there isn't really anything like an "IRI scheme name", I'd actually prefer if this said "IRIs use the same scheme names as URIs." or something similar.



"For example, this means that fragment identifiers (#) cannot be re-used outside the generic syntax restrictions." My 'best-guess' interpretation of this sentence is that it is intended to say that a scheme definition cannot define fragments that contain characters (e.g. #) that RFC 3986 doesn't allow.

But this is bad advice, because scheme definitions cannot say anything about fragments at all. This isn't syntax, but semantics; the semantics of a fragment are defined by the media type, not the scheme. I haven't found any place in this doc that says this; it clearly should be added.

If you want to give an example re. syntax, I'd suggest saying something like "For example, the query part cannot contain literal '#' characters because they and anything after them would be interpreted as part of the fragment and not the query." or some such.
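
To illustrate that behaviour, here is a minimal sketch using Python's standard urllib.parse module. The URI is made up for this illustration and does not come from the draft.

```python
from urllib.parse import urlsplit

# Hypothetical URI whose query was meant to carry a literal '#'.
uri = "http://example.com/search?tag=#draft"
parts = urlsplit(uri)

# A generic parser ends the query at the first '#';
# everything after it is taken as the fragment.
print(parts.query)     # tag=
print(parts.fragment)  # draft
```

To carry a '#' inside the query, it would have to be percent-encoded as "%23".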

Also, the "(#)" in the text is completely superfluous; the '#' itself isn't the fragment, and a reader should be able to correlate the word in the text and the same word in the ABNF.


"A scheme definition must specify the scheme name and the syntax of the scheme-specific part, which is clarified as follows:" Saying "clarified as follows" and then just giving some ABNF may be difficult to grok for some people. I propose to change the sentence to "A scheme definition must specify the scheme name and the syntax of the scheme-specific part, which corresponds to the 'hier-part' and the optional query in the above definition. This can be clarified by rewriting the definition as follows:"


2. Terminology:

   Within this document, the key words MUST, MAY, SHOULD, REQUIRED,
   RECOMMENDED, and so forth are used within the general meanings
   established in [RFC2119], within the context that they are
   requirements on future registrations.
The double 'within' is confusing. I propose to replace "within the context that they are requirements on future registrations" with "as requirements on future registrations".


3.  Requirements for Permanent Scheme Definitions

                                                                     For
   IETF Standards-Track documents, Permanent registration status is
   REQUIRED.
Please change this to: "For URI Scheme definitions in IETF Standards-Track documents, Permanent registration status is REQUIRED."


3.2. Syntactic Compatibility

                                           Care must be taken to ensure
   that all strings matching their scheme-specific syntax will also
   match the <absolute-URI> grammar described in [RFC3986].

Pronouns like "their" don't usually work well in standard language. I suggest changing "their scheme-specific syntax" to "the syntactic restrictions of the scheme definition" or some such.


                                                   If there is a strong
   reason for a scheme not to use the hierarchical syntax, then the new
   scheme definition SHOULD follow the syntax of previously registered
   schemes.

Please change "the syntax of previously registered schemes" to "the syntax of previously registered schemes with similar components or similar syntactic needs." or some such, to make it clear that it's not sufficient to just copy some syntax if it's totally unrelated.


   Schemes that are not intended for use with relative URIs SHOULD avoid
   use of the forward slash "/" character, which is used for
   hierarchical delimiters, and the complete path segments "." and ".."
   (dot-segments).
It would be good if the text gave the reasons for the SHOULD (which I fully agree with; maybe even a MUST).
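
One reason that could be given, going by the relative resolution rules in Section 5.2 of RFC 3986, is that generic software gives "/" and dot-segments special treatment whether or not the scheme intends them to be hierarchical. A minimal sketch with Python's standard urllib.parse; the base URI is made up for illustration:

```python
from urllib.parse import urljoin

# Hypothetical base URI, used only for illustration.
base = "http://example.com/a/b/c"

# Generic resolution interprets "/" as a hierarchy delimiter and
# removes the dot-segments "." and ".." from the result.
up_one = urljoin(base, "../d")
same_level = urljoin(base, "./d")

print(up_one)      # http://example.com/a/d
print(same_level)  # http://example.com/a/b/d
```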


Please add a(n informational) reference to Gettys, J., "URI Model Consequences", <http://www.w3.org/DesignIssues/ModelConsequences> in this section. It is a great text helping designers of URI scheme syntax to understand the ideas regarding the different syntax components.


   New schemes SHOULD clearly define the role of [RFC3986] reserved
   characters in URIs of the scheme being defined.
The location of [RFC3986] is a bit strange. It might work if "[RFC3986] reserved characters" is taken as a phrase, but it's difficult for the reader to see that. Also, the specific topic is discussed in Section 2.2 of [RFC3986], so change the above to:
   New schemes SHOULD clearly define the role of reserved characters
   (see [RFC3986], Section 2.2) in URIs of the scheme being defined.


3.3. Well-Defined

                                                        and how legal
   values in the base namespace, or legal protocol interactions, might
   be represented in a valid URI.
"might be represented" -> "are represented"

                                   See Section 3.6 for guidelines for
   encoding binary or character strings within valid character sequences
   in a URI .
"binary or character strings" -> "sequences of bytes or characters". (In most contexts, such as programming languages, "character string" is equivalent to "string", while "binary string" is undefined.)

Superfluous space before period.

               If not all legal values or protocol interactions of the
   base standard can be represented using the scheme, the definition
   SHOULD be clear about which subset are allowed, and why.
"Which subset are" -> "which subset is" (or "which subsets are")


3.5. Context of Use

                  Most commonly, URIs are used as references to
   resources within directories or hypertext documents, as hyperlinks to
   other resources.

This sentence is totally unclear. Why do directories turn up here? Are "resources within directories" and "other resources" parallels? Are "references" and "hyperlinks" intended to be parallels? Is "references to resources within ..." intended to mean "references to resources from within ..." or "references to (resources within ...)"? Please clarify.


3.6. Internationalization and Character Encoding


   When describing schemes in which (some of) the elements of the URI
   are actually representations of human-readable text, care should be
   taken not to introduce unnecessary variety in the ways in which
   characters are encoded into octets and then into URI characters; see
   [RFC3987] and Section 2.5 of [RFC3986] for guidelines.  If URIs of a
   scheme contain any text fields, the scheme definition MUST describe
   the ways in which characters are encoded and any compatibility issues
   with IRIs of the scheme.

I think it would be extremely helpful to the average URI scheme designer/describer if this section mentioned the use of UTF-8. The reference to Section 2.5 of RFC 3986 is good, but the problem with that section is that it starts out with very general and abstract language, and one has to read through the whole section to find the relevant (and extremely clear and appropriate) advice in the last paragraph.

At a minimum, please point the reader to the last paragraph of Section 2.5. Much better would be to include that paragraph verbatim (and to say so explicitly):


   When a new URI scheme defines a component that represents textual
   data consisting of characters from the Universal Character Set [UCS],
   the data should first be encoded as octets according to the UTF-8
   character encoding [STD63]; then only those octets that do not
   correspond to characters in the unreserved set should be percent-
   encoded.  For example, the character A would be represented as "A",
   the character LATIN CAPITAL LETTER A WITH GRAVE would be represented
   as "%C3%80", and the character KATAKANA LETTER A would be represented
   as "%E3%82%A2".
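
The three examples in that paragraph can be reproduced with a minimal Python sketch. The helper name encode_text is made up for this illustration; quote() is from the standard urllib.parse module, and with safe="" it leaves only the unreserved characters (letters, digits, "-", ".", "_", "~") unencoded.

```python
from urllib.parse import quote

def encode_text(text: str) -> str:
    """Hypothetical helper: encode as UTF-8, then percent-encode every
    octet that does not correspond to an unreserved character."""
    return quote(text, safe="", encoding="utf-8")

print(encode_text("A"))       # A
print(encode_text("\u00C0"))  # %C3%80     (LATIN CAPITAL LETTER A WITH GRAVE)
print(encode_text("\u30A2"))  # %E3%82%A2  (KATAKANA LETTER A)
```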


   The scheme specification SHOULD be as restrictive as possible
   regarding what characters are allowed in the URI, because some
   characters can create several different security considerations (see,
   for example [RFC4690]).
I'm afraid that many people will read "as restrictive as possible" as "well, let's just do ASCII only" or some such. I believe and hope that this wasn't the intent, but I don't think this comes across. One kind of improvement would be to change "as restrictive as possible" to just "restrictive". Another is to change "as restrictive as possible" to "as restrictive as possible without excluding characters outside US-ASCII".

"can create security considerations" sounds weird. The characters may create security issues or security problems or some such, which may need to be described in a Security Considerations section.


   All percent-encoded variants are automatically included by definition
   for any character given in an IRI production.  This means that if you
   want to restrict the URI percent-encoded forms in some way, you must
   restrict the Unicode forms that would lead to them.

I know what you want to say here (I think it's the point originally brought up by Björn Höhrmann in the IRI WG). But I think it's too restrictive and can be worded better:

   URI schemes that include textual data from Unicode have to be aware
   that they have to define both the actual characters allowed (for
   IRIs) and the corresponding percent-encoded forms (for URIs and
   IRIs). This can be done in various ways, but in most cases, it is
   advisable to define the actual characters allowed in an IRI
   production, to allow the 'pct-encoded' definition from Section 2.1
   of [RFC3986] at the same places, and to add prose that limits
   percent-escapes to those that can be created by converting valid
   character sequences to percent-encoding via UTF-8.
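
The prose restriction suggested in the last clause can be checked mechanically. Here is a minimal sketch, assuming Python and its standard urllib.parse module; the helper name escapes_are_valid_utf8 is made up for this illustration, and it only tests the UTF-8 condition, not which characters a particular scheme allows.

```python
from urllib.parse import unquote_to_bytes

def escapes_are_valid_utf8(component: str) -> bool:
    """Hypothetical helper: True if the percent-decoded component
    is a well-formed UTF-8 octet sequence."""
    try:
        unquote_to_bytes(component).decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True

print(escapes_are_valid_utf8("%E3%82%A2"))  # True: KATAKANA LETTER A
print(escapes_are_valid_utf8("%E3%82"))     # False: truncated sequence
print(escapes_are_valid_utf8("%80"))        # False: stray continuation octet
```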


Regards,    Martin.

