Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-11

And the -11 version resolves everything else, plus picks up improved
versions of the editorial clarifications; -11 is Ready for RFC publication.

Many thanks for the timely responses.

Thanks,
--David

-----Original Message-----
From: Black, David
Sent: Wednesday, December 10, 2014 10:51 AM
To: nico(_at_)cryptonector(_dot_)com; General Area Review Team 
(gen-art(_at_)ietf(_dot_)org); ops-
dir(_at_)ietf(_dot_)org
Cc: ietf(_at_)ietf(_dot_)org; json(_at_)ietf(_dot_)org; Black, David
Subject: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-10

The -10 version of this draft resolves items [A]-[E] from the
Gen-ART and OPS-Dir review of -09, and the IESG is in the process of
resolving the (silly) idnits complaint about RFC 20 being a possible
downref.

For item [D], a different approach was taken instead of modifying
the ABNF - the resulting new Section 2.4 is a definite improvement
to the draft, and is significantly clearer than the modified ABNF
would have been.  Nicely done.

Item [F] about the <angle-bracketed> text in the IANA Considerations
(Section 4) remains open - if the intent is to not deal with replacing
that text until after IESG approval, an RFC Editor Note to that effect
should be added to Section 4.

I have an additional editorial concern - given all the discussion about
UTF-8, it would be good for the draft to make it clear early on
that JSON text sequences are UTF-8 only.  Here are some suggested changes.

Abstract:

   This document describes the JSON text sequence format and associated
   media type, "application/json-seq".  A JSON text sequence consists of
   any number of JSON texts, each prefix by an Record Separator
   (U+001E), and each ending with a newline character (U+000A).

"any number of JSON texts" -> "any number of UTF-8 encoded JSON texts"

It also looks like ASCII names for RS and LF are being mixed w/Unicode
codepoints in the second sentence in the abstract.  I'm not sure that's
a good thing to do, especially as the body of the draft refers to RS and
LF as being ASCII.  Here are a couple of changes that would remedy this:

   "an Record Separator (U+001E)" -> "an ASCII Record Separator (0x1E)"
   "a newline character (U+000A)" -> "an ASCII newline character (0x0A)"

Section 2 JSON Text Sequence Format:

I suggest adding this sentence as a separate paragraph at the end of this
section (i.e., just before Section 2.1):

   JSON text sequences MUST use UTF-8 encoding; other encodings of JSON
   (i.e., UTF-16 and UTF-32) MUST NOT be used.

Aside from item [F], all of the above are editorial suggestions, but I
think making this clear early in the draft will help avoid potential
implementer confusion.

Thanks,
--David

-----Original Message-----
From: Black, David
Sent: Friday, December 05, 2014 9:51 AM
To: nico(_at_)cryptonector(_dot_)com; General Area Review Team 
(gen-art(_at_)ietf(_dot_)org); ops-
dir(_at_)ietf(_dot_)org
Cc: ietf(_at_)ietf(_dot_)org; json(_at_)ietf(_dot_)org; Black, David
Subject: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

This is a combined Gen-ART and OPS-Dir review.  Boilerplate for both follows
...

I am the assigned Gen-ART reviewer for this draft. For background on
Gen-ART, please see the FAQ at:

<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Please resolve these comments along with any other Last Call comments
you may receive.

I have reviewed this document as part of the Operational directorate's

ongoing

effort to review all IETF documents being processed by the IESG.  These
comments
were written primarily for the benefit of the operational area directors.
Document editors and WG chairs should treat these comments just like any

other

last call comments.

Document: draft-ietf-json-text-sequence-09
Reviewer: David Black
Review Date: Dec 5, 2014
IETF LC End Date: Dec 5, 2014
IESG Telechat date: Dec 18, 2014

Summary: This draft is on the right track, but has open issues
            described in the review.

This draft specifies a format that packs multiple JSON texts into a
single string.  The ASCII RS (0x1E) character is used to separate texts,
and a linefeed is appended to each text to ensure that a complete text
always ends with a whitespace character.

All of the open issues are minor - the most important ones center on
treatment of incomplete JSON texts - that appears to be an afterthought
in this draft and needs more attention.  I also found a couple of
minor issues in the Security and IANA Considerations sections, both of
which are almost nits.

Major issues: None.

Minor issues:

[A] Section 2.1:

   If parsing of such an octet string as a JSON text fails, the parser
   SHOULD nonetheless continue parsing the remainder of the sequence.

That's not quite right - there are two levels of parsing, JSON
sequence parsing and JSON text parsing of each text in the sequence,
both of which might be implemented in a single-pass parser.  For such an
implementation, the above sentence could be (mis-)read to imply that the
JSON text parse should resume from the point at which it failed, which
would be silly (although I've seen heroic PL/1 parsers do exactly that).
Instead, the parse needs to skip ahead to the next RS, ignoring the rest
of the JSON text that failed to parse.  I suggest:

   If parsing of such an octet string as a JSON text fails, and the
   octet string is followed by an RS octet, the parser
   SHOULD nonetheless skip ahead to that RS octet and continue parsing
   the remainder of the sequence from there.

That also covers the case where there is nothing more to parse after the
JSON text that caused the parse failure.

[B] Section 2.3:

Is incremental parsing of a JSON text within a sequence allowed, or
is the parser required to not produce any results until the parse of
the entire text is successful?  I'd expect that incremental parsing
is ok (so results may be produced from a text that ultimately fails
to parse), and I think that's worth stating.

[C] Section 2.4:

   Parsers MUST check that any JSON texts that are a top-level number
   include JSON whitespace ("ws" ABNF rule from [RFC7159]) after the
   number, otherwise the JSON-text may have been truncated.

That reference to the "ws" rule doesn't get the job done because that
rule allows a match to no characters - it's of the form ws = *( ... )
where ... is the list of whitespace characters.  What's needed here is
a rule of the form vws = 1*( ...) to force there to be at least one
whitespace character, but see the next issue for a better way to deal
with this topic by pulling the appended LF into the sequence parse
instead of the text parse.

[D] I wonder whether the possibility of incomplete texts ought to be
encoded into the parsing rules to directly catch JSON texts that must
be incomplete because the last character is not LF, e.g.:

     JSON-sequence = *(1*RS (possible-JSON / truncated-JSON / empty-JSON))
     RS = %x1E; "record separator" (RS), see RFC20
     possible-JSON = 1*(not-RS) LF ; attempt to parse as UTF-8-encoded
                               ; JSON text (see RFC7159)
     truncated-JSON = *(not-RS) not-LFRS); truncated, don't attempt
                                    ; to parse as JSON text
     empty-JSON = LF ; only the LF appended by the encoder, nothing to parse

     not-RS = %x00-1D / %x1F-FF; any octet other than RS
     not-LFRS = %x00-09/ %x1B-1D / %x1F-FF; any octet other than RS or LF

Note that this won't detect all incomplete JSON texts, because LF is allowed
within a JSON text (and this should be stated).

[E] Section 3 - Security Considerations

Incomplete and malformed JSON texts can be used to attack JSON parsers -
that should be pointed out, as I don't see that in RFC 7159's security
considerations and incomplete texts are a relevant consideration for
this draft.

[F] Section 4 - IANA Considerations

   Security considerations: See <this document, once published>,
   Section 3.

   Interoperability considerations: Described herein.

   Published specification: <this document, once published>.

   Applications that use this media type: <by publication time
   <https://stedolan.github.io/jq> is likely to support this format>.

Replace all three instances of the angle bracketed text.  The first two
instances should be RFC references (e.g., RFC XXXX) w/a note to the RFC
Editor to insert the number of the RFC when published.  The third instance
should be resolved now, or could have an RFC Editor note added indicating
that the author will resolve that during Authors 48 hours.

Nits/editorial comments:

idnits didn't like the reference to RFC 20 for ASCII:

  ** Downref: Normative reference to an Unknown state RFC: RFC   20

RFC 5234 (ABNF) uses this, which looks like a better reference:

   [US-ASCII]  American National Standards Institute, "Coded Character
               Set -- 7-bit American Standard Code for Information
               Interchange", ANSI X3.4, 1986.

--- Selected RFC 5706 Appendix A Q&A for OPS-Dir review ---

Most of these questions are n/a because this draft describes a format
that will be used in other protocols to which RFC 5706's concerns would

apply.


A.1.4   Have the Requirements on other protocols and functional
       components been discussed?

The specification of the interaction of the JSON sequence parser with the
JSON text parser is not as clear as it should be for incomplete or malformed
JSON texts.  See Minor Issues [A]-[E] above.

A.1.8   Are there fault or threshold conditions that should be reported?

Yes, incomplete JSON texts - this is covered in sections 2.3 and 2.4.

Thanks,
--David
----------------------------------------------------
David L. Black, Distinguished Engineer
EMC Corporation, 176 South St., Hopkinton, MA  01748
+1 (508) 293-7953             FAX: +1 (508) 293-7786
david(_dot_)black(_at_)emc(_dot_)com        Mobile: +1 (978) 394-7754
----------------------------------------------------