RE: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

Nico,

Thanks for the comprehensive response.

I've excerpted a few things for further comment - I'm fine with the
proposed actions for anything not mentioned here.

[A] JSON text parse failures

   If parsing of such an octet string as a JSON text fails, and the
   octet string is followed by an RS octet, the parser
   SHOULD nonetheless skip ahead to that RS octet and continue parsing
   the remainder of the sequence from there.


That's a weird way of saying that, wherever the JSON text parse fails,
the sequence parser should then seek forward until the next RS-valued
byte.  But it works for me; I'll take it.


Your alternative wording "whenever the JSON text parse fails, ..." is fine.

[D] Truncation

A missing terminating LF is not a problem for strings, arrays, or
objects.  I seem to recall that we did discuss this.  We could require
that such texts fail to parse, but perhaps the more important thing is
to require common parser behavior as to such truncations.

You ABNF proposal is certainly more strict than the one in the I-D.  I'm
neutral as to whether this form or the one in the I-D (with the ws issue
fixed) is better.  The stricter form is clearly easier to talk about,
therefore preferable, but it will mean discarding texts where only that
terminating LF is truncated.


I concur with both of the above paragraphs - my preference is to detect
incomplete JSON texts at the sequence level via the missing LF rather than
special-casing numbers and relying on failed JSON parses for everything else.
In general, earlier detection of errors increases the options for dealing
with them.

Once the incomplete text is detected, a JSON parse could be attempted,
with the JSON parser knowing that the text is incomplete (e.g., text
may fail to parse, a number at the end of the text must not be produced
as an incremental parse result).

I'll defer to the WG on whether/how to expand the decoder ABNF to better
detect incomplete texts and how to deal with any incomplete texts that are
detected.

As for RFC 20 ...

Nits/editorial comments:

idnits didn't like the reference to RFC 20 for ASCII:

  ** Downref: Normative reference to an Unknown state RFC: RFC   20


Is this resolved by now?  I can always reference only Unicode.


Keep the RFC 20 reference - I have no problem with it.  Moreover, as a
result of all the hubbub around this nit, the IESG has issued a Last Call
to reclassify RFC 20 as an Internet Standard ... so that this never
arises again ...

http://www.ietf.org/mail-archive/web/ietf-announce/current/msg13546.html

Thanks,
--David

-----Original Message-----
From: Nico Williams [mailto:nico(_at_)cryptonector(_dot_)com]
Sent: Monday, December 08, 2014 11:20 PM
To: Black, David
Cc: General Area Review Team (gen-art(_at_)ietf(_dot_)org); 
ops-dir(_at_)ietf(_dot_)org;
ietf(_at_)ietf(_dot_)org; json(_at_)ietf(_dot_)org
Subject: Re: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

On Fri, Dec 05, 2014 at 02:51:04PM +0000, Black, David wrote:

This is a combined Gen-ART and OPS-Dir review.  Boilerplate for both follows

...

Thanks you.

Minor issues:

[A] Section 2.1:

   If parsing of such an octet string as a JSON text fails, the parser
   SHOULD nonetheless continue parsing the remainder of the sequence.

That's not quite right - there are two levels of parsing, JSON
sequence parsing and JSON text parsing of each text in the sequence,
both of which might be implemented in a single-pass parser.  For such an
implementation, the above sentence could be (mis-)read to imply that the
JSON text parse should resume from the point at which it failed, which
would be silly (although I've seen heroic PL/1 parsers do exactly that).
Instead, the parse needs to skip ahead to the next RS, ignoring the rest
of the JSON text that failed to parse.  I suggest:


Good point.

   If parsing of such an octet string as a JSON text fails, and the
   octet string is followed by an RS octet, the parser
   SHOULD nonetheless skip ahead to that RS octet and continue parsing
   the remainder of the sequence from there.


That's a weird way of saying that, wherever the JSON text parse fails,
the sequence parser should then seek forward until the next RS-valued
byte.  But it works for me; I'll take it.

[B] Section 2.3:

Is incremental parsing of a JSON text within a sequence allowed, or
is the parser required to not produce any results until the parse of
the entire text is successful?  I'd expect that incremental parsing
is ok (so results may be produced from a text that ultimately fails
to parse), and I think that's worth stating.


It's permitted, of course, with the caveat that all incremental parsers
have: the parse can ultimately fail.  I'll add this note:

   Incremental JSON text parsers may be used, though of course failure
   to parse a given text may result after first producing some
   incremental parse results.

to section 2.3.

[C] Section 2.4:

   Parsers MUST check that any JSON texts that are a top-level number
   include JSON whitespace ("ws" ABNF rule from [RFC7159]) after the
   number, otherwise the JSON-text may have been truncated.

That reference to the "ws" rule doesn't get the job done because that
rule allows a match to no characters - it's of the form ws = *( ... )
where ... is the list of whitespace characters.  What's needed here is
a rule of the form vws = 1*( ...) to force there to be at least one
whitespace character, but see the next issue for a better way to deal
with this topic by pulling the appended LF into the sequence parse
instead of the text parse.


I'd wanted to not have to list the characters that are considered
whitespace.  I agree that the "ws" rule is not appropriate though.

[D] I wonder whether the possibility of incomplete texts ought to be
encoded into the parsing rules to directly catch JSON texts that must
be incomplete because the last character is not LF, e.g.:


A missing terminating LF is not a problem for strings, arrays, or
objects.  I seem to recall that we did discuss this.  We could require
that such texts fail to parse, but perhaps the more important thing is
to require common parser behavior as to such truncations.

You ABNF proposal is certainly more strict than the one in the I-D.  I'm
neutral as to whether this form or the one in the I-D (with the ws issue
fixed) is better.  The stricter form is clearly easier to talk about,
therefore preferable, but it will mean discarding texts where only that
terminating LF is truncated.

One problem we get into with your ABNF is that RFC5234 does not discuss
greediness (this came up on the list).  But this can be noted (see
below).

One nice thing about the stricter rules is that we need not discuss
top-level numbers (or booleans, or null) with normative text, just note
them.

     JSON-sequence = *(1*RS (possible-JSON / truncated-JSON / empty-JSON))
     RS = %x1E; "record separator" (RS), see RFC20
     possible-JSON = 1*(not-RS) LF ; attempt to parse as UTF-8-encoded
                               ; JSON text (see RFC7159)
     truncated-JSON = *(not-RS) not-LFRS); truncated, don't attempt
                                    ; to parse as JSON text
     empty-JSON = LF ; only the LF appended by the encoder, nothing to parse

     not-RS = %x00-1D / %x1F-FF; any octet other than RS
     not-LFRS = %x00-09/ %x1B-1D / %x1F-FF; any octet other than RS or LF

Note that this won't detect all incomplete JSON texts, because LF is allowed
within a JSON text (and this should be stated).


It will if ABNF matching is greedy and possible-JSON is defined as

   possible-JSON = 1*( 1*(not-RS) LF); ...

Advice as to which form to take?

[E] Section 3 - Security Considerations

Incomplete and malformed JSON texts can be used to attack JSON parsers -
that should be pointed out, as I don't see that in RFC 7159's security
considerations and incomplete texts are a relevant consideration for
this draft.


And surely also for RFC7159.  Besides requiring that they fail to parse
(and the note about incremental parsing), are there any other missing
considerations?  Ah yes, smuggling of data in sequences where parsers
will ignore without warning any octet strings that fail to parse as JSON
texts.

Proposed text:

   As usual, parsers must operate on as-good-as untrusted input. This
   means that parsers must fail gracefully in the face of malicious
   inputs. Note that incremental parsers can produce partial results and
   later indicate failure to parse the remainder of a text. Note that
   texts that fail to parse and are ignored can be used to smuggle data
   past sequence parsers that don't warn about JSON text failures.

[F] Section 4 - IANA Considerations

   Security considerations: See <this document, once published>,
   Section 3.

   Interoperability considerations: Described herein.

   Published specification: <this document, once published>.

   Applications that use this media type: <by publication time
   <https://stedolan.github.io/jq> is likely to support this format>.

Replace all three instances of the angle bracketed text.  The first two
instances should be RFC references (e.g., RFC XXXX) w/a note to the RFC
Editor to insert the number of the RFC when published.  The third instance
should be resolved now, or could have an RFC Editor note added indicating
that the author will resolve that during Authors 48 hours.


Sure.

Nits/editorial comments:

idnits didn't like the reference to RFC 20 for ASCII:

  ** Downref: Normative reference to an Unknown state RFC: RFC   20


Is this resolved by now?  I can always reference only Unicode.

Thanks for your excellent review,

Nico
--