Re: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

All,

Many definitions in this document have been specified in the form of 1:1 
mappings of Unicode code points to single bytes. This includes the record 
separator and, for example, the definition of the string "false" as five 
octets.

This is OK if the encoding used for Unicode is UTF-8, and indeed the draft says 
that the parser should try to parse the string as if it were a UTF-8 sequence 
of characters.

But it also references RFC7159, which does not require UTF-8 but instead, for 
some weird reason, also allows other encodings of Unicode text (UTF-16 and 
UTF-32). On top of that, it says a Byte Order Mark is not allowed.
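To make the point concrete, here is a small Python sketch (the helper name is hypothetical) showing how the record separator and the literal "false" come out under UTF-8 and under the UTF-16 and UTF-32 forms RFC7159 also permits. Only in UTF-8 is each character a single octet.

```python
# Hypothetical helper: octets for RS and for the literal "false" under the
# encodings RFC7159 permits (big-endian variants shown).
ENCODINGS = ("utf-8", "utf-16-be", "utf-32-be")

def framing_octets(encoding: str) -> tuple[bytes, bytes]:
    """Return (octets for RS, octets for the literal false) in `encoding`."""
    return "\x1e".encode(encoding), "false".encode(encoding)

if __name__ == "__main__":
    for enc in ENCODINGS:
        rs, false = framing_octets(enc)
        print(f"{enc:<10} RS={rs.hex():<8} 'false' is {len(false)} octets")
    # In UTF-16 a 0x1E octet can also occur inside an unrelated character:
    print("U+1E00 in UTF-16BE:", "\u1e00".encode("utf-16-be").hex())
```

This prints 5, 10 and 20 octets for "false", and shows U+1E00 encoding to 1e00 in UTF-16BE, so a byte-level scan for RS is only safe when the octets are UTF-8.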

Taken together, this implies, first, that this draft might not lead to stable 
implementations; second, that one cannot store strings that include the Byte 
Order Mark in JSON; and third, that there are other unspecified situations.

In short, as I see it, existing weaknesses in RFC7159 are amplified, and made 
potentially dangerous, by draft-ietf-json-text-sequence.

Yes, I and others should probably not have let RFC7159 through, because that 
might be where the bugs are.

Suggestion: this draft, draft-ietf-json-text-sequence, should say explicitly 
that the only "profile" of RFC7159 that is allowed is UTF-8. That should be a 
MUST.
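As an illustration of what a UTF-8-only profile could mean for a parser, a minimal Python sketch (hypothetical helper, standard json module) that decodes each record strictly as UTF-8 before parsing, rather than guessing among encodings:

```python
import json

def parse_utf8_only(record: bytes):
    """Parse one record under a UTF-8-only profile (hypothetical helper).

    bytes.decode("utf-8") is strict by default, so malformed UTF-8, such as
    UTF-16 text with a BOM or with non-ASCII characters, raises
    UnicodeDecodeError instead of being guessed at. UTF-16/UTF-32 input made
    only of ASCII characters is technically valid UTF-8, but its interleaved
    NUL octets then make json.loads fail. A leading U+FEFF is left in place,
    and json.loads rejects it, so a BOM is never silently accepted.
    """
    text = record.decode("utf-8")
    return json.loads(text)
```

Both failure modes surface as ValueError (UnicodeDecodeError and json.JSONDecodeError are subclasses), so a caller needs only one error path.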

Reminder for the IETF: "or" statements are not recommended at all when talking 
about these kinds of things, and RFC7159 includes at least one "or" too many. 
The IETF's recommendation is to use the UTF-8 encoding for Unicode (when 
serializing text).

   Patrik

On 5 dec 2014, at 15:51, Black, David <david(_dot_)black(_at_)emc(_dot_)com> 
wrote:

This is a combined Gen-ART and OPS-Dir review.  Boilerplate for both follows 
...

I am the assigned Gen-ART reviewer for this draft. For background on
Gen-ART, please see the FAQ at:

<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Please resolve these comments along with any other Last Call comments
you may receive.

I have reviewed this document as part of the Operational directorate's ongoing
effort to review all IETF documents being processed by the IESG.  These comments
were written primarily for the benefit of the operational area directors.
Document editors and WG chairs should treat these comments just like any other
last call comments.

Document: draft-ietf-json-text-sequence-09
Reviewer: David Black
Review Date: Dec 5, 2014
IETF LC End Date: Dec 5, 2014
IESG Telechat date: Dec 18, 2014

Summary: This draft is on the right track, but has open issues
              described in the review.

This draft specifies a format that packs multiple JSON texts into a
single string.  The ASCII RS (0x1E) character is used to separate texts,
and a linefeed is appended to each text to ensure that a complete text
always ends with a whitespace character.
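For reference, the encoder side of this framing can be sketched in a few lines of Python. The function name is hypothetical, and UTF-8 output via the standard json module is an assumption of the sketch:

```python
import json

RS = b"\x1e"  # ASCII Record Separator
LF = b"\n"    # appended so every complete text ends in whitespace

def encode_sequence(values) -> bytes:
    """Encode values as RS <JSON text> LF records (hypothetical helper)."""
    out = bytearray()
    for v in values:
        # json.dumps escapes control characters (including U+001E) inside
        # strings, so an RS octet can only appear as a separator.
        text = json.dumps(v, ensure_ascii=False, allow_nan=False)
        out += RS + text.encode("utf-8") + LF
    return bytes(out)
```

For example, `encode_sequence([1, {"a": "é"}])` yields `b'\x1e1\n\x1e{"a": "\xc3\xa9"}\n'`.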

All of the open issues are minor - the most important ones center on
treatment of incomplete JSON texts - that appears to be an afterthought
in this draft and needs more attention.  I also found a couple of
minor issues in the Security and IANA Considerations sections, both of
which are almost nits.

Major issues: None.

Minor issues:

[A] Section 2.1:

  If parsing of such an octet string as a JSON text fails, the parser
  SHOULD nonetheless continue parsing the remainder of the sequence.

That's not quite right - there are two levels of parsing, JSON
sequence parsing and JSON text parsing of each text in the sequence,
both of which might be implemented in a single-pass parser.  For such an
implementation, the above sentence could be (mis-)read to imply that the
JSON text parse should resume from the point at which it failed, which
would be silly (although I've seen heroic PL/1 parsers do exactly that).
Instead, the parse needs to skip ahead to the next RS, ignoring the rest
of the JSON text that failed to parse.  I suggest:

  If parsing of such an octet string as a JSON text fails, and the
  octet string is followed by an RS octet, the parser
  SHOULD nonetheless skip ahead to that RS octet and continue parsing
  the remainder of the sequence from there.

That also covers the case where there is nothing more to parse after the
JSON text that caused the parse failure.
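A minimal Python sketch of the suggested behavior (hypothetical helper, standard json module, UTF-8 assumed): the sequence is split at RS octets before any JSON parsing, so a failed text is simply abandoned and parsing continues with the next one.

```python
import json

RS = b"\x1e"  # ASCII Record Separator

def parse_sequence(data: bytes):
    """Parse an RS-delimited JSON text sequence (hypothetical helper).

    Returns (values, errors). On failure the parser never resumes inside
    the bad text; it moves on to whatever follows the next RS.
    """
    values, errors = [], []
    # Octets before the first RS are not part of any record; this sketch ignores them.
    for record in data.split(RS)[1:]:
        if not record:
            continue  # consecutive RS octets: nothing between them to parse
        try:
            values.append(json.loads(record.decode("utf-8")))
        except ValueError as exc:  # covers UnicodeDecodeError and JSONDecodeError
            errors.append((record, str(exc)))
    return values, errors
```

Splitting first is a deliberate design choice: a single-pass parser would have to implement the same skip-to-RS logic explicitly, whereas here a failure can only ever cost the one text it occurs in, including a failing text at the very end of the input.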

[B] Section 2.3:

Is incremental parsing of a JSON text within a sequence allowed, or
is the parser required to not produce any results until the parse of
the entire text is successful?  I'd expect that incremental parsing
is ok (so results may be produced from a text that ultimately fails
to parse), and I think that's worth stating.
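To illustrate what incremental parsing would permit, a Python sketch under stated assumptions: the standard json module has no streaming parser, so this hypothetical helper handles only a top-level array and uses `json.JSONDecoder.raw_decode` to decode one element at a time. Elements are yielded before the closing bracket is seen, so a caller can receive results from a text that ultimately fails.

```python
import json

_decoder = json.JSONDecoder()
_WS = " \t\n\r"  # JSON whitespace per RFC 7159

def _skip_ws(text: str, i: int) -> int:
    while i < len(text) and text[i] in _WS:
        i += 1
    return i

def iter_array_elements(text: str):
    """Yield the elements of a top-level JSON array one at a time.

    Hypothetical helper: each element is yielded as soon as it is decoded,
    so a caller can receive results from a text that later fails to parse.
    """
    i = _skip_ws(text, 0)
    if i >= len(text) or text[i] != "[":
        raise ValueError("not a top-level JSON array")
    i = _skip_ws(text, i + 1)
    if i < len(text) and text[i] == "]":
        i += 1
    else:
        while True:
            value, i = _decoder.raw_decode(text, i)  # raises on a bad element
            yield value
            i = _skip_ws(text, i)
            if i >= len(text):
                raise ValueError("array is truncated")
            if text[i] == "]":
                i += 1
                break
            if text[i] != ",":
                raise ValueError("expected ',' or ']'")
            i = _skip_ws(text, i + 1)
    if _skip_ws(text, i) != len(text):
        raise ValueError("unexpected data after the array")
```

Fed the truncated text `[1, 2, 3`, the generator yields 1, 2 and 3 and only then raises, which is exactly the case the spec would need to address.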

[C] Section 2.4:

  Parsers MUST check that any JSON texts that are a top-level number
  include JSON whitespace ("ws" ABNF rule from [RFC7159]) after the
  number, otherwise the JSON-text may have been truncated.

That reference to the "ws" rule doesn't get the job done because that
rule allows a match to no characters - it's of the form ws = *( ... )
where ... is the list of whitespace characters.  What's needed here is
a rule of the form vws = 1*( ...) to force there to be at least one
whitespace character, but see the next issue for a better way to deal
with this topic by pulling the appended LF into the sequence parse
instead of the text parse.
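The difference between `*ws` and a `1*ws`-style rule can be shown with a small Python sketch (hypothetical helper, standard json module). A top-level number with zero trailing whitespace is rejected, since "12" may be the truncated remains of "1234":

```python
import json

JSON_WS = " \t\n\r"  # the characters matched by RFC 7159's "ws" rule

def _reject_constant(name):
    # Python's json module accepts NaN and Infinity by default; JSON does not.
    raise ValueError(f"not a JSON value: {name}")

_decoder = json.JSONDecoder(parse_constant=_reject_constant)

def parse_text(text: str):
    """Parse one JSON text from a sequence (hypothetical helper).

    A top-level number must be followed by at least one whitespace
    character, i.e. the check behaves like 1*ws rather than *ws.
    """
    start = len(text) - len(text.lstrip(JSON_WS))
    value, end = _decoder.raw_decode(text, start)
    trailing = text[end:]
    if trailing.strip(JSON_WS):
        raise ValueError("extra data after the JSON text")
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if is_number and not trailing:
        raise ValueError("top-level number without trailing whitespace; may be truncated")
    return value
```

Other top-level values need no such check: a truncated string, array, object, or literal like "tru" fails to parse on its own.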

[D] I wonder whether the possibility of incomplete texts ought to be
encoded into the parsing rules to directly catch JSON texts that must
be incomplete because the last character is not LF, e.g.:

    JSON-sequence = *(1*RS (possible-JSON / truncated-JSON / empty-JSON))
    RS = %x1E; "record separator" (RS), see RFC20
    possible-JSON = 1*(not-RS) LF ; attempt to parse as UTF-8-encoded
                              ; JSON text (see RFC7159)
    truncated-JSON = *(not-RS) not-LFRS ; truncated, don't attempt
                                      ; to parse as JSON text
    empty-JSON = LF ; only the LF appended by the encoder, nothing to parse

    not-RS = %x00-1D / %x1F-FF; any octet other than RS
    not-LFRS = %x00-09 / %x0B-1D / %x1F-FF; any octet other than RS or LF

Note that this won't detect all incomplete JSON texts, because LF is allowed
within a JSON text (and this should be stated).
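A Python sketch of the classification these rules describe (the function name and labels are hypothetical): each record between RS octets is sorted into possible, truncated, or empty according to whether it ends in LF.

```python
RS = b"\x1e"
LF = b"\n"

def classify_records(seq: bytes):
    """Classify records per the suggested ABNF (hypothetical helper).

    Returns a list of (kind, record) with kind in
    {"possible", "truncated", "empty"}.
    """
    parts = seq.split(RS)
    if parts[0]:
        raise ValueError("a JSON sequence must begin with RS")
    result = []
    for record in parts[1:]:
        if not record:
            continue  # absorbed by 1*RS: consecutive separators
        if record == LF:
            result.append(("empty", record))      # empty-JSON
        elif record.endswith(LF):
            result.append(("possible", record))   # possible-JSON: try to parse
        else:
            result.append(("truncated", record))  # truncated-JSON: don't parse
    return result
```

As noted above, `b'\x1e[1,\n'` is still classified as "possible" even though the text is incomplete, because LF may legitimately occur inside a JSON text.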

[E] Section 3 - Security Considerations

Incomplete and malformed JSON texts can be used to attack JSON parsers -
that should be pointed out, as I don't see that in RFC 7159's security
considerations and incomplete texts are a relevant consideration for
this draft.
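As an example of the kind of defensive handling such text might motivate, a Python sketch that bounds what a single malformed or hostile record can cost. Both limits and all names are hypothetical, chosen only for illustration:

```python
import json

MAX_RECORD_OCTETS = 1 << 20  # hypothetical limit: 1 MiB per record
MAX_DEPTH = 64               # hypothetical limit on array/object nesting

def nesting_depth_ok(text: str, limit: int = MAX_DEPTH) -> bool:
    """Cheap pre-scan for nesting depth, ignoring brackets inside strings."""
    depth, in_string, escaped = 0, False, False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            if depth > limit:
                return False
        elif ch in "]}":
            depth -= 1
    return True

def guarded_parse(record: bytes):
    """Return (True, value) or (False, reason) for one record."""
    if len(record) > MAX_RECORD_OCTETS:
        return False, "record too large"
    try:
        text = record.decode("utf-8")
    except UnicodeDecodeError:
        return False, "not UTF-8"
    if not nesting_depth_ok(text):
        return False, "nesting too deep"
    try:
        return True, json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError
        return False, f"malformed: {exc}"
```

The point is that incomplete or malformed input is reported as an ordinary per-record failure rather than exhausting memory or the call stack.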

[F] Section 4 - IANA Considerations

  Security considerations: See <this document, once published>,
  Section 3.

  Interoperability considerations: Described herein.

  Published specification: <this document, once published>.

  Applications that use this media type: <by publication time
  <https://stedolan.github.io/jq> is likely to support this format>.

Replace all three instances of the angle-bracketed text.  The first two
instances should be RFC references (e.g., RFC XXXX) with a note to the RFC
Editor to insert the number of the RFC when published.  The third instance
should be resolved now, or could have an RFC Editor note added indicating
that the author will resolve it during AUTH48 (the authors' final 48-hour review).

Nits/editorial comments:

idnits didn't like the reference to RFC 20 for ASCII:

 ** Downref: Normative reference to an Unknown state RFC: RFC   20

RFC 5234 (ABNF) uses this, which looks like a better reference:

  [US-ASCII]  American National Standards Institute, "Coded Character
              Set -- 7-bit American Standard Code for Information
              Interchange", ANSI X3.4, 1986.

--- Selected RFC 5706 Appendix A Q&A for OPS-Dir review ---

Most of these questions are n/a because this draft describes a format
that will be used in other protocols to which RFC 5706's concerns would apply.

A.1.4   Have the Requirements on other protocols and functional
      components been discussed?

The specification of the interaction of the JSON sequence parser with the
JSON text parser is not as clear as it should be for incomplete or malformed
JSON texts.  See Minor Issues [A]-[E] above.

A.1.8   Are there fault or threshold conditions that should be reported?

Yes, incomplete JSON texts - this is covered in sections 2.3 and 2.4.

Thanks,
--David
----------------------------------------------------
David L. Black, Distinguished Engineer
EMC Corporation, 176 South St., Hopkinton, MA  01748
+1 (508) 293-7953             FAX: +1 (508) 293-7786
david(_dot_)black(_at_)emc(_dot_)com        Mobile: +1 (978) 394-7754
----------------------------------------------------
