
RE: [Json] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-10

2014-12-11 17:05:27
I'm not concerned about this - the draft is UTF-8-only (it now explicitly
forbids UTF-16 and UTF-32) and is written on the assumption that it's common
knowledge that 7-bit ASCII (as octets with zero in the most significant bit)
is a UTF-8 subset.

Thanks,
--David

-----Original Message-----
From: Manger, James [mailto:James(_dot_)H(_dot_)Manger(_at_)team(_dot_)telstra(_dot_)com]
Sent: Thursday, December 11, 2014 5:51 PM
To: Paul Hoffman; Black, David
Cc: Nico Williams; General Area Review Team (gen-art(_at_)ietf(_dot_)org);
json(_at_)ietf(_dot_)org; ops-dir(_at_)ietf(_dot_)org; ietf(_at_)ietf(_dot_)org
Subject: RE: [Json] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-10

Abstract:

  This document describes the JSON text sequence format and associated
  media type, "application/json-seq".  A JSON text sequence consists of
  any number of JSON texts, each prefix by an Record Separator
  (U+001E), and each ending with a newline character (U+000A).

"any number of JSON texts" -> "any number of UTF-8 encoded JSON texts"

This change concerns me, because it sounds like a JSON text sequence could
consist of JSON texts encoded in UTF-8 and other encodings. I would instead
prefer "any number of JSON texts, all encoded in UTF-8,".

It also looks like ASCII names for RS and LF are being mixed w/Unicode
codepoints in the second sentence in the abstract.  I'm not sure
that's a good thing to do, especially as the body of the draft refers
to RS and LF as being ASCII.  Here are a couple of changes that would
remedy this:

  "an Record Separator (U+001E)" -> "an ASCII Record Separator (0x1E)"
  "a newline character (U+000A)" -> "an ASCII newline character (0x0A)"

With John Cowan's change ("an ASCII Line Feed character (0x0A)" instead of
"an ASCII newline character (0x0A)"), that would indeed be clearer.


Please no. That would give an even worse mix of UTF-8 and ASCII, bytes and
characters, in a single sentence.

  ".. any number of JSON texts, all encoded in UTF-8, each prefixed by an
ASCII Record Separator (0x1E) .."

How about:

  "A JSON text sequence consists of any number of JSON texts,
   each prefixed by a Record Separator (U+001E) character, and
   each suffixed by an End of Line (U+000A) character. It is
   UTF-8 encoded."

Say "Information Separator Two (U+001E)" if you really want to be pure.

Mention in the body that "Record Separator" and "Information Separator Two"
are the ASCII and Unicode names for the same character (as are "Line Feed" and
"End of Line"), which is why RS and LF are used as ABNF names.
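To make the character-versus-octet point concrete, here is a minimal Python
sketch of the format as worded above: each JSON text is prefixed by U+001E and
suffixed by U+000A, and the whole sequence is UTF-8 encoded. The function names
encode_json_seq and decode_json_seq are hypothetical, chosen only for this
illustration, and the decoder assumes well-formed input (it makes no attempt to
recover from truncated or malformed texts).

```python
import json

RS = "\u001e"  # Record Separator (ASCII name) / Information Separator Two (Unicode name)
LF = "\u000a"  # Line Feed (ASCII name) / End of Line (Unicode alias)


def encode_json_seq(values):
    """Write each value as RS + JSON text + LF, then encode the result as UTF-8."""
    text = "".join(RS + json.dumps(v, ensure_ascii=False) + LF for v in values)
    return text.encode("utf-8")


def decode_json_seq(data):
    """Split the octets on 0x1E and parse every non-empty chunk as a JSON text.

    Splitting on the raw octet is safe here: UTF-8 never uses octets below
    0x80 inside a multi-octet sequence, and JSON requires U+001E to be
    escaped inside strings, so 0x1E can only be a separator.
    """
    results = []
    for chunk in data.split(b"\x1e"):
        if not chunk.strip():
            continue  # skip the empty piece before the first RS
        results.append(json.loads(chunk.decode("utf-8")))
    return results


if __name__ == "__main__":
    payload = encode_json_seq([{"a": 1}, "héllo", [1, 2]])
    print(payload)
    print(decode_json_seq(payload))
```

Because U+001E and U+000A each encode to a single octet in UTF-8 (0x1E and
0x0A), describing them as Unicode characters or as ASCII octets yields the same
bytes on the wire; the disagreement in this thread is about wording, not about
the encoding.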

P.S. The spec still defines the same ABNF names twice (RS, JSON-sequence):
once as bytes; once as Unicode scalars. Yuck. Just give them different names.

--
James Manger

