Abstract:
This document describes the JSON text sequence format and associated
media type, "application/json-seq". A JSON text sequence consists of
any number of JSON texts, each prefix by an Record Separator
(U+001E), and each ending with a newline character (U+000A).
"any number of JSON texts" -> "any number of UTF-8 encoded JSON texts"
This change concerns me, because it sounds like a JSON text sequence could
consist of JSON texts encoded in UTF-8 and other encodings. I would instead
prefer "any number of JSON texts, all encoded in UTF-8,".
It also looks like ASCII names for RS and LF are being mixed with Unicode
codepoints in the second sentence in the abstract. I'm not sure
that's a good thing to do, especially as the body of the draft refers
to RS and LF as being ASCII. Here are a couple of changes that would remedy
this:
"an Record Separator (U+001E)" -> "an ASCII Record Separator (0x1E)"
"a newline character (U+000A)" -> "an ASCII newline character (0x0A)"
With John Cowan's change ("an ASCII Line Feed character (0x0A)" instead of "an
ASCII newline character (0x0A)"), that would indeed be clearer.
Please no. That would give an even worse mix of UTF-8 and ASCII, bytes and
characters, in a single sentence.
".. any number of JSON texts, all encoded in UTF-8, each prefixed by an ASCII
Record Separator (0x1E) .."
How about:
"A JSON text sequence consists of any number of JSON texts,
each prefixed by a Record Separator (U+001E) character, and
each suffixed by an End of Line (U+000A) character. It is
UTF-8 encoded."
Say "Information Separator Two (U+001E)" if you really want to be pure.
Mention in the body that "Record Separator" and "Information Separator Two" are
the ASCII and Unicode names for the same character (as are "Line Feed" and "End
of Line"), which is why RS and LF are used as ABNF names.
P.S. The spec still defines the same ABNF names twice (RS, JSON-sequence): once
as bytes; once as Unicode scalars. Yuck. Just give them different names.
--
James Manger