I agree with Patrik - this draft assumes UTF-8 encoding and should
state that requirement explicitly. John's proposed text change below
is in section 2.1 for the decoder; the encoder text in section 2.2
needs a corresponding change:
OLD
In prose: any number of JSON texts, each preceded by one ASCII RS
character and each followed by a line feed (LF).
NEW
In prose: any number of JSON texts encoded as UTF-8, each preceded
by one ASCII RS character and each followed by a line feed (LF).
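
For illustration only, the NEW encoder wording could be sketched in Python roughly as follows. The helper name `encode_json_text_sequence` is hypothetical and is not taken from the draft; the sketch assumes the standard `json` module is an acceptable serializer.

```python
import json

RS = b"\x1e"  # ASCII record separator (RS)
LF = b"\n"    # line feed (LF)


def encode_json_text_sequence(values):
    """Hypothetical helper: frame each value as RS + UTF-8 JSON text + LF."""
    out = bytearray()
    for value in values:
        # ensure_ascii=False keeps non-ASCII characters literal, so the
        # explicit UTF-8 encoding step below is what puts them on the wire.
        text = json.dumps(value, ensure_ascii=False)
        out += RS + text.encode("utf-8") + LF
    return bytes(out)
```

Python's `json.dumps` escapes control characters inside strings, so a raw RS octet should only ever appear as the separator in this sketch.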
Thanks,
--David
-----Original Message-----
From: John Cowan [mailto:cowan(_at_)ccil(_dot_)org] On Behalf Of John Cowan
Sent: Sunday, December 07, 2014 4:08 PM
To: Patrik Fältström
Cc: Black, David; Nico Williams; General Area Review Team (gen-art(_at_)ietf(_dot_)org); json(_at_)ietf(_dot_)org; ops-dir(_at_)ietf(_dot_)org; ietf(_at_)ietf(_dot_)org
Subject: Re: [Json] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09
Patrik Fältström scripsit:
> I.e. the way I read draft-ietf-json-text-sequence (and I might be
> wrong), you have specific octet values that act as separators. That
> only works if the encoding is UTF-8.
This is a binary representation which has embedded JSON texts represented
in UTF-8. Since the first character in a JSON text is necessarily in
the ASCII repertoire, it is not possible to parse a UTF-16 or UTF-32
JSON text as UTF-8 and come out with valid JSON.
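
A small Python check can illustrate the point, as a sketch rather than a proof. The function name `parses_as_utf8_json` and the sample text are made up for this example.

```python
import json


def parses_as_utf8_json(octets):
    """Return True if the octets decode as UTF-8 and parse as a JSON text."""
    try:
        # Decode explicitly: json.loads() on bytes would auto-detect the
        # encoding, which is exactly what this check must not do.
        json.loads(octets.decode("utf-8"))
        return True
    except ValueError:
        # Covers both UnicodeDecodeError and json.JSONDecodeError.
        return False


sample = '{"name": "Patrik"}'
results = {
    enc: parses_as_utf8_json(sample.encode(enc))
    for enc in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")
}
```

Only the `utf-8` entry is expected to be true: in the other encodings the leading ASCII character is accompanied by NUL octets, which are neither JSON whitespace nor valid JSON syntax.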
However, I grant that mentioning UTF-8 only in an ABNF comment is not
really prominent enough. Proposed wording change:
For:
In prose: a series of octet strings, each containing any octet other
than a record separator (RS) (0x1E) [RFC0020], all octet strings
separated from each other by RS octets. Each octet string in the
sequence is to be parsed as a JSON text.
read:
In prose: a series of octet strings, each containing any octet other
than a record separator (RS) (0x1E) [RFC0020], all octet strings
separated from each other by RS octets. Each octet string in the
sequence is to be parsed as a JSON text in UTF-8 encoding.
and add a suitable reference to UTF-8.
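
As a rough sketch of the reworded decoder text, assuming Python and a hypothetical helper name `decode_json_text_sequence`: split the octets on RS, then parse each non-empty octet string as a JSON text in UTF-8. It deliberately makes no attempt to recover from malformed or truncated texts.

```python
import json

RS = b"\x1e"  # record separator octet (0x1E)


def decode_json_text_sequence(data):
    """Hypothetical helper: split on RS and parse each piece as UTF-8 JSON."""
    texts = []
    for chunk in data.split(RS):
        if not chunk.strip():
            # Skip the empty piece before the first RS and any piece that
            # holds only whitespace (for example between repeated RS octets).
            continue
        # Strict UTF-8 decoding; octets in another encoding raise ValueError
        # here or in json.loads() below.
        texts.append(json.loads(chunk.decode("utf-8")))
    return texts
```

A string value that begins with U+FEFF passes through this sketch unchanged, since the character sits inside the JSON string rather than at the start of the text.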
> Ok, so what you say is that a string in an attribute value in the JSON
> blob can still start with U+FEFF?
Just so.
--
John Cowan http://www.ccil.org/~cowan
cowan(_at_)ccil(_dot_)org
As we all know, civil libertarians are not the friskiest group around --
comes from forever being on the qui vive for the sound of jack-booted
fascism coming down the pike. --Molly Ivins