ietf

Re: [Json] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

2014-12-07 23:42:58

On 7 dec 2014, at 22:07, John Cowan <cowan(_at_)mercury(_dot_)ccil(_dot_)org> 
wrote:

Patrik Fältström scripsit:

I.e. the way I read draft-ietf-json-text-sequence (and I might be
wrong), you have specific octet values that act as separators. That
only works if the encoding is UTF-8.

This is a binary representation which has embedded JSON texts represented
in UTF-8.  Since the first character in a JSON text is necessarily in
the ASCII repertoire, it is not possible to parse a UTF-16 or UTF-32
JSON text as UTF-8 and come out with valid JSON.

My point is that if you talk about specific characters, or reference RFC20 
or whatnot, then you only get RS as that octet if you use UTF-8 encoding. If 
you use UTF-16, then you neither have RS as one octet (0x1E), nor is RS the 
only character that has 0x1E as one of its octets.
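
To make the octet-level point concrete, here is a small Python sketch. The
helper name has_rs_octet and the sample character U+011E are my own choices
for illustration, not anything taken from the draft.

```python
RS = "\u001e"  # the record separator as a Unicode character

def has_rs_octet(text: str, encoding: str) -> bool:
    """Return True if the encoded form of text contains the octet 0x1E."""
    return b"\x1e" in text.encode(encoding)

# In UTF-8, RS is exactly the single octet 0x1E.
print(RS.encode("utf-8"))            # b'\x1e'

# In UTF-16 (big-endian, no BOM), RS takes two octets.
print(RS.encode("utf-16-be"))        # b'\x00\x1e'

# U+011E is not RS, yet its UTF-16 form contains the octet 0x1E.
print("\u011e".encode("utf-16-be"))  # b'\x01\x1e'

# In UTF-8 the same character is encoded with octets >= 0x80 only.
print(has_rs_octet("\u011e", "utf-8"))      # False
print(has_rs_octet("\u011e", "utf-16-be"))  # True
```

So splitting on the octet 0x1E is only safe when every text in the sequence
is UTF-8.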

I think the problem is that I do not know what an "octet string" is. You either 
have UTF-8 encoded Unicode strings, or... ;-) In this case, you have a series 
of UTF-8 encoded Unicode strings, right? Separated by the octet 0x1E, which 
happens to also be a correctly encoded Unicode character -- the Information 
Separator Two. This implies the whole thing is a UTF-8 encoded text that is to 
be parsed like this:

possible-JSON = 1*(not-RS); UTF-8-encoded JSON text
 ; (as specified in RFC7159, but only UTF-8 allowed)

I.e. the blob, to be compliant with this document, MUST be UTF-8 encoded JSON.

Right?
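
A minimal sketch of that reading, assuming the input is already in memory as
bytes. The function name parse_sequence and the ("ok", ...) / ("error", ...)
result shape are hypothetical, chosen only for this illustration; the draft
does not define an API.

```python
import json

RS = b"\x1e"  # record separator octet

def parse_sequence(blob: bytes):
    """Split on RS and parse each non-empty chunk as UTF-8 encoded JSON."""
    results = []
    for chunk in blob.split(RS):
        if not chunk.strip():
            # Nothing between two separators (or before the first one).
            continue
        try:
            # Decode explicitly as UTF-8 so other encodings are rejected
            # instead of being auto-detected by json.loads.
            results.append(("ok", json.loads(chunk.decode("utf-8"))))
        except ValueError as exc:
            # UnicodeDecodeError and json.JSONDecodeError are both
            # subclasses of ValueError.
            results.append(("error", str(exc)))
    return results

blob = b'\x1e{"a": 1}\n\x1e[1, 2]\n\x1e{"broken"\n'
for status, value in parse_sequence(blob):
    print(status, value)
```

A chunk that fails to decode or parse is reported and the loop carries on
with the next one, which is the point of having a separator octet that
cannot occur inside a UTF-8 encoded JSON text.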

However, I grant that mentioning UTF-8 only in an ABNF comment is not
really prominent enough.  Proposed wording change:

For:

  In prose: a series of octet strings, each containing any octet other
  than a record separator (RS) (0x1E) [RFC0020], all octet strings
  separated from each other by RS octets.  Each octet string in the
  sequence is to be parsed as a JSON text.

read:

  In prose: a series of octet strings, each containing any octet other
  than a record separator (RS) (0x1E) [RFC0020], all octet strings
  separated from each other by RS octets.  Each octet string in the
  sequence is to be parsed as a JSON text in UTF-8 encoding.

and add a suitable reference to UTF-8.

I would say that what you have said above is:

This specifies a series of UTF-8 encoded Unicode strings, each to be 
interpreted as JSON text. The strings are separated by the octet 0x1E (which is 
the UTF-8 encoding of the Unicode character U+001E - INFORMATION SEPARATOR TWO). 
Because of this, that character must be escaped, for example by using \u001E 
notation, if it occurs in an attribute value.
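
As an illustration of the escaping, here is a small sketch using Python's
standard json module, which escapes control characters below U+0020 inside
strings when serializing. The helper name encode_record and the trailing
line feed are assumptions made for this example.

```python
import json

RS = b"\x1e"  # record separator octet

def encode_record(value) -> bytes:
    """Serialize one value as RS + UTF-8 encoded JSON text + LF."""
    text = json.dumps(value, ensure_ascii=False)
    return RS + text.encode("utf-8") + b"\n"

# A string value that contains U+001E itself.
record = encode_record({"note": "left\u001eright"})
print(record)

# The only raw 0x1E octet is the leading separator; the one inside the
# string value was written out as the six characters \u001e.
print(record.count(RS))      # 1
print(b"\\u001e" in record)  # True
```

Parsing the text after the separator gives back the original string,
including the U+001E character.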

Ok, so what you say is that a string in an attribute value in the JSON
blob can still start with U+FEFF?

Just so.

Good.

   Patrik

