Re: [Json] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

2014-12-07 12:56:15

On 7 dec 2014, at 19:05, John Cowan <cowan(_at_)mercury(_dot_)ccil(_dot_)org> 
wrote:

Patrik Fältström scripsit:

But it also references RFC7159, which doesn't require UTF-8 but instead,
for some weird reason, also allows other encodings of Unicode text. And
on top of that it says a Byte Order Mark is not allowed.

7159 was meant to tighten the wording of 4627, not to impose additional
constraints on it.  For that, see the I-JSON draft.

The problem I have is that 7159 is not tight enough, as it allows encodings 
other than UTF-8, which in turn makes the encoding not work very well, because 
this draft takes for granted that each of the separator characters is one 
byte.

I.e. the way I read draft-ietf-json-text-sequence (and I might be wrong), you 
have specific octet values that act as separators. That only works if the 
encoding is UTF-8.
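
To illustrate the point about octet-valued separators, here is a minimal Python sketch. It assumes the record separator is the octet 0x1E, as in the draft; the helper name contains_rs_octet and the sample string are made up for this example. It checks whether that octet shows up inside an encoded JSON text that contains no RS character at all.

```python
RS_OCTET = b"\x1e"  # record separator octet assumed from the draft


def contains_rs_octet(json_text: str, encoding: str) -> bool:
    """Hypothetical helper: does the encoded text contain the octet 0x1E?"""
    return RS_OCTET in json_text.encode(encoding)


# A JSON string holding U+1E00; there is no U+001E character in it.
sample = '"\u1e00"'

# UTF-8 encodes U+1E00 as E1 B8 80, so the octet 0x1E never appears.
utf8_hit = contains_rs_octet(sample, "utf-8")

# UTF-16BE encodes U+1E00 as 1E 00, so a scan for the octet 0x1E
# would see a separator in the middle of the string.
utf16_hit = contains_rs_octet(sample, "utf-16-be")

# The RS character itself is two octets (00 1E) in UTF-16BE, not one.
rs_len_utf16 = len("\x1e".encode("utf-16-be"))
```

In UTF-8 every octet of a multi-byte character is 0x80 or above, so an octet 0x1E can only ever be the RS character itself; UTF-16 gives no such guarantee.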

See Figure 1:

possible-JSON = 1*(not-RS); attempt to parse as UTF-8-encoded
                               ; JSON text (see RFC7159)

Now, if this is NOT UTF-8, then this might be a pretty bad situation.
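
As a rough sketch of what the Figure 1 rule asks of a parser, assuming UTF-8 input: split on the RS octet, then attempt to parse each non-empty chunk as JSON. The function name parse_json_seq and the (ok, value) result shape are inventions for this example, and the draft's additional rules (for instance on detecting truncated texts) are left out.

```python
import json

RS = b"\x1e"  # record separator octet assumed from the draft


def parse_json_seq(data: bytes):
    """Hypothetical sketch: return a list of (ok, value) pairs.

    ok is True and value is the parsed JSON value when a chunk parses
    as UTF-8-encoded JSON; otherwise ok is False and value is the raw
    chunk, so the caller can report it and carry on.
    """
    results = []
    for chunk in data.split(RS):
        if not chunk:
            continue  # nothing between consecutive RS octets
        try:
            results.append((True, json.loads(chunk.decode("utf-8"))))
        except (UnicodeDecodeError, json.JSONDecodeError):
            results.append((False, chunk))
    return results


# Two complete texts followed by one truncated text.
stream = b'\x1e{"a": 1}\n\x1e[1, 2]\n\x1e{"tru'
parsed = parse_json_seq(stream)
```

The split on a single octet is only sound because the input is assumed to be UTF-8; fed UTF-16 data, the same loop could cut a text in the middle of a character.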

What I am saying is that I would like this draft to explicitly say that the 
only profile of RFC7159 that can be used is the one where UTF-8 is in use, i.e. 
somewhere something like "The encoding MUST be UTF-8, although RFC7159 also 
allows other encodings, like UTF-16." Then, in the security considerations 
section, that "RFC7159 allows not only UTF-8 encoding but also, for example, 
UTF-16, which MIGHT create problems for a parser, depending on what data is 
serialized."

I.e. I want this draft to be even tighter than RFC7159.

Let me ask it this way: is there any reason to allow encodings other than 
UTF-8? If so, how do you handle the encoding of the separators?

This together implies that, first of all, this draft might not lead to
stable implementations; secondly, one cannot store strings that include
the Byte Order Mark in JSON; and there are other unspecified
situations.

If by that you mean that a JSON string may not contain U+FEFF, that is
incorrect, for U+FEFF is recognized as a BOM only when placed at the
beginning of an entity body, whereas an entity body in JSON format can
begin only with [ or { classically, or by extension with [0-9"tfn].

Ok, so what you are saying is that a string in an attribute value in the JSON 
blob can still start with U+FEFF?

If so, good, and my apologies for not understanding this when I read the text.
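
The distinction between U+FEFF inside a string value and U+FEFF at the start of the whole text can be checked with a small Python sketch. It shows how one particular parser, Python's standard json module, behaves when handed a str; other parsers may choose to ignore a leading BOM instead. The helper name leading_bom_rejected is made up for this example.

```python
import json

# U+FEFF as the first character of a JSON string value: the text itself
# starts with a quotation mark, so nothing here is treated as a BOM.
value = json.loads('{"name": "\ufeffabc"}')["name"]
first_char_is_feff = value[0] == "\ufeff"


def leading_bom_rejected(text: str) -> bool:
    """Hypothetical helper: True if json.loads refuses the given text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return True
    return False


# U+FEFF placed before the opening brace of the whole text: Python's
# json.loads rejects a str that starts with it.
bom_at_start_rejected = leading_bom_rejected("\ufeff{}")
```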

   Patrik
