On 7 Dec 2014, at 19:05, John Cowan <cowan(_at_)mercury(_dot_)ccil(_dot_)org>
wrote:
> Patrik Fältström scripsit:
>> But it also references RFC7159, which doesn't require UTF-8 but instead
>> for some weird reason also allows other encodings of Unicode text. And
>> on top of that it says Byte Order Mark is not allowed.
> 7159 was meant to tighten the wording of 4627, not to impose additional
> constraints on it. For that, see the I-JSON draft.
The problem I have is that 7159 is not tight enough, as it allows encodings
other than UTF-8, which in turn makes the encoding not work very well, since
this draft takes for granted that each of the separator characters is a
single byte.
I.e. the way I read draft-ietf-json-text-sequence (and I might be wrong), you
have specific octet values that act as separators. That only works if the
encoding is UTF-8.
See Figure 1:
   possible-JSON = 1*(not-RS) ; attempt to parse as UTF-8-encoded
                              ; JSON text (see RFC7159)
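To make the rule in Figure 1 concrete, here is a minimal sketch of how I read it: split the octet stream on the RS octet (0x1E) and attempt to parse each piece as UTF-8-encoded JSON. The function name and the sample stream are my own inventions for illustration, and silently skipping elements that fail to parse is an assumption on my part, not something taken from the draft.

```python
import json

RS = b"\x1e"  # record separator octet used to delimit elements

def parse_text_sequence(data: bytes):
    """Yield every element that parses as UTF-8-encoded JSON; skip the rest."""
    for chunk in data.split(RS):
        if not chunk.strip():
            continue  # nothing between two separators
        try:
            yield json.loads(chunk.decode("utf-8"))
        except ValueError:
            # UnicodeDecodeError and json.JSONDecodeError both derive
            # from ValueError, so both kinds of failure land here.
            continue

# Hypothetical stream: three complete elements and one truncated one.
stream = b'\x1e{"a": 1}\n\x1e[1, 2]\n\x1e{"truncated": \n\x1e"ok"\n'
values = list(parse_text_sequence(stream))
```

Note that the split happens on raw octets before any decoding, which is exactly why the choice of encoding matters.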
Now, if this is NOT UTF-8, then this might be a pretty bad situation.
What I am saying is that I would like this draft to explicitly say that the
only profile of RFC7159 that can be used is the one where UTF-8 is in use,
i.e. somewhere something like "The encoding MUST be UTF-8, although RFC7159
also allows other encodings, like UTF-16." Then, in the security
considerations section, something like "RFC7159 allows not only UTF-8
encoding but also, for example, UTF-16, which MIGHT create problems for a
parser, depending on what data is serialized."
I.e. I want this draft to be even tighter than RFC7159.
Let me ask it this way: is there any reason to allow other encodings than
UTF-8? If so, how do you handle the encoding of the separators?
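A small illustration of the concern, as a sketch only: in UTF-8 an octet below 0x80 always stands for the ASCII character itself, so the octet 0x1E can only mean RS, while in UTF-16 the same octet value can turn up as half of an unrelated code unit. The sample text and variable names below are made up for the demonstration.

```python
RS = 0x1E  # record separator as an octet value

# U+1E00 (LATIN CAPITAL LETTER A WITH RING BELOW) inside a JSON string.
text = '{"name": "\u1e00"}'

utf8 = text.encode("utf-8")        # U+1E00 becomes E1 B8 80
utf16 = text.encode("utf-16-be")   # U+1E00 becomes 1E 00

rs_in_utf8 = RS in utf8     # no stray separator octet
rs_in_utf16 = RS in utf16   # a separator octet appears inside the element

# The separator character itself is also no longer a single octet.
rs_as_utf16 = "\x1e".encode("utf-16-be")
```

With UTF-16 a splitter that looks for the single octet 0x1E would cut this element in the middle of a character.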
>> This together implies that first of all this draft might not lead to
>> stable implementations, secondly one cannot store in JSON strings
>> that include the Byte Order Mark, and there are other unspecified
>> situations.
> If by that you mean that a JSON string may not contain U+FEFF, that is
> incorrect, for U+FEFF is recognized as a BOM only when placed at the
> beginning of an entity body, whereas an entity body in JSON format can
> begin only with [ or { classically, or by extension with [0-9"tfn].
Ok, so what you are saying is that a string in an attribute value in the JSON
blob can still start with U+FEFF?
If so, good, and my apologies for not understanding this on my first reading
of the text.
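For what it is worth, the distinction can be checked against one existing parser. The sketch below uses Python's standard json module, whose behaviour is only one implementation's choice and not a statement about what RFC7159 requires of every parser: U+FEFF inside a string value is kept as an ordinary character, while U+FEFF at the very start of the text is rejected.

```python
import json

# U+FEFF as the first character of a string value: ordinary content.
value = json.loads('["\ufeffabc"]')
starts_with_feff = value[0].startswith("\ufeff")

# U+FEFF in front of the whole JSON text: treated as a BOM by this parser.
try:
    json.loads('\ufeff["abc"]')
    leading_bom_rejected = False
except json.JSONDecodeError:
    leading_bom_rejected = True
```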
Patrik