
Re: [Json] Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09

2014-12-08 21:31:22

[I've been traveling, please excuse my late responses.]

On Sun, Dec 07, 2014 at 07:55:41PM +0100, Patrik Fältström wrote:
> On 7 dec 2014, at 19:05, John Cowan
> <cowan(_at_)mercury(_dot_)ccil(_dot_)org> wrote:
> > Patrik Fältström scripsit:
> > 
> > > But it also references RFC7159, which doesn't require UTF-8 but
> > > instead, for some weird reason, also allows other encodings of
> > > Unicode text. And on top of that it says a Byte Order Mark is not
> > > allowed.

> > 7159 was meant to tighten the wording of 4627, not to impose additional
> > constraints on it.  For that, see the I-JSON draft.

> The problem I have is that 7159 is not tight enough, as it allows
> encodings other than UTF-8, which in turn makes the format not work
> very well, since this draft takes for granted that each of the
> separator characters is one byte.
> 
> I.e. the way I read draft-ietf-json-text-sequence (and I might be
> wrong), you have specific octet values that act as separators.  That
> only works if the encoding is UTF-8.

Right.  I'll add text to section 2.2 saying that the JSON texts have to
be encoded in UTF-8.
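
To make that concrete, here is a quick illustration (a Python sketch of
mine, not anything from the draft) of why byte-level splitting on the RS
octet is only safe when the JSON texts are UTF-8-encoded:

    RS = b"\x1e"                     # the draft's single-octet separator

    text = '["\u011e"]'              # a JSON text containing U+011E (Ğ)

    utf8  = text.encode("utf-8")     # b'["\xc4\x9e"]' -- no 0x1E octet anywhere
    utf16 = text.encode("utf-16-be") # contains the octet pair 0x01 0x1E for U+011E

    print(RS in utf8)    # False: UTF-8 never reuses octets 0x00-0x7F inside a
                         #        multi-octet sequence, and a raw RS would have
                         #        to be escaped inside a JSON string anyway
    print(RS in utf16)   # True:  a byte-level split on 0x1E would cut this
                         #        perfectly valid JSON text in half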

> See Figure 1:
> 
> possible-JSON = 1*(not-RS); attempt to parse as UTF-8-encoded
>                           ; JSON text (see RFC7159)
> 
> Now, if this is NOT UTF-8, then this might be a pretty bad situation.

Well, you can always fuzz test a parser... :)

But yes, the encoder should use UTF-8.
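
FWIW, an encoder in that spirit might look roughly like this Python
sketch of mine (assuming the RS-before / LF-after framing the draft uses
for generation; none of these names come from the draft):

    import json

    RS = b"\x1e"   # RECORD SEPARATOR
    LF = b"\x0a"   # LINE FEED

    def write_json_seq(out, values):
        # Emit each value as RS, then the UTF-8 encoding of its JSON
        # text, then LF.  ensure_ascii=False so non-ASCII characters
        # really are sent as UTF-8 rather than \uXXXX escapes (both are
        # valid JSON, but the point here is to exercise the encoding).
        for value in values:
            text = json.dumps(value, ensure_ascii=False)
            out.write(RS + text.encode("utf-8") + LF)

    # e.g.:
    # with open("events.json-seq", "ab") as f:
    #     write_json_seq(f, [{"msg": "hello"}, {"msg": "smörgås"}])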

> What I am saying is that I would like this draft to explicitly say
> that the only profile of RFC7159 that can be used is the one where
> UTF-8 is in use, i.e. somewhere something like "The encoding MUST be
> UTF-8, although RFC7159 also allows other encodings, like UTF-16."
> Then, in the security considerations section, something like "RFC7159
> allows not only UTF-8 encoding but also, for example, UTF-16, which
> MIGHT create problems for a parser, depending on what data is
> serialized."
> 
> I.e. I want this draft to be even tighter than RFC7159.

I agree with this.  This was always my intent (as in: I never intended
to support UTF-16 or UTF-32, say, or any other UTF, in any
implementation of mine).
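
As an aside, an implementation acting on that MUST has to be a little
careful on the read side: some JSON APIs (Python's json.loads(), for
one) will silently auto-detect UTF-16 or UTF-32 when handed raw octets.
A rough sketch of mine of the strict behaviour Patrik's proposed
security-considerations text implies:

    import json

    def decode_element(octets):
        # Decode one element under a strict "MUST be UTF-8" rule.
        # Decoding explicitly with errors="strict" matters: passing the
        # raw octets straight to json.loads() would let the library
        # auto-detect UTF-16/UTF-32 and parse it anyway, which is
        # exactly what the draft should rule out.  Returns None for
        # anything that is not UTF-8-encoded JSON, so the caller can
        # log or skip the element.
        try:
            return json.loads(octets.decode("utf-8", errors="strict"))
        except (UnicodeDecodeError, ValueError):
            return None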

And I agree that this format doesn't work with UTF-16 or UTF-32 EVEN IF
the UTF were part of the MIME type and UTF-specific multi-byte
separators were used: at least for log-type applications, where
atomicity of writes is in question, multi-byte separators won't do.
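
To make that last point concrete, here is a rough decoder sketch of mine
(Python; nothing here is draft text) showing what the single-octet
separator buys:

    import json

    RS = b"\x1e"  # RECORD SEPARATOR, the single-octet element boundary

    def parse_json_seq(data):
        # Best-effort parse of octets holding a JSON text sequence.
        # Because the separator is one octet that can never appear
        # inside a valid UTF-8-encoded JSON text, a torn or interleaved
        # log write can only damage the element it belongs to; the
        # parser simply resynchronizes at the next RS.  A multi-octet
        # separator could itself be split across two partial writes,
        # and that property would be lost.
        for chunk in data.split(RS):
            if not chunk:
                continue  # leading RS, or consecutive RS octets
            try:
                yield json.loads(chunk.decode("utf-8"))
            except (UnicodeDecodeError, ValueError):
                continue  # skip truncated or otherwise malformed elements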

Nico
-- 

