ietf

Re: [Json] BOMs

2013-11-21 11:20:20

On Nov 21, 2013, at 5:28 AM, Henri Sivonen wrote:

On Thu, Nov 21, 2013 at 7:53 AM, Allen Wirfs-Brock
<allen(_at_)wirfs-brock(_dot_)com> wrote:
Just to be clear about this.  My tests directly tested JavaScript built-in
JSON parsers with respect to BOM support in three major browsers.  The tests
directly invoked the built-in JSON.parse functions and passed them a source
string that was explicitly constructed to contain a BOM code point.
This was done to ensure that all transport layers (and any transcodings
they might perform) were bypassed and that we were actually testing the real
built-in JSON parse functions.

It would be surprising if JSON.parse() accepted a BOM, since it
doesn't take bytes as input.

ECMAScript's JSON.parse accepts an ECMAScript string value as its input.
ECMAScript strings are sequences of 16-bit values.  JSON.parse (and most other
ECMAScript functions) interprets those values as Unicode code units.  The value
U+FEFF can appear at any position within a string.  When defining a string as an
ECMAScript literal, a sequence like \ufeff is an escape sequence that means
"place the code unit value 0xFEFF into the string at this position in the
sequence".  Also note that the strings passed to JSON.parse below contain
the actual code unit U+FEFF, not the escape sequence that was used to
express it.  To include the escape sequence characters themselves in the string,
it would have to be expressed as '\\ufeff'.
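
The distinction between the escape sequence and the code unit it denotes can be checked directly. This is only a minimal sketch, assuming any reasonably modern ECMAScript engine; the variable names are illustrative and not part of the original tests.

```javascript
// '\ufeff' in a string literal produces ONE code unit with value 0xFEFF.
const bom = '\ufeff';

// '\\ufeff' produces SIX characters: a backslash followed by "ufeff".
const escapedText = '\\ufeff';

const summary = {
  bomLength: bom.length,
  bomCodeUnit: bom.charCodeAt(0),
  escapedLength: escapedText.length,
  escapedFirstChar: escapedText[0],
};

console.log(summary);
```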

JSON.parse('\ufeff ["XYZ"]');  // note: the outer quotes delimit an ECMAScript
                               // string; the inner quotes delimit a JSON string

throws a runtime SyntaxError exception because the JSON grammar does not allow
U+FEFF to appear at that position.

JSON.parse('["\ufeffXYZ"]');

operates without error and returns an Array containing a four-element ECMAScript
string.  This works because the JSON grammar allows any code unit except for "
and \ and the ASCII control characters to appear literally in a JSON string.
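
Both cases can be reproduced with a small script. This is a sketch rather than the original test harness; tryParse is a hypothetical helper introduced here so that the thrown exception can be inspected instead of aborting the script.

```javascript
// Hypothetical helper: run JSON.parse and report either the value or the error name.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, errorName: err.name };
  }
}

// U+FEFF before the JSON text: not JSON whitespace, so the grammar rejects it.
const leadingBom = tryParse('\ufeff ["XYZ"]');

// U+FEFF inside a JSON string: an ordinary string character, so it is accepted.
const bomInString = tryParse('["\ufeffXYZ"]');

console.log(leadingBom.ok, leadingBom.errorName);
console.log(bomInString.ok, bomInString.value[0].length);
```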



However, XHR's responseType = "json" exercises browsers in a way where
the input is bytes from the network. From the perspective of JSON
support in XHR,
http://lists.w3.org/Archives/Public/www-tag/2013Nov/0149.html (which
didn't reach the es-discuss part of this thread previously) applies.

Right, JSON use via XHR is a different usage scenario, and one that probably
involves encoding and decoding steps.  It has very little to do with the JSON
syntax as defined in ECMA-404.  It's all about how the bits that represent a
string are interchanged, not the eventual semantic processing of the string
(i.e., processing by JSON.parse or some other JSON parser).
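
To illustrate the separation between the decoding step and the parsing step, here is a rough sketch that uses the standard TextDecoder API as a stand-in for whatever decoding a transport layer performs. It is not a description of XHR internals; parseJsonBytes and the keepBom flag are hypothetical names used only for this example.

```javascript
// UTF-8 bytes as they might arrive from the network: a BOM (EF BB BF),
// followed by the JSON text ["XYZ"].
const payload = new Uint8Array([
  0xEF, 0xBB, 0xBF,
  ...new TextEncoder().encode('["XYZ"]'),
]);

// Hypothetical helper: decode the bytes to a string, then parse the string.
// With ignoreBOM: false (the default) the decoder drops a leading BOM;
// with ignoreBOM: true the BOM is kept as U+FEFF in the decoded string.
function parseJsonBytes(bytes, keepBom) {
  const text = new TextDecoder('utf-8', { ignoreBOM: keepBom }).decode(bytes);
  return JSON.parse(text);
}

// The decoder removed the BOM, so JSON.parse never sees it.
const parsed = parseJsonBytes(payload, false);
console.log(parsed[0]);

// The BOM survives decoding, so JSON.parse rejects the string.
let keptBomError = null;
try {
  parseJsonBytes(payload, true);
} catch (err) {
  keptBomError = err.name;
}
console.log(keptBomError);
```

In this sketch the outcome depends entirely on the decoding step; the string-level JSON grammar applied by JSON.parse is the same in both calls.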

Allen

