Re: [Json] BOMs

2013-11-21 11:41:34
* Allen Wirfs-Brock wrote:
On Nov 21, 2013, at 5:28 AM, Henri Sivonen wrote:
On Thu, Nov 21, 2013 at 7:53 AM, Allen Wirfs-Brock
<allen(_at_)wirfs-brock(_dot_)com> wrote:
Just to be clear about this: my tests directly tested JavaScript built-in
JSON parsers with respect to BOM support in three major browsers. The tests
directly invoked the built-in JSON.parse functions and directly passed to them
a source string that was explicitly constructed to contain a BOM code point.

It would be surprising if JSON.parse() accepted a BOM, since it
doesn't take bytes as input.

ECMAScript's JSON.parse accepts an ECMAScript string value as its input.
ECMAScript strings are sequences of 16-bit values. JSON.parse (and most
other ECMAScript functions) interprets those values as Unicode code
units. The value U+FEFF can appear at any position within a string.
When defining a string as an ECMAScript literal, a sequence like \ufeff
is an escape sequence that means "place the code unit value 0xFEFF into
the string at this position in the sequence". Also note that the actual
strings passed below to JSON.parse contain the actual code point value
U+FEFF, not the escape sequence that was used to express it. To include
the actual escape sequence characters in the string, it would have to be
expressed as '\\ufeff'.
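
The distinction between the code unit and the escape sequence can be
checked with a small sketch like the one below. The helper names
`codeUnits` and `tryParse` are made up for illustration, and the sketch
only reports what a given engine's JSON.parse does with a leading U+FEFF
rather than assuming an outcome.

```javascript
// One code unit: the ECMAScript parser resolves the escape in the literal
// before JSON.parse ever sees the string.
const bomString = '\ufeff';
// Six code units: backslash, 'u', 'f', 'e', 'f', 'f'.
const escapeText = '\\ufeff';

// Hypothetical helper: list the 16-bit code units of a string as hex.
function codeUnits(s) {
  return Array.from({ length: s.length }, (_, i) =>
    s.charCodeAt(i).toString(16).padStart(4, '0'));
}

// Hypothetical helper: report the result of JSON.parse instead of throwing.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (e) {
    return { ok: false, error: e.name };
  }
}

console.log(codeUnits(bomString));   // a single code unit, feff
console.log(codeUnits(escapeText));  // six code units, starting with 005c
// Engine-dependent: does JSON.parse tolerate U+FEFF before the value?
console.log(tryParse(bomString + '{}'));
// Here the JSON parser, not the ECMAScript parser, resolves the escape.
console.log(tryParse('"' + escapeText + '"'));
```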

A byte order mark indicates the order of bytes in a sequence of bytes.
An ECMAScript string is not a sequence of bytes and therefore it cannot
have a byte order mark inside it. Your test is not for BOM support but
for an egregious semantic error in the implementation of JSON.parse.

  http://shadowregistry.org/js/misc/#t2ea25a961255bb1202da9497a1942e09

That is a similar test. It makes Firefox see UTF-8 BOMs in ECMAScript
strings. Firefox is not supposed to look for UTF-8 BOMs in ECMAScript
strings because ECMAScript strings are not sequences of bytes at that
level of reasoning.
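
The byte/string distinction can be illustrated with the following
sketch. It assumes an environment that provides the WHATWG Encoding API
(`TextEncoder` and `TextDecoder`), which is not part of ECMAScript
itself. The BOM exists as the bytes EF BB BF only in the encoded form,
and a decoder would normally remove it before any string reaches
JSON.parse.

```javascript
const encoder = new TextEncoder();         // always encodes to UTF-8
const decoder = new TextDecoder('utf-8');  // drops a leading BOM by default

// Encoding a string that starts with U+FEFF yields the UTF-8 BOM bytes
// EF BB BF, followed by the bytes for "{}".
const bytes = encoder.encode('\ufeff{}');
console.log(Array.from(bytes, (b) => b.toString(16).padStart(2, '0')));

// Decoding removes the BOM at the byte level, so the resulting string
// holds only the two code units of "{}".
const text = decoder.decode(bytes);
console.log(text.length, JSON.parse(text));
```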

Is there any chance, by the way, to change `JSON.stringify` so it does
not output strings that cannot be encoded using UTF-8? Specifically,

  JSON.stringify(JSON.parse("\"\uD800\""))

would need to escape the surrogate instead of emitting it literally.
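
A minimal sketch of the requested behaviour, written as a
post-processing step rather than as a change to `JSON.stringify`
itself. The function name `escapeLoneSurrogates` is hypothetical. It
rewrites every unpaired surrogate code unit as a \uXXXX escape and
leaves well-formed surrogate pairs untouched, so the result can always
be encoded as UTF-8.

```javascript
// Hypothetical helper: replace unpaired surrogates in JSON text with
// \uXXXX escapes; valid high/low surrogate pairs are copied unchanged.
function escapeLoneSurrogates(text) {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const cu = text.charCodeAt(i);
    const isHigh = cu >= 0xD800 && cu <= 0xDBFF;
    const isLow = cu >= 0xDC00 && cu <= 0xDFFF;
    if (isHigh && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        // Well-formed pair: keep both code units and skip the low half.
        out += text[i] + text[i + 1];
        i++;
        continue;
      }
    }
    out += (isHigh || isLow)
      ? '\\u' + cu.toString(16).padStart(4, '0')
      : text[i];
  }
  return out;
}

const safe = escapeLoneSurrogates(JSON.stringify(JSON.parse("\"\uD800\"")));
console.log(safe);
```

Parsing `safe` again gives back the original one-code-unit string, so
no information is lost by escaping.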
-- 
Björn Höhrmann · mailto:bjoern(_at_)hoehrmann(_dot_)de · 
http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
