* Allen Wirfs-Brock wrote:
On Nov 21, 2013, at 5:28 AM, Henri Sivonen wrote:
On Thu, Nov 21, 2013 at 7:53 AM, Allen Wirfs-Brock
<allen(_at_)wirfs-brock(_dot_)com> wrote:
Just to be clear about this: my tests directly tested the JavaScript built-in
JSON parsers with respect to BOM support in three major browsers. The tests
directly invoked the built-in JSON.parse functions and passed them a
source string that was explicitly constructed to contain a BOM code point.
It would be surprising if JSON.parse() accepted a BOM, since it
doesn't take bytes as input.
ECMAScript's JSON.parse accepts an ECMAScript string value as its input.
ECMAScript strings are sequences of 16-bit values. JSON.parse (like most
other ECMAScript functions) interprets those values as Unicode code
units. The value U+FEFF can appear at any position within a string.
When defining a string as an ECMAScript literal, a sequence like \ufeff
is an escape sequence that means "place the code unit value 0xFEFF into
the string at this position in the sequence". Also note that the
strings passed below to JSON.parse contain the actual code point value
U+FEFF, not the escape sequence that was used to express it. To include
the escape sequence characters themselves in the string, it would have to
be expressed as '\\ufeff'.
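A minimal sketch of the distinction described above, for illustration only. The helper name parseOutcome is hypothetical and not part of the original tests; the rejection of the U+FEFF-prefixed input is what a conforming JSON.parse is expected to do, since U+FEFF is not JSON whitespace.

```javascript
// One code unit: the escape in the literal is resolved when the literal is read.
const bom = '\ufeff';
// Six code units: a backslash followed by the characters "ufeff".
const escaped = '\\ufeff';

// Hypothetical helper: report whether JSON.parse accepts a given string.
function parseOutcome(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (e) {
    return { ok: false, error: e.name };
  }
}

const withBom = parseOutcome('\ufeff[1]');   // string starts with U+FEFF
const withoutBom = parseOutcome('[1]');      // same text without it

console.log(bom.length, bom.charCodeAt(0).toString(16), escaped.length);
console.log(withBom.ok, withoutBom.ok);
```

The point of the comparison is that JSON.parse never sees the escape sequence, only the single code unit it denotes.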
A byte order mark indicates the order of bytes in a sequence of bytes.
An ecmascript string is not a sequence of bytes and therefore it cannot
have a byte order mark inside it. Your test is not for BOM support but
for an egregious semantic error in the implementation of JSON.parse.
http://shadowregistry.org/js/misc/#t2ea25a961255bb1202da9497a1942e09
That is a similar test. It makes Firefox see UTF-8 BOMs in ecmascript
strings. Firefox is not supposed to look for UTF-8 BOMs in ecmascript
strings because ecmascript strings are not sequences of bytes at that
level of reasoning.
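To illustrate the two levels of reasoning, here is a small sketch. It assumes a runtime that provides TextDecoder from the WHATWG Encoding API, which is not part of ECMAScript itself; it is not the linked test, and the variable names are made up for the example.

```javascript
// Byte level: the UTF-8 signature EF BB BF followed by the bytes for "[]".
const bytes = new Uint8Array([0xEF, 0xBB, 0xBF, 0x5B, 0x5D]);

// Decoding bytes into a string is where a BOM can be recognised and dropped.
const decoded = new TextDecoder('utf-8').decode(bytes);

// String level: code units with the same numeric values are ordinary
// characters (U+00EF, U+00BB, U+00BF), not a byte order mark.
const lookalike = '\u00EF\u00BB\u00BF[]';

let lookalikeParses = true;
try {
  JSON.parse(lookalike);
} catch (e) {
  lookalikeParses = false;
}

console.log(JSON.stringify(decoded), decoded.length);
console.log(lookalike.length, lookalikeParses);
```

Once the decoder has run there are no bytes left to have an order, so a string-level function has nothing to sniff.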
Is there any chance, by the way, to change `JSON.stringify` so it does
not output strings that cannot be encoded using UTF-8? Specifically,
JSON.stringify(JSON.parse("\"\uD800\""))
would need to escape the surrogate instead of emitting it literally.
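A rough sketch of the behaviour being asked for, written as a post-processing step rather than as a change to JSON.stringify itself. The function name escapeLoneSurrogates is hypothetical. It relies on the assumption that unpaired surrogates in JSON.stringify output can only occur inside string literals, where a \uXXXX escape is valid JSON; if an engine already escapes them, the function leaves the text unchanged.

```javascript
// Hypothetical helper: replace every unpaired surrogate code unit in `text`
// with a \uXXXX escape, leaving well-formed surrogate pairs untouched.
function escapeLoneSurrogates(text) {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const isHigh = unit >= 0xD800 && unit <= 0xDBFF;
    const isLow = unit >= 0xDC00 && unit <= 0xDFFF;
    if (isHigh && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        // Valid pair: copy both code units and skip the low surrogate.
        out += text[i] + text[i + 1];
        i++;
        continue;
      }
    }
    if (isHigh || isLow) {
      out += '\\u' + unit.toString(16).padStart(4, '0');
    } else {
      out += text[i];
    }
  }
  return out;
}

const lone = JSON.parse('"\\uD800"');            // string holding one lone surrogate
const safe = escapeLoneSurrogates(JSON.stringify(lone));
const pair = escapeLoneSurrogates(JSON.stringify('\uD83D\uDE00'));

console.log(safe);
console.log(JSON.parse(safe) === lone);
```

The escaped form round-trips through JSON.parse to the same string value, and the resulting JSON text contains only ASCII here, so it can be encoded as UTF-8.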
--
Björn Höhrmann · mailto:bjoern(_at_)hoehrmann(_dot_)de ·
http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/