On Thu, Mar 9, 2017 at 12:53 AM, Benjamin Kaduk <kaduk(_at_)mit(_dot_)edu>
wrote:
> If that's what's supposed to happen, it should probably be more
> clear, yes. (But aren't there texts that have valid interpretations
> in multiple encodings?)
Not if the content is well-formed JSON and the only possible encodings are
UTF-8, UTF-16, and UTF-32. It suffices to examine the first four bytes of
the input. If there are no NUL bytes in the first four bytes, it is UTF-8;
if there are two NUL bytes, it is UTF-16; if there are three NUL bytes, it
is UTF-32. This works because the grammar requires the first character to
be in the ASCII repertoire, and the NUL *character* (U+0000) is not allowed
at all.
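
For illustration only, the NUL-counting check described above might be
sketched in Python roughly as follows. The function name
guess_json_encoding is made up for this example, and so is the choice to
return None when fewer than four bytes are available or when the NUL
count is not 0, 2, or 3. The sketch assumes the input has no byte order
mark and that its first two characters are ASCII. It reports only the
encoding family, not the byte order.

```python
from typing import Optional


def guess_json_encoding(data: bytes) -> Optional[str]:
    """Guess the encoding family of a JSON text from its first four bytes.

    Hypothetical helper: counts NUL bytes in the first four bytes, as in
    the message above. Returns None when the heuristic does not apply
    (an assumption of this sketch, not something the message specifies).
    """
    head = data[:4]
    if len(head) < 4:
        # Not enough bytes to apply the four-byte check.
        return None

    nul_count = head.count(b"\x00")
    if nul_count == 0:
        return "utf-8"
    if nul_count == 2:
        return "utf-16"
    if nul_count == 3:
        return "utf-32"
    # One or four NUL bytes: outside the cases the heuristic covers.
    return None


if __name__ == "__main__":
    text = '{"a": 1}'
    # Explicit -le/-be codecs are used so that no byte order mark is added.
    for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
        print(codec, "->", guess_json_encoding(text.encode(codec)))
```

A caller would still need to work out the byte order (for example from
which positions hold the NUL bytes) before actually decoding the text.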
--
John Cowan http://vrici.lojban.org/~cowan
cowan(_at_)ccil(_dot_)org
I don't know half of you half as well as I should like, and I like less
than half of you half as well as you deserve. --Bilbo