ietf

Re: BOMs

2013-11-18 14:56:34
On Mon, Nov 18, 2013 at 8:36 AM, Pete Cordell 
<petejson(_at_)codalogic(_dot_)com> wrote:

----- Original Message ----- From: "Martin J. Dürst" <
duerst(_at_)it(_dot_)aoyama(_dot_)ac(_dot_)jp>

 On 2013/11/18 20:11, Henry S. Thompson wrote:

Pete Cordell writes:

 Given the history below, would it be sensible to accept BOMs for UTF-8
encoding, but not for UTF-16 and UTF-32?  In other words, are BOMs needed
and/or used in the wild for UTF-16 and UTF-32?

Maybe the text can say something like "SHOULD accept BOMs for UTF-8,
and MAY accept BOMs for UTF-16 and / or UTF-32"?


My sense is that you'll see more UTF-16 BOMs than anything else.


Yes indeed. BOM means Byte Order Mark. It's crucial for over-the-wire
UTF-16. (It's irrelevant for in-memory UTF-16, but that's not what we are
discussing.)
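
A minimal sketch of what the mark buys a receiver of over-the-wire UTF-16:
the first two bytes say which byte order follows. The names
utf16_order_from_bom and enum utf16_order are hypothetical, chosen only for
this illustration.

```c
#include <assert.h>
#include <stddef.h>

enum utf16_order { UTF16_UNKNOWN, UTF16_BE, UTF16_LE };

/* Inspects the first two bytes of a buffer for a UTF-16 BOM (U+FEFF).
   FE FF means big-endian, FF FE means little-endian.
   Caveat: FF FE 00 00 is also the UTF-32LE BOM; this sketch does not
   try to tell the two apart. */
static enum utf16_order utf16_order_from_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 2)
        return UTF16_UNKNOWN;
    if (buf[0] == 0xFE && buf[1] == 0xFF)
        return UTF16_BE;
    if (buf[0] == 0xFF && buf[1] == 0xFE)
        return UTF16_LE;
    return UTF16_UNKNOWN; /* no BOM: the order must come from elsewhere */
}
```

Without the mark, the receiver has to learn the byte order some other way,
for example from an agreed convention or by inspecting the content.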


The in-memory case is not entirely irrelevant because a number of JSON
messages will be constructed in memory and then squirted to line.

I did a little experiment with Visual Studio.  It will allow me to save in
UTF-8 with or without a BOM (like thing).  Saving in UTF-16 (or was it
UCS-2?) is always with a BOM.  There didn't seem to be a UTF-32 option.

JSON doesn't need BOMs.  However, there are cases where people might hand
edit messages, and if they choose to save in UTF-16 they will likely have a
BOM.

Is it acceptable to tell people not to save hand-edited files in UTF-16,
suggesting UTF-8 (possibly with an encoded BOM) as an alternative?

I would imagine that if someone did have a hand-edited UTF-8 file on
Windows then the allowance of a BOM would help their sanity immeasurably,
but it's not something I have firsthand knowledge of.



I believe the opposite is true.

The failure of Windows to correctly process documents without BOMs is a
constant pain when trying to use .NET to parse XML.

The ability to compose a JSON message by wrapping another JSON message is
essential. That is, it has to be possible to write something like

printf("{\"Object\": %s}", Text);
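
A minimal sketch of why a leading BOM breaks this kind of composition,
assuming the inner message arrives as a NUL-terminated UTF-8 string. The
helpers has_utf8_bom and wrap_json are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the string starts with the UTF-8 encoded BOM (EF BB BF). */
static int has_utf8_bom(const char *s)
{
    assert(s != NULL);
    return strncmp(s, "\xEF\xBB\xBF", 3) == 0;
}

/* Hypothetical helper: wraps an inner JSON text in an outer object.
   Returns the number of bytes written (excluding the NUL), or -1 if
   the output buffer is too small. */
static int wrap_json(char *out, size_t out_size, const char *inner)
{
    int n = snprintf(out, out_size, "{\"Object\": %s}", inner);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

If the inner text begins with a BOM, wrap_json copies it into the middle of
the outer message. U+FEFF is not JSON whitespace, so a strict parser will
reject the result even though both pieces looked fine on their own.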


I use the .NET platform heavily. Please do not let Microsoft off the hook
here. The cost of doing so is having to write code to kick out spurious BOM
sequences occurring at any random point in the text. Which becomes really
painful when having to deal with strings where there might actually be a
reason to put the BOM in.
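
The kind of cleanup code meant here might look like the sketch below,
assuming valid NUL-terminated UTF-8 input. The name strip_utf8_boms is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Removes every UTF-8 encoded U+FEFF (EF BB BF) from a NUL-terminated
   buffer, in place. Returns the number of sequences removed. */
static size_t strip_utf8_boms(char *s)
{
    size_t removed = 0;
    char *dst = s;
    const char *src = s;

    assert(s != NULL);
    while (*src != '\0') {
        if (strncmp(src, "\xEF\xBB\xBF", 3) == 0) {
            src += 3;          /* skip the three BOM bytes */
            removed++;
        } else {
            *dst++ = *src++;   /* keep everything else */
        }
    }
    *dst = '\0';
    return removed;
}
```

Note that this removes every U+FEFF indiscriminately. It cannot tell a
spurious mark from one that was deliberately placed inside a string value,
which is exactly the painful case.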

The benefit of not doing so is that it might encourage Microsoft to fix
their tools so that they don't insert spurious BOM sequences in documents
where doing so breaks them.


-- 
Website: http://hallambaker.com/