ietf

Re: BOMs

2013-11-18 11:49:10
----- Original Message ----- From: "Martin J. Dürst" <duerst(_at_)it(_dot_)aoyama(_dot_)ac(_dot_)jp>
On 2013/11/18 20:11, Henry S. Thompson wrote:
Pete Cordell writes:

Given the history below, would it be sensible to accept BOMs for UTF-8
encoding, but not for UTF-16 and UTF-32? In other words, are BOMs needed
and/or used in the wild for UTF-16 and UTF-32?

Maybe the text can say something like "SHOULD accept BOMs for UTF-8,
and MAY accept BOMs for UTF-16 and / or UTF-32"?

My sense is that you'll see more UTF-16 BOMs than anything else.

Yes indeed. BOM means Byte Order Mark. It's crucial for over-the-wire UTF-16. (It's irrelevant for in-memory UTF-16, but that's not what we are discussing.)

The in-memory case is not entirely irrelevant, because a number of JSON messages will be constructed in memory and then written straight to the wire.

I did a little experiment with Visual Studio. It lets me save in UTF-8 with or without a BOM (or BOM-like signature). Saving in UTF-16 (or was it UCS-2?) always adds a BOM. There didn't seem to be a UTF-32 option.
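
For anyone who wants to see the bytes involved without Visual Studio, here is a rough sketch in Python. It is only an illustration, not what Visual Studio itself does: it assumes Python's standard "utf-8-sig" and "utf-16" codecs, and the sample text is made up.

```python
import codecs

text = '{"a": 1}'  # made-up sample message

# "utf-8-sig" prepends the UTF-8 encoded BOM (EF BB BF); plain "utf-8" does not.
with_bom = text.encode("utf-8-sig")
without_bom = text.encode("utf-8")

# The generic "utf-16" codec writes a BOM in the machine's native byte order,
# so the first two bytes are either FF FE or FE FF.
utf16 = text.encode("utf-16")

print(with_bom[:3] == codecs.BOM_UTF8)
print(without_bom[:3] == codecs.BOM_UTF8)
print(utf16[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
```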

JSON doesn't need BOMs. However, there are cases where people might hand edit messages, and if they choose to save in UTF-16 they will likely have a BOM.

Is it acceptable to tell people not to save hand-edited files in UTF-16, suggesting UTF-8 (possibly with an encoded BOM) as an alternative?

I would imagine that if someone did have a hand-edited UTF-8 file on Windows then the allowance of a BOM would help their sanity immeasurably, but it's not something I have firsthand knowledge of.

I believe Unix/Linux works with UTF-8 without BOMs.  Is this the case?
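
To make the "accept a BOM on input" idea concrete, here is a minimal sketch of what a tolerant receiver might do. It is not proposed spec text; the helper names decode_json_bytes and load_json_bytes are hypothetical, and it assumes that input without a BOM is UTF-8.

```python
import codecs
import json

# Check the 4-byte BOMs first: the UTF-32-LE BOM (FF FE 00 00) begins with
# the UTF-16-LE BOM (FF FE). A JSON text cannot start with U+0000, so
# FF FE 00 00 is treated as UTF-32-LE here.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_json_bytes(data, default="utf-8"):
    """Strip a leading BOM, if any, and decode using the encoding it names.

    Assumption: input without a BOM is in the `default` encoding.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    return data.decode(default)


def load_json_bytes(data):
    """Parse JSON from raw bytes, tolerating a leading BOM."""
    return json.loads(decode_json_bytes(data))
```

With this, a hand-edited file saved as UTF-8 with a BOM and a BOM-less message built in memory both parse to the same value.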

Pete Cordell
Codalogic Ltd
C++ tools for C++ programmers, http://codalogic.com
Read & write XML in C++, http://www.xml2cpp.com
