perl-unicode

Re: Strange characters when displaying html files saved in UTF-8 (BOM)

2001-12-12 07:26:55
On Wed, Dec 12, 2001 at 12:32:17PM +0000, Markus Kuhn wrote:
> Jarkko Hietaniemi wrote on 2001-12-11 21:44 UTC:
> > My spec is at home but I think it's illegal anywhere after the start of the text.
> > (Blindly concatenating text from several files could of course
> > lead to such a situation.)
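The concatenation case can be shown with a minimal Python sketch. The helper names are hypothetical, and the inputs are assumed to be UTF-8 byte strings that may start with a BOM. Joining two BOM-prefixed files byte for byte leaves a second U+FEFF in the middle of the text. Dropping each input's leading BOM before joining avoids that.

```python
import codecs

def naive_concat(chunks):
    """Join raw UTF-8 byte strings as-is (like `cat a.html b.html`)."""
    return b"".join(chunks)

def bom_aware_concat(chunks):
    """Hypothetical helper: drop a leading UTF-8 BOM from every chunk,
    then prepend a single BOM only if the first chunk had one."""
    had_bom = bool(chunks) and chunks[0].startswith(codecs.BOM_UTF8)
    stripped = [c[len(codecs.BOM_UTF8):] if c.startswith(codecs.BOM_UTF8) else c
                for c in chunks]
    return (codecs.BOM_UTF8 if had_bom else b"") + b"".join(stripped)

a = codecs.BOM_UTF8 + "first file\n".encode("utf-8")
b = codecs.BOM_UTF8 + "second file\n".encode("utf-8")

# The plain "utf-8" codec does not strip a BOM, so every U+FEFF survives decoding.
print(naive_concat([a, b]).decode("utf-8").count("\ufeff"))      # 2: start and mid-text
print(bom_aware_concat([a, b]).decode("utf-8").count("\ufeff"))  # 1: leading BOM only
```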

> The BOM is illegal nowhere. The BOM is a perfectly normal Unicode

Rats.  That's what I get for making things up without looking things up.

> character, namely the ZERO WIDTH NO-BREAK SPACE. Browsers must display
> it exactly as such (that is: not display a strange character), wherever
> it appears. When you test this in your browser, it is also a good

To confuse the issue, I think in Unicode 3.2 the BOM will become just a
BOM, and a new character (U+2060 WORD JOINER) will take over the ZWNBSP
role.  As if applications had already been aware of the old rules...
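The character that took over the ZWNBSP role in Unicode 3.2 is U+2060 WORD JOINER. The Python sketch below illustrates one possible cleanup policy. It is an assumption, not a rule quoted from the standard, and `normalize_zwnbsp` is a hypothetical name. The function keeps a single leading U+FEFF as a BOM and rewrites every other U+FEFF to U+2060, which preserves the "don't break here" meaning.

```python
import unicodedata

BOM = "\ufeff"          # U+FEFF: BYTE ORDER MARK / ZERO WIDTH NO-BREAK SPACE
WORD_JOINER = "\u2060"  # U+2060 WORD JOINER, added in Unicode 3.2

def normalize_zwnbsp(text):
    """Hypothetical policy: treat a leading U+FEFF as a BOM and keep it;
    map any other U+FEFF to U+2060 WORD JOINER."""
    if text.startswith(BOM):
        return BOM + text[1:].replace(BOM, WORD_JOINER)
    return text.replace(BOM, WORD_JOINER)

print(unicodedata.name(BOM))                         # ZERO WIDTH NO-BREAK SPACE
print(unicodedata.name(WORD_JOINER))                 # WORD JOINER
print(ascii(normalize_zwnbsp("\ufeffab\ufeffcd")))   # '\ufeffab\u2060cd'
```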

> opportunity to test that the Plane 14 tagging characters are not
> displayed either.
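You can check why a browser should show nothing for these code points against the Unicode Character Database. U+FEFF has General Category Cf (format), and so do the tag characters (U+E0001 and U+E0020..U+E007F). The sketch below uses Python's standard `unicodedata` module. The `format_chars` helper is hypothetical and is meant for hunting invisible characters while debugging. It does not model how a browser actually lays out text.

```python
import unicodedata

def format_chars(text):
    """List (index, code point, name) for every General Category Cf character."""
    return [(i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for i, ch in enumerate(text)
            if unicodedata.category(ch) == "Cf"]

# A BOM, some visible text, then a span tagged "en" with tag characters:
# LANGUAGE TAG, TAG LATIN SMALL LETTER E, TAG LATIN SMALL LETTER N, CANCEL TAG.
sample = "\ufeffHi \U000E0001\U000E0065\U000E006E\U000E007F!"
for entry in format_chars(sample):
    print(entry)
```

Every entry printed is a character that a renderer should not draw as a visible glyph.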

> Some recommendations for treating the BOM under Unix and in encoding
> converters are in
>
>   http://www.cl.cam.ac.uk/~mgk25/unicode.html
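Markus's page is the authority on those recommendations, and the sketch below is not taken from it. It is a minimal Python converter, assuming UTF-8 input that may begin with a BOM. It drops that one leading BOM and emits plain UTF-8 for Unix tools. Python's built-in `utf-8-sig` codec strips a BOM only at the very start on decode, so any later U+FEFF is kept as an ordinary character. The name `to_plain_utf8` is hypothetical. The last lines show where the "strange characters" from the subject line typically come from: the BOM bytes EF BB BF read as Windows-1252 become "ï»¿".

```python
import codecs

def to_plain_utf8(data: bytes) -> bytes:
    """Hypothetical converter: decode UTF-8, dropping one leading BOM if
    present ('utf-8-sig' does this), and re-encode as UTF-8 without a BOM."""
    return data.decode("utf-8-sig").encode("utf-8")

with_bom = codecs.BOM_UTF8 + "caf\u00e9\n".encode("utf-8")
print(to_plain_utf8(with_bom))     # b'caf\xc3\xa9\n'
print(to_plain_utf8(b"no bom\n"))  # b'no bom\n'

# Misreading the same UTF-8 bytes as Windows-1252 yields the familiar mojibake.
mojibake = with_bom.decode("cp1252")  # 'ï»¿cafÃ©\n'
```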

> Markus
>
> -- 
> Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
> Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen