perl-unicode

Re: Strange characters when displaying html files saved in UTF-8 (BOM)

2001-12-13 07:53:45
Jarkko Hietaniemi wrote on 2001-12-12 14:26 UTC:
> On Wed, Dec 12, 2001 at 12:32:17PM +0000, Markus Kuhn wrote:
> > Jarkko Hietaniemi wrote on 2001-12-11 21:44 UTC:
> > > My spec is at home but I think it's illegal in subsequent text.
> > > (Blindly concatenating text for several files could of course
> > > lead into such a situation.)
> >
> > The BOM is illegal nowhere. The BOM is a perfectly normal Unicode
>
> Rats.  That's what I get from making things up without looking things up.
>
> > character, namely the ZERO WIDTH NO-BREAK SPACE. Browsers must display
> > it exactly as such (that is: not display a strange character), wherever
> > it appears. When you test this in your browser, it is also a good
>
> To confuse the issue I think in Unicode 3.2 the BOM will become just
> BOM, and a new character will take the ZWNBS role.  As if applications
> had already been aware of the old rules...

http://www.unicode.org/versions/beta.html

  Word Joiner (U+2060)

  A new character has been added to take the place of the non-BOM usage of
  U+FEFF. The usage of U+FEFF as ZWNBSP will be deprecated; only the usage
  as a BOM will remain.

So I guess that will now be what we could use in German to make sure
that compound words like "Auflage" will not be typeset with an fl
ligature. For German, it's probably better not to use a font with an fl
ligature in the first place. Another case for language tagging.
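
For anyone who wants to experiment with the U+FEFF handling discussed
above, here is a minimal sketch in Python. It assumes the text has already
been decoded to Unicode strings. The function names normalize_feff and
concat_decoded are hypothetical, and mapping every non-initial U+FEFF to
U+2060 is only one possible policy in light of the planned deprecation,
not something the standard requires.

```python
BOM = "\ufeff"          # U+FEFF: byte order mark, formerly also ZWNBSP
WORD_JOINER = "\u2060"  # U+2060: WORD JOINER, the proposed replacement

def normalize_feff(text: str) -> str:
    """Drop one leading U+FEFF; map any later U+FEFF to U+2060.

    Assumption: a U+FEFF at the very start is a signature, and any
    other occurrence was meant as a zero width no-break space.
    """
    if text.startswith(BOM):
        text = text[len(BOM):]
    return text.replace(BOM, WORD_JOINER)

def concat_decoded(chunks):
    """Join already-decoded files without leaving stray BOMs mid-text."""
    return "".join(normalize_feff(chunk) for chunk in chunks)
```

Normalizing each file before joining avoids the "blindly concatenating"
case mentioned earlier in the thread, where the signature of the second
file ends up in the middle of the combined text.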

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>