Hi there.
So, what you are saying is that &nbsp; is to XML and HTML as "#define
nbsp" is to C??
-----Original Message-----
From: owner-xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
[mailto:owner-xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com] On Behalf Of
Mike Brown
Sent: Friday, November 08, 2002 7:13 AM
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] nbsp is not that hard, folks
Brian Grainger wrote:
If you're trying to escape in a document encoded as UTF-8, you
have to use Unicode escaping of the UTF-8 representation of the
entity. In this case, &nbsp; is equal to &#160;, and &#160; encoded as
UTF-8 is \u00A0.
Good grief. No, you have your terminology badly mixed up, and you're
throwing in an irrelevant notation. "&nbsp;", "&#160;" and "\u00A0" have
nothing, NOTHING to do with UTF-8. There is something about nbsp that
just confuses the heck out of people. I think it must be the fact that
it looks like a space, and that you don't have an nbsp key on your
keyboard.
OK, read this.
1. There is a character -- an abstract unit in a "script" (a writing
system; we are using Latin right now) -- called NO-BREAK SPACE by the
Unicode Standard and ISO/IEC 10646. Unicode and ISO/IEC 10646 assign
this character an integer number, 160, which is A0 in hex. We say
Unicode all the time around here, but we mean ISO/IEC 10646 because
that's what the XML and HTML specs reference. The two standards share
the same character repertoire and numbering so there's no harm.
2. UTF-8 is an encoding scheme that provides a way of representing any
of the approximately 1.1 million possible abstract characters in Unicode
as a sequence of 1 to 4 bytes. The UTF-8 representation of the Unicode
character 160 (no-break space), is the pair of bytes C2 A0, in that
order. In contrast, iso-8859-1 is a character map that provides a way of
representing the first 256 Unicode characters as a single byte. us-ascii
is an even more limited set of just the first 128, mapped to a single
byte.
3. This thing: \u00A0
- is a sequence of 6 bytes (ASCII bytes for backslash, u, zero, zero,
A, zero);
- has special meaning in a programming language like Java or Python,
where it is essentially a macro for the no-break space character;
- is used when representing the character directly as encoded bytes is
impractical or impossible.
4. This thing: &#160;
or this thing: &#xA0;
- is to SGML applications like HTML and XML what \u00A0 is to Java &
Python;
- is called a character reference (or "numeric character reference").
5. This thing: &nbsp;
- is to SGML applications like HTML and XML an "entity reference";
- refers to an entity (a separate collection of information) named
nbsp;
- depending on the circumstances, is intended to be treated by the
XML parser or HTML user agent as equivalent to the entity's
"replacement text";
- is, in HTML, predefined to have the replacement text of just one
character, the no-break space;
- is not defined by default in XML.
6. The thing here in between the quotes: " "
- is byte 0xA0;
- is intended to be a no-break space because this email is iso-8859-1
encoded;
- has exactly the same meaning in an XML document as &#160;.
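
The six points above can be checked with a short Python sketch. It is
only an illustration, not part of the original explanation: it assumes
Python 3 and its standard library (unicodedata, html, and
xml.etree.ElementTree), and the element name "a" and the variable names
are made up for the example.

```python
import unicodedata
import xml.etree.ElementTree as ET
from html import unescape

# Point 3: in Python source, \u00A0 is an escape for the character itself.
NBSP = "\u00A0"
assert len(r"\u00A0") == 6   # the notation: six ASCII characters
assert len(NBSP) == 1        # the result: one character

# Point 1: the abstract character and its number (160, hex A0).
assert ord(NBSP) == 160 == 0xA0
assert unicodedata.name(NBSP) == "NO-BREAK SPACE"

# Point 2: the same character under different encodings.
assert NBSP.encode("utf-8") == b"\xc2\xa0"      # two bytes, C2 A0
assert NBSP.encode("iso-8859-1") == b"\xa0"     # one byte, A0
try:
    NBSP.encode("us-ascii")                     # 160 is outside 0..127
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
assert not ascii_ok

# Point 4: decimal and hex character references mean the same character.
assert ET.fromstring("<a>&#160;</a>").text == NBSP
assert ET.fromstring("<a>&#xA0;</a>").text == NBSP

# Point 5: the nbsp entity is predefined in HTML ...
assert unescape("&nbsp;") == NBSP
# ... but not in XML, so a bare reference is an error ...
try:
    ET.fromstring("<a>&nbsp;</a>")
    undefined_ok = True
except ET.ParseError:
    undefined_ok = False
assert not undefined_ok
# ... unless the document declares it (here in an internal DTD subset).
declared = '<!DOCTYPE a [<!ENTITY nbsp "&#160;">]><a>&nbsp;</a>'
assert ET.fromstring(declared).text == NBSP

# Point 6: a literal byte 0xA0 in an iso-8859-1 encoded document
# is the same character as &#160;.
latin1_doc = b'<?xml version="1.0" encoding="iso-8859-1"?><a>\xa0</a>'
assert ET.fromstring(latin1_doc).text == NBSP
```

None of the asserts mention UTF-8 except the one that actually encodes
the character to bytes, which is the point: the references and the
escape name a character number, not a byte sequence.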
- Mike
____________________________________________________________________________
mike j. brown                   |  xml/xslt: http://skew.org/xml/
denver/boulder, colorado, usa   |  resume: http://skew.org/~mike/resume/
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list