On Wed, Nov 30, 2005 at 02:55:55PM +0000, Ian G wrote:
> I think what you mean is that the whitespace
> should not be included in the calculation,
> but it doesn't matter whether it is stripped
> from the document itself.
Yes, that is precisely what I mean.
> It is an issue, yes. We discussed this a while
> back and came to the conclusion that *only* ASCII
> whitespace was to be stripped/ignored, as the
> alternative was too hard to define. That's why
> the specific characters to be stripped are in
> the spec - to stop people looking for Cyrillic
> spaces or different sized spaces.
Unfortunately, unlike canonical ASCII, there is no one-to-one correspondence
between Unicode characters and the glyphs they render as. It would be nice to
have some canonical form for Unicode text which is human-readable, yet has a
unique binary representation, but it's not easy and not the job of this WG, IMHO.
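
To make the distinction concrete, here is a rough Python sketch of "ignore
ASCII whitespace in the calculation, leave the document alone". The set of
characters (space, tab, CR, LF), the choice of SHA-256 and the function name
are all my assumptions for illustration; the real list is whatever the spec
enumerates. It also shows the Unicode problem: two strings that look the same
on screen produce different digests.

```python
import hashlib
import unicodedata

# ASSUMPTION: illustrative set only; the spec's own list is authoritative.
ASCII_WHITESPACE = b" \t\r\n"


def digest_ignoring_ascii_whitespace(data: bytes) -> str:
    """Hash the bytes of `data` with ASCII whitespace left out.

    The input itself is not modified; only the hashed byte stream
    differs. SHA-256 is a placeholder for whatever digest is in use.
    """
    stripped = data.translate(None, ASCII_WHITESPACE)
    return hashlib.sha256(stripped).hexdigest()


# ASCII whitespace differences do not change the digest.
same_a = digest_ignoring_ascii_whitespace(b"a b\n")
same_b = digest_ignoring_ascii_whitespace(b"ab")

# A no-break space (U+00A0) is not ASCII whitespace, so it is hashed.
nbsp = digest_ignoring_ascii_whitespace("a\u00a0b".encode("utf-8"))

# Same visible text, different code points: precomposed e-acute versus
# "e" followed by a combining acute accent.
composed = "caf\u00e9"
decomposed = "cafe\u0301"
d_composed = digest_ignoring_ascii_whitespace(composed.encode("utf-8"))
d_decomposed = digest_ignoring_ascii_whitespace(decomposed.encode("utf-8"))

# NFC normalization folds this particular pair together, but it does not
# address lookalikes such as Latin "A" versus Cyrillic "A".
nfc_equal = unicodedata.normalize("NFC", decomposed) == composed
```

Normalizing before hashing would handle the composed/decomposed case, but it
is not a full answer to the glyph problem, which is why keeping the rule to a
fixed list of ASCII characters is the easier thing to specify.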