Re: RFC 2047 and gatewaying


D J Bernstein <djb(_at_)cr(_dot_)yp(_dot_)to> writes:

Russ Allbery writes:

However, UTF-8 penalizes non-ASCII characters spacewise, and is
somewhat more complex to parse and reason about than a pure multibyte
character set.

Have you ever written a program to handle Unicode characters correctly?
Do you realize that UTF-16 is not a ``pure multibyte'' encoding outside
the Basic Multilingual Plane?


I'm sorry, I should have been clearer.  The intended comparison was not to
UTF-16, which combines the worst of both worlds due to surrogate pairs,
but to UTF-32, which is (so far as I know) a pure multibyte encoding.
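
To make the size and width comparison concrete, here is a minimal sketch in Python. The helper name `encoded_sizes` and the sample characters are chosen for illustration only and are not part of the original message. It counts the bytes each encoding needs for a single code point, using the `-be` codec variants so that no byte order mark is included in the count.

```python
def encoded_sizes(ch):
    """Return the number of bytes one code point needs in each encoding.

    The "-be" codecs are used so no byte order mark is counted.
    """
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


# Illustrative samples: ASCII, Latin-1 range, CJK, and one character
# outside the Basic Multilingual Plane (U+1D11E MUSICAL SYMBOL G CLEF).
samples = ["A", "\u00e9", "\u65e5", "\U0001d11e"]

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

In this sketch UTF-32 is always 4 bytes per code point, UTF-8 ranges from 1 to 4 bytes, and UTF-16 uses 2 bytes inside the Basic Multilingual Plane but 4 bytes (a surrogate pair) for U+1D11E.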

Do you realize that Unicode has zero-width accents, so any ``byte count
equals width'' rule can't possibly work?


Yes.  I know about combining marks and other similar characters, and I'm
not saying that even UTF-32 is simple, just that UTF-32 is somewhat
simpler to parse and reason about than UTF-8.
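
The point about combining marks can be sketched the same way. The following is a rough illustration only: the function name `count_base_characters` is hypothetical, and counting code points whose canonical combining class is zero is a crude approximation, not full grapheme cluster segmentation and not a display width calculation.

```python
import unicodedata


def count_base_characters(text):
    """Count code points that are not combining marks.

    Crude approximation: a code point counts as a base character when
    its canonical combining class is 0.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points that
# normally display as a single accented letter.
decomposed = "e\u0301"

print(len(decomposed))                      # number of code points
print(len(decomposed.encode("utf-32-be")))  # bytes in UTF-32, no BOM
print(count_base_characters(decomposed))    # non-combining code points
```

Even in UTF-32 the decomposed sequence occupies two fixed-width units while showing one visible character, so a fixed-width encoding simplifies indexing by code point but does not make byte counts equal display width.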

-- 
Russ Allbery (rra(_at_)stanford(_dot_)edu)             
<http://www.eyrie.org/~eagle/>
