
ABNF Re: Troubles with UTF-8

2006-01-05 14:28:27
You say that a Unicode code point can be represented by %xABCD but that is not
spelt out in ABNF [RFC4234].  And when it refers to 'one printable character' as
'%x20-7E' I get the impression that coded character sets like Unicode, with
more than 256 code points, do not fall within its remit.  I have yet to see any
use of this in an I-D or RFC. I did post a question about this to this list on
24th December and the lack of response reinforces my view that this is uncharted
territory.

Tom Petch
----- Original Message -----
From: "James Seng" <james(_at_)seng(_dot_)sg>
To: "Tom.Petch" <sisyphus(_at_)dial(_dot_)pipex(_dot_)com>
Cc: "ietf" <ietf(_at_)ietf(_dot_)org>
Sent: Wednesday, January 04, 2006 6:50 AM
Subject: Re: Troubles with UTF-8

On 12/23/05, Tom.Petch <sisyphus(_at_)dial(_dot_)pipex(_dot_)com> wrote:

A) Character set.  UTF-8 implicitly specifies the use of Unicode/IS10646, which
contains 97,000 - and rising - characters.  Some (proposed) standards limit
themselves to 0000..007F, which is not at all international, others to
0000..00FF, essentially Latin-1, which suits many Western languages but is not
truly international.  Is 97,000 really appropriate or should there be a
subset?
Why should there be a subset? You really, really don't want to go into a
debate over which script is more important than the other.

B) Code point. Many standards are defined in ABNF [RFC4234] which allows code
points to be specified as, eg,  %b00001101 %d13 or %x0D none of which are
terribly Unicode-like (U+000D).  The result is standards that use one notation
in the ABNF and a different one in the body of the document; should ABNF adopt
something closer to Unicode (as XML has done with &#x000D;)?
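
To make the notational gap concrete, here is a rough sketch that prints one
value in each of the three RFC 4234 numeric forms next to the U+ form.  The
function name abnf_forms and the zero-padding widths are my own choices for
illustration, not anything taken from the RFC.

```python
def abnf_forms(code_point: int) -> dict:
    """Render one integer value in the RFC 4234 numeric notations and in U+ form.

    Illustrative helper only; the padding widths are an assumption.
    """
    return {
        "binary": f"%b{code_point:08b}",    # base 2, padded to 8 digits
        "decimal": f"%d{code_point}",       # base 10
        "hex": f"%x{code_point:02X}",       # base 16, at least 2 digits
        "unicode": f"U+{code_point:04X}",   # Unicode style, at least 4 hex digits
    }


forms = abnf_forms(0x0D)  # carriage return, value 13
for name, text in forms.items():
    print(f"{name:8} {text}")
```

All four strings name the same number; only the spelling differs between the
grammar and the running text.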

Following RFC4234, Unicode code point U+ABCD will just be represented as
%xABCD.

I do not see the problem you mention or am I missing something?
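
One place where the two readings may diverge is whether a terminal value means
the code point itself or the octets that go on the wire.  The sketch below is
only an illustration under that assumption: utf8_as_abnf is a hypothetical
helper that writes the UTF-8 octets of a code point as an ABNF dotted
concatenation, so the two spellings can be compared side by side.

```python
def utf8_as_abnf(code_point: int) -> str:
    """Write the UTF-8 octets of a code point as an ABNF %x concatenation.

    Hypothetical helper for illustration; surrogate code points
    (U+D800..U+DFFF) cannot be encoded and will raise UnicodeEncodeError.
    """
    octets = chr(code_point).encode("utf-8")
    return "%x" + ".".join(f"{octet:02X}" for octet in octets)


as_value = f"%x{0xABCD:X}"           # the code point as a single value
as_octets = utf8_as_abnf(0xABCD)     # the same code point as UTF-8 octets
print(as_value, "versus", as_octets)
```

For anything below U+0080 the two forms coincide, which may be why the
question rarely comes up for ASCII-only grammars.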


-James Seng

