ietf-822

UTF misunderstanding

1991-10-30 11:46:30
Bob Smart (Tue, 29 Oct 91 18:16:51 +1100):

  Now a problem with ISO-2022 (which 10646/ATM/AUC seems determined
  to share) is that the default meaning of octets before any escape
  sequence is undefined. ...

UTF/10646 does not use "shifting" between character sets.  There is
one unique 1-5 octet sequence for each codepoint.

In particular, even though 8859/1 maps to the first 256 code points,
the representation of codes A0-FF is *not* a single-octet A0 through
FF, but the 2-octet A0 A0 through A0 FF.
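
As an illustration of the rule just stated, here is a minimal sketch
covering only the first 256 code points.  The function name is
hypothetical, and the single-octet handling of codes below A0 is an
assumption: the message only says that A0-FF take the two-octet form.

```python
def encode_first_256(codepoint: int) -> bytes:
    """Encode a code point in 00-FF following the rule described above.

    Hypothetical helper for illustration; it does not cover the longer
    (up to 5-octet) sequences used for higher code points.
    """
    if not 0x00 <= codepoint <= 0xFF:
        raise ValueError("this sketch only handles code points 00-FF")
    if codepoint >= 0xA0:
        # A0-FF: two octets, an A0 lead octet followed by the code itself.
        return bytes([0xA0, codepoint])
    # Assumption: codes below A0 are sent as a single octet.
    return bytes([codepoint])


print(encode_first_256(0xAE).hex(" "))  # registered trademark sign
print(encode_first_256(0x41).hex(" "))  # capital letter A
```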

(This also nicely solves the problem of a line containing only a
registered trademark sign, AE, being mistaken for a lone period, 2E,
once a 7-bit mailer strips the high bit.  AE is sent in UTF as A0 AE,
which strips to 20 2E, a space followed by a period.)
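
The effect can be checked with a small sketch.  It assumes the 7-bit
mailer simply clears the top bit of every octet; the function name is
hypothetical.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Model a 7-bit transport that clears bit 8 of every octet (assumed)."""
    return bytes(octet & 0x7F for octet in data)


raw_line = bytes([0xAE])        # single-octet AE on a line by itself
utf_line = bytes([0xA0, 0xAE])  # the same character as A0 AE

# The bare AE collapses to a lone period; the A0 AE form does not.
print(strip_high_bit(raw_line))
print(strip_high_bit(utf_line))
```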

-drb


  • UTF misunderstanding, David Robinson <=