--On 23 December 2005 11:36 +0100 "Tom.Petch" <sisyphus(_at_)dial(_dot_)pipex(_dot_)com>
wrote:
A) Character set. UTF-8 implicitly specifies the use of Unicode/ISO 10646,
which contains 97,000 (and rising) characters. Some (proposed)
standards limit themselves to 0000..007F, which is not at all
international; others to 0000..00FF, essentially Latin-1, which suits many
Western languages but is not truly international. Is 97,000 really
appropriate, or should there be a defined subset?
I think Ned has answered most of your other points, so I'll chime in on this
one.
My opinion: ALL attempts at defining a "useful" character set of any size
between 128 and "all you can eat" for use internationally have been dismal
failures. Each has been used in some niche; sooner or later there's a need
to work outside that box, and gateways or other forms of self-torture
result. (Alvestrand's equality: gateways = pain).
At the moment, the only reasonable candidate for an "all you can eat"
character set is the Unicode charset. All other alternatives, including the
bizarrely byzantine character set switching schemes of ISO 2022, are
basically dead in the marketplace.
So there are only two real choices for charset left: ASCII and Unicode.
ASCII is unsuitable for any language except the technologists' simplified
version of English. So if you want text, and want it to work
internationally, there's only one choice left.
Subsets are a mistake.
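
As a rough illustration of the point about restricted repertoires, here is a minimal Python sketch that checks which of three encodings can represent a few sample strings. The sample words and the helper names (`encodable`, `coverage`) are assumptions made for this example and do not come from the original message; "ascii" stands in for 0000..007F and "latin-1" for 0000..00FF.

```python
# Illustrative sketch only: sample strings and helper names are assumptions.
SAMPLES = {
    "English": "plain text",
    "Norwegian": "blåbærsyltetøy",
    "Greek": "κείμενο",
    "Japanese": "テキスト",
}

CODECS = ("ascii", "latin-1", "utf-8")


def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def coverage(samples: dict) -> dict:
    """Map each sample name to a {codec: can_encode} table."""
    return {
        name: {codec: encodable(text, codec) for codec in CODECS}
        for name, text in samples.items()
    }


if __name__ == "__main__":
    for name, row in coverage(SAMPLES).items():
        print(name, row)
```

Under these assumptions, only the English sample fits in ASCII, the Norwegian one also fits in Latin-1, and the Greek and Japanese samples need the full Unicode repertoire.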
Harald
_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf