Unicode

Rick McGowan writes:

> Unicode, one of the two major competing universal codesets


Correction: Unicode is one of THREE major multilingual character
encodings. Don't forget ISO 2022. In fact, ISO 2022 is the ONLY one of
the three that has been a standard for a number of years.

> 5. let the receiver worry about how to map from Unicode to whatever
> the local jargon is.


Yeah, right. OK, here comes some Unicode, I'm gonna let the readers of
this list worry about interpreting it:

\00     \00E\00n\00g\00l\00i\00s\00h\00:\00     \00w\00h\00a\00t\00e\00v\00e\00r
\00     \00F\00r\00e\00n\00c\00h\00:\00 \00     \00F\00r\00a\00n\00\E7\00a\00i\00s
\00     \00J\00a\00p\00a\00n\00e\00s\00e\00:\00 @\00B\03
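
For anyone who wants to reproduce the byte pattern in the table above,
here is a minimal Python sketch. It assumes the 16-bit form is
big-endian with no byte-order mark (Python's "utf-16-be" codec) and
that every byte which is not printable ASCII is written as a backslash
plus two hex digits, as in the table. The name show_bytes is made up
for this illustration.

```python
def show_bytes(data: bytes) -> str:
    """Render bytes as text: printable ASCII as-is, everything else as \\XX."""
    out = []
    for b in data:
        # Keep printable ASCII, but escape the backslash itself so the
        # output stays unambiguous.
        if 0x20 <= b <= 0x7E and b != 0x5C:
            out.append(chr(b))
        else:
            out.append("\\%02X" % b)
    return "".join(out)


# Assumption: 16-bit Unicode, high byte first, no byte-order mark.
english = show_bytes("whatever".encode("utf-16-be"))
french = show_bytes("Fran\u00e7ais".encode("utf-16-be"))

print(english)
print(french)
```

Every ASCII letter picks up a leading \00 byte, which is why the table
is so hard to read on a terminal that expects 8-bit text.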


Here's the same table in DIS 10646, compaction method 5:

        English:        whatever
        French:         Fran\E7ais
        Japanese:       \81\81 @\B0\EC\B1\DF


And here it is in 2022:

        English:        whatever
        French:         Fran\E7ais
        Japanese:       一円
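
As a rough illustration of how a 2022-style encoding keeps the Japanese
row 7-bit, the sketch below uses Python's "iso2022_jp" codec. That
codec is only one profile built on ISO 2022 and is an assumption here;
the message does not say which character-set designations were
actually used.

```python
text = "\u4e00\u5186"  # the two characters in the Japanese row

# Assumption: Python's "iso2022_jp" codec stands in for "2022" here.
data = text.encode("iso2022_jp")

# The codec switches character sets with escape sequences (ESC = 0x1B),
# so no byte needs the high bit.
is_seven_bit = all(b < 0x80 for b in data)
round_trip = data.decode("iso2022_jp")

print(data)
print(is_seven_bit, round_trip == text)
```

The escape sequences at the start and end of the output are the
switching mechanism: one selects the Japanese character set, the other
returns to ASCII.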


(You will notice that I used the Quoted-Printable encoding to make it
all 7-bit.)


To be fair, I should add that we could use some sort of scheme to
switch between Latin-1 and Unicode to make the Unicode more readable.
As far as I know, Unicode itself does not provide for such switching.
Here is the Unicode table again, using \sw as the switch:

        English:        whatever
        French:         Fran\E7ais
        Japanese:       \sw@\00B\03
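
A minimal sketch of such a switching scheme, in Python, might look like
the following. Everything in it is hypothetical: the \sw marker is the
made-up switch from this message, the text starts in Latin-1, and each
\sw toggles between Latin-1 and 16-bit big-endian Unicode. The code
points come from Python's current Unicode tables, so the bytes for the
Japanese text may differ from those shown above.

```python
SWITCH = "\\sw"  # hypothetical marker, written as in this message


def show_bytes(data: bytes) -> str:
    """Printable ASCII as-is (except backslash), everything else as \\XX."""
    return "".join(
        chr(b) if 0x20 <= b <= 0x7E and b != 0x5C else "\\%02X" % b
        for b in data
    )


def show_switched(text: str) -> str:
    """Render text as Latin-1, toggling to 16-bit Unicode only when needed."""
    out = []
    wide = False  # assumption: the stream starts in Latin-1
    for ch in text:
        needs_wide = ord(ch) > 0xFF  # does not fit in one Latin-1 byte
        if needs_wide != wide:
            out.append(SWITCH)  # toggle the mode
            wide = needs_wide
        codec = "utf-16-be" if wide else "latin-1"
        out.append(show_bytes(ch.encode(codec)))
    return "".join(out)


french = show_switched("Fran\u00e7ais")
mixed = show_switched("a\u4e00b")

print(french)
print(mixed)
```

Switching back to Latin-1 as soon as a character fits in one byte is a
design choice of this sketch; a real scheme would also have to say how
the switch itself is encoded in each mode.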


Note that this message doesn't really show which one is the "best".
Let's not start another CodeWar here!


Erik