Two characters in 8IN7 encoding are special. The
backslash (\) character quotes the next character,
which may be either another backslash (\) or a tilde
(~); no other character is valid. The \\ sequence
results in a single backslash, and the \~ sequence
results in a single tilde in the destination stream.
The tilde (~) character applies the high order (0x80)
bit to the next character.
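
As a rough sketch only, the rule quoted above could be decoded along these lines. The name decode_quoted is made up for illustration. The sketch assumes the tilde applies to the next raw byte (the quoted text does not say whether that byte may itself be quoted) and that an invalid sequence is an error.

```python
BACKSLASH = 0x5C
TILDE = 0x7E


def decode_quoted(data: bytes) -> bytes:
    # Hypothetical decoder for the quoted rule, not a reference implementation.
    out = bytearray()
    i = 0
    while i < len(data):
        c = data[i]
        if c == BACKSLASH:
            # Backslash may only quote another backslash or a tilde.
            if i + 1 >= len(data) or data[i + 1] not in (BACKSLASH, TILDE):
                raise ValueError("backslash must quote a backslash or a tilde")
            out.append(data[i + 1])
            i += 2
        elif c == TILDE:
            # Assumption: tilde sets the 0x80 bit on the next raw byte.
            if i + 1 >= len(data):
                raise ValueError("tilde at end of input")
            out.append(data[i + 1] | 0x80)
            i += 2
        else:
            out.append(c)
            i += 1
    return bytes(out)
```

For example, the seven-bit input a \\ b \~ c ~A (without the spaces) would come out as a, backslash, b, tilde, c, 0xC1.
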
better might be:
\\ => \
\~X => X | 0x80
\?X => (X & 0x1F) | 0x80 so 0x80..0x9F transit as 0x20..0x3F
\X => X to be tolerant
in the other direction:
\ => \\
0x80..0x9F => \?X where X is 0x20..0x3F
0xA0..0xFF => \~X where X is 0x20..0x7F
of course, ? is arbitrarily chosen, perhaps something
else is more suitable.
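
A minimal sketch of both directions of this proposal, to make the bit twiddling concrete. The function names are hypothetical. It reads the proposal as: 0x80..0x9F is carried as \? followed by (byte & 0x1F) | 0x20, and 0xA0..0xFF as \~ followed by byte & 0x7F; that reading is an assumption. A truncated escape is treated as an error.

```python
BACKSLASH = 0x5C
TILDE = 0x7E
QMARK = 0x3F  # arbitrary marker, as noted above


def encode_proposed(data: bytes) -> bytes:
    # Hypothetical encoder: the output contains only bytes below 0x80.
    out = bytearray()
    for b in data:
        if b == BACKSLASH:
            out += bytes((BACKSLASH, BACKSLASH))
        elif 0x80 <= b <= 0x9F:
            # Carry 0x80..0x9F as 0x20..0x3F so no control codes are sent.
            out += bytes((BACKSLASH, QMARK, (b & 0x1F) | 0x20))
        elif b >= 0xA0:
            # Carry 0xA0..0xFF as 0x20..0x7F.
            out += bytes((BACKSLASH, TILDE, b & 0x7F))
        else:
            out.append(b)
    return bytes(out)


def decode_proposed(data: bytes) -> bytes:
    # Hypothetical decoder for the scheme above.
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b != BACKSLASH:
            out.append(b)
            i += 1
            continue
        if i + 1 >= n:
            raise ValueError("backslash at end of input")
        marker = data[i + 1]
        if marker in (TILDE, QMARK):
            if i + 2 >= n:
                raise ValueError("truncated escape")
            x = data[i + 2]
            if marker == TILDE:
                out.append(x | 0x80)           # back to 0xA0..0xFF
            else:
                out.append((x & 0x1F) | 0x80)  # back to 0x80..0x9F
            i += 3
        else:
            # Covers \\ => \ and the tolerant \X => X case.
            out.append(marker)
            i += 2
    return bytes(out)
```

Under that reading, the three bytes 0x85, 0xE9 and a backslash encode to \?% \~i \\ (without the spaces), and decoding gives the original bytes back.
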
Assuming chars at 0x80 and above are in the distinct minority,
this looks great for peer-platform interchange, but what about
cross-platform? Even for Mac-to-Mac, if the character
sets are not equivalent (e.g., Hebrew, Arabic), this will
display some degree of nonsense on the receiving UI.
It seems to want some additional info about the character set,
or a prior understanding between the two ends. I hate
to drag in the international question again, and certainly don't
want to rehash it in the same depth as last summer's discussion did.
--
dana s emery <de19(_at_)umail(_dot_)umd(_dot_)edu>