ietf-822

Re: 8-bit transmission in NNTP

1994-09-19 10:18:38
% It is quite widely implemented and deployed, I know many sites outside
% Denmark using it. It is not mainstream, tho.
}-- End of excerpt from Keld Jørn Simonsen

It is not "widely implemented and deployed".  It is more
deployed than it used to be but it is not widespread.  I do not know
of any commercial implementations at all.  I'd be interested to hear
of commercial implementations.

PMDF uses mnemonic support as part of its character set conversion facilities.
I'm quite sure of this since I wrote the code to do it. I believe PP uses
RFC1345 mnemonics as well. That's two commercial implementations right off, and
I'm pretty sure there are several more.

Note that I said "part of". RFC1345 mnemonics are, in my opinion, an essential
part of any real-world solution to character sets. However, there is much
that's needed which this particular mnemonic encoding does not provide.

Specifically, here's what RFC1345 mnemonics are used and useful for. First of
all, forget about character sets intended for Chinese, Japanese, Korean,
Vietnamese, and all other similar languages.  While it is nice in theory to
have a means of representing these character sets in the RFC1345 mnemonic scheme
just for completeness' sake, it is never used in practice.

But once you exclude these character sets you are left with a huge number of
remaining character sets. The most important of these in practice are plain
US-ASCII, ISO-8859-1, IBM437, and the character set most Macintosh fonts are
based on. There is a huge installed base of equipment (dumb terminals, PCs, and
Macs) throughout the US and Europe that is based on these character sets, and
typically only one character set is supported. There are also many other
similar character sets, such as the other ISO-8859 variants, the national
variants, other PC character sets, and the various EBCDIC variants, all of
which command substantial albeit smaller market share.

This installed base is there, it is real, and it is not going away. In fact it
is growing quite rapidly. Vendors have to support it -- they have no choice in
the matter. There is also considerable growth in the multilingual workstation
and terminal market, but it is a long way from overtaking all this other stuff.

So what does all this mean? Email users demand that the material they receive
be presented on their terminal in as readable a format as possible. We had all
hoped that quoted-printable would be good enough to handle a lot of this.
Unfortunately, experience has shown that it isn't. Neither is saying, "Just use
ISO10646, or ISO-2022-INT, or some other general solution." These things
definitely have their place as an on-wire representation, but their benefits
for today's end users are limited to the people who have support for them on
their desktop.

RFC1345 mnemonics help solve this problem in two ways. One is that by tabulating
such an immense number of character sets, Keld has provided developers with the
information they need to perform high quality translations from one character
set to another. However, this is simply a use of the tables -- it doesn't have
much of anything to do with the table entries. The entries themselves could
be ISO10646 code points and it would not matter.
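
To make the point about the tables concrete, here is a minimal Python sketch
of a table-driven conversion. It is an illustration only, not how PMDF or any
other product does it. The table names and the helper functions are
hypothetical, and each table is a three-entry excerpt covering u-diaeresis,
e-acute, and a-diaeresis in IBM437 and ISO-8859-1. The same byte mapping comes
out whether the entries are RFC1345 mnemonics or ISO10646 code points.

```python
# Excerpts only: byte value -> RFC1345 mnemonic.
IBM437_MNEMONIC = {0x81: "u:", 0x82: "e'", 0x84: "a:"}
LATIN1_MNEMONIC = {0xFC: "u:", 0xE9: "e'", 0xE4: "a:"}

# The same excerpts keyed by ISO10646 code point instead.
IBM437_UCS = {0x81: 0x00FC, 0x82: 0x00E9, 0x84: 0x00E4}
LATIN1_UCS = {0xFC: 0x00FC, 0xE9: 0x00E9, 0xE4: 0x00E4}


def build_converter(src, dst):
    """Join two tables on their entries to get a byte -> byte mapping."""
    inverse = {key: byte for byte, key in dst.items()}
    return {byte: inverse[key] for byte, key in src.items() if key in inverse}


def convert(data, table):
    """Convert bytes; ASCII passes through, other bytes must be in the table."""
    return bytes(b if b < 0x80 else table[b] for b in data)


by_mnemonic = build_converter(IBM437_MNEMONIC, LATIN1_MNEMONIC)
by_code_point = build_converter(IBM437_UCS, LATIN1_UCS)
latin1_bytes = convert(b"caf\x82", by_mnemonic)  # IBM437 "cafe" with e-acute
```

The entries act only as a join key between the two tables, which is why their
form does not matter for this use.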

The mnemonics themselves enter the picture when the process breaks down. As it
happens, I use a character set called DEC-MCS most of the time. I didn't choose
it -- it is simply what the hardware I use supports. It is based on a
preliminary version of ISO-8859-1 and differs in about six character positions
in all. However, this means that I have some characters available to me that
are not in ISO-8859-1. Suppose I use one of them, such as an oe ligature. My
system is configured to convert DEC-MCS to ISO-8859-1 in outgoing mail and to
fall back to mnemonics when there is no equivalent character. (The opposite
conversion is done on incoming mail.) The mnemonic for an oe ligature is "oe"
so that is what you'd see. This is a lot better than sending out an ISO-8859-1
multiplication sign!
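
The fallback described above can be sketched in Python with a custom codec
error handler. This is a rough illustration under stated assumptions, not the
actual PMDF code: the text is assumed to have been decoded to Unicode already,
the handler name "rfc1345-sketch" and the function names are made up for the
example, the mnemonic table is a two-entry excerpt, and substituting "?" for
characters missing from the excerpt is an arbitrary choice.

```python
import codecs

# Tiny excerpt of the RFC1345 mnemonic table: code point -> mnemonic.
MNEMONICS = {
    0x0152: "OE",  # LATIN CAPITAL LIGATURE OE
    0x0153: "oe",  # LATIN SMALL LIGATURE OE
}


def mnemonic_fallback(err):
    """Replace characters the target set lacks with their mnemonics."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    unencodable = err.object[err.start:err.end]
    # "?" for anything outside the excerpt is an assumption of this sketch.
    replacement = "".join(MNEMONICS.get(ord(ch), "?") for ch in unencodable)
    return replacement, err.end


codecs.register_error("rfc1345-sketch", mnemonic_fallback)


def to_latin1(text):
    """Encode to ISO-8859-1, falling back to mnemonics where needed."""
    return text.encode("iso-8859-1", errors="rfc1345-sketch")


outgoing = to_latin1("c\u0153ur")  # oe ligature has no ISO-8859-1 code
```

Characters that do exist in ISO-8859-1 are encoded normally; only the ones
with no equivalent go through the handler.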

Alternately, I could send a message out using X.400 and T.61. T.61 has most if
not all of the characters in DEC-MCS (encoded very differently, of course), but
it also has lots of characters that aren't in DEC-MCS. When I receive a message
from X.400 I cannot make any sense at all out of T.61, but a combination of
translation and judicious use of mnemonics makes most T.61 text readable on my
terminal.

Is this a perfect solution? Hardly!!! There are many cases even when dealing
with European languages where mnemonics result in unreadable material. But the
fact that this is not a general solution doesn't mean that it isn't a solution
to some problems. And end users now know that such solutions exist, so offering
some inferior albeit "purer" alternative when your competition has mnemonic
support is not a commercially viable choice.

The fact that RFC1345 mnemonics are only a partial solution also means that
additional work must be invested in developing support for all the other
character set problems. It is a pity that there is no one-stop shopping
for solutions in this business, but that is the way it is.

(any vendor who implements Keld-char
with their MIME implementation can not bid their MIME implementation
on the Navy standard workstation contract because we need a GENERAL
solution to multilingual messaging that isn't Euro-centric; ISO-10646
support is strongly desired by us).

The problems actually solved by RFC1345 mnemonics are completely orthogonal to
the problems of full multilingual character set support. As such, a system that
supports them might be eligible to bid on such contracts or it might not, but
support for mnemonics would have nothing to do with it. The only possible way
these could be related is if mnemonics were the only scheme available for
displaying multilingual material on plain US-ASCII displays, and I cannot
imagine anyone being silly enough to specify such a thing. In fact I can
readily see a need to have RFC1345 mnemonics available on such a system to
convert things into the character set you eventually decide to use, and also to
convert whatever material you create to formats that can be read on the old
dumb equipment everyone seems to have.

I note in passing that none of this generalizes anyway -- sales to the US
government or US military typically have nothing to do with requirements for
such broad-based multilingual support. The regrettable situation at the present time
is that if you want to sell software in this area you are far more likely to be
judged on the basis of your support for cc:Mail message formats than anything
else. I don't like this much either, but that's the way it is right now.

                                Ned
