1993-05-11 19:03:28
    We've been over and over this.   I don't like the solution, but
there appears to be only one solution, even if it is ugly.  All the
other solutions either cause something else to break -- perhaps
spectacularly -- or require changes on the EBCDIC side of things
(especially the BITNET/EBCDIC side) that, much as I wish otherwise,
don't seem likely to happen.  I'm willing to review those alternatives
with you privately if necessary, but rather suspect that the 822 list is
sick to death of this issue.  More important, it is sufficiently late in
the game that discussion of aesthetic preferences is not going to
produce changes in MIME: you would have to demonstrate real operational
problems that cannot be overcome in the existing spec.

   Given how things have gone, I would suggest the following:

(1) Get used to treating "text/plain" (no charset parameter) and
"text/plain;charset=us-ascii" as equivalent.  That is what the spec
says and, for many reasons, it isn't going to change.
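To make (1) concrete, here is a minimal sketch in Python; the dictionary of parameters is a toy stand-in for a real Content-type header parser:

```python
# Sketch: a reader must treat a missing charset parameter on text/plain
# as "us-ascii", per the MIME default.  The dict argument is a
# simplified stand-in for a parsed Content-type header.

def effective_charset(content_type_params):
    """Return the charset a conforming reader should assume."""
    return content_type_params.get("charset", "us-ascii").lower()

# Both forms name the same character set:
assert effective_charset({}) == "us-ascii"
assert effective_charset({"charset": "US-ASCII"}) == "us-ascii"
```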

(2) Get used to reading the string "us-ascii" in the above as "EBCDIC"
when it appears on a machine that uses EBCDIC internally.  If you don't
believe that, you are going to need to convert the string at the
gateway.  Several of us are convinced that will cause worse problems, at
least unless the point of conversion is moved backward from the
transport gateway to the point of entrance to the EBCDIC host.
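A sketch of that conversion at the point of entrance, using Python's cp037 codec purely as an illustration; the real table is whichever EBCDIC page the host actually uses:

```python
# Sketch: translating us-ascii octets to EBCDIC at the entrance to the
# EBCDIC host.  cp037 is one common EBCDIC code page, chosen here only
# as an example of the host-side table.

text = "Content-type: text/plain;charset=us-ascii"
ascii_octets = text.encode("us-ascii")
ebcdic_octets = text.encode("cp037")

assert ebcdic_octets != ascii_octets          # different codes on the wire
assert ebcdic_octets.decode("cp037") == text  # same characters throughout
```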

(3) Your 32-bit machine argument, and the "binary channel" (presumably
8bit transport) story, illustrate some real issues, but not issues
associated with the charset parameter; and they are "issues", not
"problems".  The 32-bit machine is going to need to suck octets off the
wire and do something with them.  Having grown up with machines whose
smallest addressable unit was 36 bits, I can tell you it can be coped
with; it is just a matter of doing it.  And ISO-8859-1 doesn't need 8bit
transport; it needs *either* 8bit transport or a sensible
content-transfer-encoding.
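For instance (a sketch using Python's quopri module; the sample string is illustrative), ISO-8859-1 text survives a 7-bit path once a content-transfer-encoding is applied:

```python
# Sketch: ISO-8859-1 text carried over a 7-bit channel by applying a
# content-transfer-encoding (quoted-printable), instead of demanding
# 8bit transport end to end.

import quopri

latin1 = "na\xefve caf\xe9".encode("latin-1")   # contains 8-bit octets
wire = quopri.encodestring(latin1)              # 7-bit-safe form

assert any(b >= 0x80 for b in latin1)           # original needs 8 bits
assert all(b < 0x80 for b in wire)              # encoded form does not
assert quopri.decodestring(wire) == latin1      # and it is reversible
```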

(4) "network plaintext" is [US-]ASCII.  It is not 8859-1.  It is always
going to be [US-]ASCII, in its old, traditional, 7-bit form.  No 8bit
ASCII, no SuperDuperCode, [US-]ASCII.   There might come a day when no
one really *uses* "network plaintext", but that won't change the binding
between that concept and ASCII.    Again, all of the other solutions are
worse.

OK, let's ask a different question.  Suppose we were asked to build a
gateway from the IP-Internet to a network containing a bunch of EBCDIC
machines and an EBCDIC transport and that we wanted to get it right. 
Let's pretend this is a new piece of work and that we don't need to
worry about transition issues.  What do we do?

First thing we do is to get real serious, network-wide agreement about
what character set(s) we are going to support in the target network and
what we want to treat as "our" "network plaintext".  We write reversible
mapping rules from the MIME-registered character sets of significant
interest into that/those character set(s) and get agreement about them.
Then we decide what we are going to do with any characters that happen
to be left over from the mappings and standardize that ritual.  If we
end up with more than one EBCDIC, we figure out how to tell sites
within our network which one is used in a particular message, but
we use a mechanism that treats Content-type as inviolate.  Maybe an
EBCDIC-page header.
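A sketch of the reversibility check those agreements imply, with cp037 again standing in for whatever table the network picks:

```python
# Sketch: a candidate mapping table is acceptable only if every octet
# survives a round trip, so gateways in either direction cannot lose
# information.  cp037 stands in for the agreed network-wide table.

def mapping_is_reversible(codepage):
    """True if all 256 octet values round-trip through the code page."""
    for value in range(256):
        octet = bytes([value])
        try:
            if octet.decode(codepage).encode(codepage) != octet:
                return False
        except UnicodeError:
            return False        # a "leftover" octet with no mapping
    return True

assert mapping_is_reversible("cp037")         # full single-byte table
assert not mapping_is_reversible("us-ascii")  # octets >= 0x80 left over
```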

I would expect the above agreements might be very hard to get in a
network context that hasn't been able to agree on how to map 7bit ASCII,
but that isn't a MIME problem.

Now you design the gateways themselves.  You implement the SMTP
extensions --"8BITMIME" in particular-- to reduce as much as possible
the frequency with which you see Q-P or Base64 arriving with text data.
When a message comes in with text/plain, the gateway looks at the
character set.  If it is one it knows how to map, it maps it (it might
have to expand incoming Q-P or Base64 to do that), removes the
Content-transfer-encoding field, and adds EBCDIC-page if necessary.  If
it doesn't know how to map it, it forces it into Base64 (converting from
Q-P if necessary) and then converts to "generic" EBCDIC, changes the
Content-transfer-encoding as needed, and does not add EBCDIC-page.
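The gateway's decision for text/plain can be sketched as follows; the message shape, the KNOWN_MAPPINGS table, and the returned EBCDIC-page value are all simplified assumptions, not a real implementation:

```python
# Sketch of the gateway rule above.  KNOWN_MAPPINGS and the tuple the
# function returns are illustrative; a real gateway would rewrite the
# actual header fields of the message.

import base64
import quopri

KNOWN_MAPPINGS = {"us-ascii": "cp037", "iso-8859-1": "cp037"}  # assumed

def gateway_text_plain(body, charset, cte):
    """Return (new_body, new_cte, ebcdic_page) for an incoming message."""
    # First expand any transfer encoding to recover the raw octets.
    if cte == "quoted-printable":
        body = quopri.decodestring(body)
    elif cte == "base64":
        body = base64.b64decode(body)

    target = KNOWN_MAPPINGS.get(charset.lower())
    if target is not None:
        # Known charset: map into EBCDIC, drop the CTE, tag the page.
        return body.decode(charset).encode(target), None, target
    # Unknown charset: force Base64; maybe the target UA can cope.
    return base64.b64encode(body), "base64", None
```

A mapped message leaves with no Content-transfer-encoding and an EBCDIC-page tag; an unmappable one leaves as Base64 with no tag, exactly as described above.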

That is it.  If Content-transfer-encoding is present with a text type,
manifest character codings are in EBCDIC but the interpretation is with
regard to the MIME character set and transfer encodings.  If it is not
present, the characters are in EBCDIC and mean exactly what they appear
to mean.  The gateways eliminate Q-P, which is presumed hard to read in
an EBCDIC world (and should be totally unnecessary).  Character sets
that you don't know how to translate into EBCDIC are forced into Base64;
maybe the target UA will know how to handle them.

The thing that has caused the confusion about all of this is the
obvious, but insidious, assumption that, in an EBCDIC environment,
"charset" should denote a code page.   That isn't going to work as
things are now defined; it is not clear that it was ever possible to
make a variation on it work in a multiple-gateway environment.  It
denotes an abstraction that is actually bound to character codes in an
ANSI/ISO-derived character transport environment.  In an EBCDIC
transport environment, there is an extra degree of abstraction, since it
binds character-abstractions to a mapping table and then to codes,
rather than directly.  But that shouldn't be a big deal, as long as that
other network can agree on the mapping tables.

