ietf-822

Re: Lessons Learned from a Foreign Culture

1994-10-28 13:39:55
About binary c-t-e on ebcdic machines that use fixed-length records:

1. When composing a message, the sender (or his user agent) should choose a
particular c-t-e based on:

a) the likelihood that that particular body part will get damaged in
   transit,
b) whether such damage will make the message unreadable, and
c) whether backward compatibility with pre-MIME systems is desired.

Given that not all systems support MIME and not all transports are
binary transparent, each of the c-t-e's is a compromise.

It should also be remembered that c-t-e is just a label. 
It is NOT a request for a particular kind of transport service.


2. BINARY c-t-e means that the body part is already in canonical form.  The
contents of the body part are the octets from the one following the CRLF at
the end of the body part header through the one immediately preceding the
CRLF that begins the following boundary marker (or, if not within an
enclosing multipart, through the end of the message).

Base64 and quoted-printable are carefully defined so that they decode to
the same thing in either ascii or ebcdic.  (For example, "A" in
quoted-printable always means 0x41, even if the local charset is ebcdic.)
This allows encoded body parts to pass through a traditional ascii<->ebcdic
mail gateway (which simply translates the entire message) without
losing the ability to recover the canonical form of the contents.
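
A minimal Python sketch of this property, assuming the gateway's translation
table is Python's cp037 codec (one of several EBCDIC code pages; a real
gateway may use a different table).  The payload and helper names are
hypothetical.  It also shows what the same translation does to raw octets:

```python
import base64
import quopri

# Hypothetical canonical content: arbitrary octets, not only text.
canonical = b"GIF89a\x00\x01\x02"

def ascii_to_ebcdic(octets: bytes) -> bytes:
    """Model a traditional gateway that translates every octet of a message.

    Assumption: cp037 stands in for whatever table the gateway really uses.
    """
    return octets.decode("latin-1").encode("cp037")

def recover_on_ebcdic_side(encoded: bytes, decoder) -> bytes:
    """Model an EBCDIC user agent decoding a translated body part.

    It reads characters in its local charset, then decodes by character
    identity ("A" is "A"), not by the local octet value of that character.
    """
    chars = encoded.decode("cp037")
    return decoder(chars.encode("ascii"))

b64_text = base64.b64encode(canonical)
qp_text = quopri.encodestring(canonical)

# Encoded parts survive the whole-message translation.
b64_ok = recover_on_ebcdic_side(ascii_to_ebcdic(b64_text), base64.b64decode) == canonical
qp_ok = recover_on_ebcdic_side(ascii_to_ebcdic(qp_text), quopri.decodestring) == canonical

# The same translation applied to the raw octets changes them.
binary_damaged = ascii_to_ebcdic(canonical) != canonical
```

Under these assumptions both encoded forms decode back to the canonical
octets, while the untranslated-looking octets no longer match.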

This is NOT true for BINARY.  If a message containing a BINARY body part is
gatewayed from an ASCII into an EBCDIC environment, it MUST NOT be
translated to EBCDIC.  Doing so corrupts the data.  A correctly written
user agent will assume that the data in a BINARY body part is the same,
octet for octet, as when it was composed.

Also, as others have pointed out, MIME user agents in an EBCDIC world
expect messages as fixed length records.  Messages must be an integral
number of records long, and a boundary marker must start at the beginning
of a record.  In such a world, it is only possible to represent BINARY
encoded MIME body parts if the length of each BINARY encoded body part is
an integral number of records.  Otherwise, the data is corrupted.
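
The record constraint can be sketched in Python.  The record length of 80
and the padding octet 0x40 (EBCDIC space) are assumptions chosen for
illustration; actual systems differ, and the function names are
hypothetical:

```python
RECORD_LEN = 80   # assumed record length; real systems vary
PAD = b"\x40"     # assumed padding octet (EBCDIC space)

def to_records(body: bytes, reclen: int = RECORD_LEN) -> list:
    """Split a body part into fixed-length records, padding the last one."""
    records = [body[i:i + reclen] for i in range(0, len(body), reclen)]
    if records and len(records[-1]) < reclen:
        records[-1] = records[-1].ljust(reclen, PAD)
    return records

def survives(body: bytes, reclen: int = RECORD_LEN) -> bool:
    """True if the stored records reproduce the body octet for octet."""
    return b"".join(to_records(body, reclen)) == body

exact = bytes(range(160))    # exactly two 80-octet records
ragged = bytes(range(100))   # one full record plus 20 octets
```

Here `survives(exact)` holds but `survives(ragged)` does not: the reader
sees 160 octets and cannot tell the 60 padding octets from data.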


3. Back to point #1:  Choose the c-t-e based on the likely damage etc.  

It is obvious that BINARY is a poor choice for a c-t-e when sending into an
EBCDIC environment, since that environment almost certainly cannot support
it.  Furthermore, the corruption of a "plain text" message in a BINARY body
part in such an environment is likely to be far worse than the corruption
of the same contents in a 7bit/8bit/q-p/base64 body part.

If a BINARY encoded message is to arrive uncorrupted, it seems clear that
the gateways between the ascii and ebcdic worlds need to make special
provision for BINARY encoded body parts, say by converting them to base64. 
Until such gateways are available, using BINARY for mail between the ebcdic
and ascii world will continue to be risky.
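
One way such a gateway provision might look, as a Python sketch only.  The
`BodyPart` class and `prepare_for_ebcdic` function are hypothetical; a real
gateway would also have to rewrite the Content-Transfer-Encoding header
field in the message itself and walk nested multiparts:

```python
import base64
from dataclasses import dataclass

@dataclass
class BodyPart:
    """Hypothetical, simplified view of one MIME body part."""
    cte: str        # value of the Content-Transfer-Encoding field
    content: bytes  # body octets as received

def prepare_for_ebcdic(part: BodyPart) -> BodyPart:
    """Re-encode BINARY parts as base64 before whole-message translation.

    Parts with any other c-t-e are passed through unchanged.
    """
    if part.cte.strip().lower() != "binary":
        return part
    # encodebytes emits lines of at most 76 characters ending in LF;
    # convert the line endings to CRLF for the wire format.
    encoded = base64.encodebytes(part.content).replace(b"\n", b"\r\n")
    return BodyPart(cte="base64", content=encoded)

original = BodyPart(cte="binary", content=bytes(range(256)))
converted = prepare_for_ebcdic(original)
```

After conversion the part consists only of characters that a translating
gateway can carry, and `base64.decodebytes(converted.content)` gives back
the original octets.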

However, this is no different from the situation with the mail transport
world in general.  Any MTA or gateway that relays mail between environments
with different end-of-line conventions needs to make special provisions
for BINARY encoding.  (Given that these requirements aren't immediately
obvious and haven't even been written up, it seems very unlikely that
BINARY encoded body parts will arrive uncorrupted in ANY environment.)
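
A small Python sketch of the end-of-line problem, assuming a relay that
naively normalizes line endings to CRLF (the function and sample data are
hypothetical):

```python
def lf_to_crlf(message: bytes) -> bytes:
    """Naive relay step: normalize every line ending to CRLF."""
    return message.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

# Text survives, because line endings carry no other meaning.
text_part = b"line one\nline two\n"

# Binary data may contain 0x0A and 0x0D octets that are not line endings.
binary_part = b"\x89PNG\r\n\x1a\n\x00\x0a\x0d\x00"

converted_text = lf_to_crlf(text_part)
converted_binary = lf_to_crlf(binary_part)
```

The text part comes out as expected, but the binary part grows from 12 to
14 octets because every bare 0x0A is treated as a line ending.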


4. On long lines in 7bit and 8bit encodings

It might be worthwhile to reconsider the requirement that body parts with
7bit and 8bit c-t-e's consist of short lines of text.  In particular,
should a c-t-e of 7bit impose an additional requirement on the content
beyond that of non-MIME, RFC 822, ASCII-only e-mail?  I realize that RFC
821 prohibits lines longer than 1000 characters, but this often seems to be
ignored by both UAs and MTAs.  Is the 1000-character limit of SMTP important
enough that MIME should re-assert it, or should we just let it stand as-is?
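
For reference, a check for the SMTP limit could be sketched in Python as
follows, assuming the limit is counted including the terminating CRLF (as
RFC 821 states it) and that lines are CRLF-delimited.  The function name is
hypothetical:

```python
SMTP_LINE_LIMIT = 1000  # RFC 821 text line limit, counted including CRLF

def overlong_lines(body: bytes, limit: int = SMTP_LINE_LIMIT) -> list:
    """Return 1-based numbers of lines that exceed the limit, CRLF included."""
    bad = []
    for number, line in enumerate(body.split(b"\r\n"), start=1):
        if len(line) + 2 > limit:   # +2 accounts for the CRLF itself
            bad.append(number)
    return bad

sample = b"short\r\n" + b"x" * 998 + b"\r\n" + b"y" * 999 + b"\r\n"
```

With this sample, only the third line (999 characters plus CRLF) is
reported.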

Keith