Previous: Appendix A -- Minimal MIME-Conformance
Next: Appendix C -- A Complex Multipart Example
Internet email is not a perfect, homogeneous system. Mail may become
corrupted at several stages in its travel to a final destination.
Specifically, email sent throughout the Internet may travel across
many networking technologies. Many networking and mail technologies
do not support the full functionality possible in the SMTP transport
environment. Mail traversing these systems is likely to be modified
in such a way that it can be transported.
Many widely deployed non-conformant MTAs exist on the Internet.
These MTAs speak SMTP but alter messages on the fly, either to
take advantage of the internal data structure of the hosts they are
implemented on or because they are just plain broken.
The following guidelines may be useful to anyone devising a data
format (Content-Type) that will survive the widest range of
networking technologies and known broken MTAs unscathed. Note that
anything encoded in the base64 encoding will satisfy these rules, but
that some well-known mechanisms, notably the UNIX uuencode facility,
will not. Note also that anything encoded in the Quoted-Printable
encoding will survive most gateways intact, but possibly not some
gateways to systems that use the EBCDIC character set.
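The claim about base64 can be checked mechanically. The following is
a minimal sketch, not a normative test: it assumes Python's standard
base64 module as the encoder, and the names INVARIANT and
base64_lines are illustrative only. It encodes every possible octet
value and confirms that the output uses short lines built solely from
the 73 characters listed later in this appendix.

```python
import base64
import string

# Assumption: the 73 characters listed later in this appendix
# (letters, digits, and eleven special characters).
INVARIANT = set(string.ascii_letters + string.digits + "'()+,-./:=?")


def base64_lines(data):
    """Encode bytes as base64 and return the output as a list of text lines."""
    # encodebytes() breaks its output into lines of at most 76 characters.
    return base64.encodebytes(data).decode("ascii").splitlines()


# Exercise the encoder with all 256 octet values.
lines = base64_lines(bytes(range(256)))

all_short = all(len(line) <= 76 for line in lines)
all_invariant = all(ch in INVARIANT for line in lines for ch in line)
round_trip = base64.b64decode("".join(lines)) == bytes(range(256))
```

Because line breaks in the encoded output carry no meaning, a
receiver can discard them before decoding, which is why the round
trip above joins the lines first.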
- Under some circumstances the encoding used for data may change
as part of normal gateway or user agent operation. In particular,
conversion from base64 to quoted-printable and vice versa may be
necessary. This may result in the confusion of CRLF sequences with
line breaks in text bodies. As such, the persistence of CRLF as
something other than a line break must not be relied on.
- Many systems may elect to represent and store text data using
local newline conventions. Local newline conventions may not match
the RFC822 CRLF convention -- systems are known that use plain CR,
plain LF, CRLF, or counted records. The result is that isolated
CR and LF characters are not well tolerated in general; they may
be lost or converted to delimiters on some systems, and hence must
not be relied on.
- TAB (HT) characters may be misinterpreted or may be
automatically converted to variable numbers of spaces. This is
unavoidable in some environments, notably those not based on the
ASCII character set. Such conversion is STRONGLY DISCOURAGED, but
it may occur, and mail formats must not rely on the persistence of
TAB (HT) characters.
- Lines longer than 76 characters may be wrapped or truncated in
some environments. Line wrapping and line truncation are STRONGLY
DISCOURAGED, but unavoidable in some cases. Applications which
require long lines must somehow differentiate between soft and
hard line breaks. (A simple way to do this is to use the
quoted-printable encoding.)
- Trailing "white space" characters (SPACE, TAB (HT)) on a line
may be discarded by some transport agents, while other transport
agents may pad lines with these characters so that all lines in a
mail file are of equal length. The persistence of trailing white
space, therefore, must not be relied on.
- Many mail domains use variations on the ASCII character set,
or use character sets such as EBCDIC which contain most but not
all of the US-ASCII characters. The correct translation of
characters not in the "invariant" set cannot be depended on across
character converting gateways. For example, this situation is a
problem when sending uuencoded information across BITNET, an
EBCDIC system. Similar problems can occur without crossing a
gateway, since many Internet hosts use character sets other than
ASCII internally. The definition of Printable Strings in X.400
adds further restrictions in certain special cases. In
particular, the only characters that are known to be consistent
across all gateways are the 73 characters that correspond to the
upper and lower case letters A-Z and a-z, the 10 digits 0-9, and
the following eleven special characters:
"'" (ASCII code 39)
"(" (ASCII code 40)
")" (ASCII code 41)
"+" (ASCII code 43)
"," (ASCII code 44)
"-" (ASCII code 45)
"." (ASCII code 46)
"/" (ASCII code 47)
":" (ASCII code 58)
"=" (ASCII code 61)
"?" (ASCII code 63)
A maximally portable mail representation, such as the base64
encoding, will confine itself to relatively short lines of text in
which the only meaningful characters are taken from this set of 73
characters.
- Some mail transport agents will corrupt data that includes
certain literal strings. In particular, a period (".") alone on a
line is known to be corrupted by some (incorrect) SMTP
implementations, and a line that starts with the five characters
"From " (the fifth character is a SPACE) is commonly corrupted as
well. A careful composition agent can prevent these corruptions
by encoding the data (e.g., in the quoted-printable encoding, using
"=46rom " in place of "From " at the start of a line, and "=2E" in
place of "." alone on a line).
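The guidelines above lend themselves to a simple per-line check. The
sketch below is one possible reading of them, not part of any
standard: the function names, the hazard labels, and the choice to
treat each rule as an independent test are all assumptions made for
illustration. The first function reports which hazards a single line
(without its line terminator) is exposed to; the second applies the
two quoted-printable substitutions described in the last item.

```python
import string

# The 73 characters described above as consistent across all gateways.
INVARIANT = set(string.ascii_letters + string.digits + "'()+,-./:=?")

MAX_LINE = 76  # longer lines may be wrapped or truncated


def portability_hazards(line):
    """Return hazard labels (hypothetical names) for one unterminated line."""
    hazards = []
    if len(line) > MAX_LINE:
        hazards.append("long-line")
    if "\t" in line:
        hazards.append("tab")
    if line != line.rstrip(" \t"):
        hazards.append("trailing-whitespace")
    if "\r" in line or "\n" in line:
        hazards.append("isolated-cr-or-lf")
    if line == ".":
        hazards.append("lone-period")
    if line.startswith("From "):
        hazards.append("from-line")
    if any(ch not in INVARIANT for ch in line):
        hazards.append("non-invariant-character")
    return hazards


def protect_literal_strings(qp_line):
    """Escape a lone "." or a leading "From " in a quoted-printable line."""
    if qp_line == ".":
        return "=2E"
    if qp_line.startswith("From "):
        return "=46rom " + qp_line[len("From "):]
    return qp_line
```

For example, portability_hazards("From me") reports both the "From "
prefix and the SPACE character, which is not in the invariant set,
while protect_literal_strings("From me") yields "=46rom me". A line
of base64 output such as "SGVsbG8=" reports no hazards at all.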
Please note that the above list is NOT a list of recommended
practices for MTAs. RFC 821 MTAs are prohibited from altering the
character of white space or wrapping long lines. These BAD and
illegal practices are known to occur on established networks, and
implementations should be robust in dealing with the bad effects they
can cause.