Here with is a new draft of this document - I've included a number of
useful contributions from others and added some more discoveries of my own.
Julian.
Network Working Group Julian Onions
Request for Comments: DRAFT Nexor Ltd
March 27, 1995
How to be a Bad EMail Citizen
1. Status of this Memo
This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and its Working Groups. Note that other groups may also distribute
working documents as Internet Drafts.
Internet Drafts are valid for a maximum of six months and may be
updated, replaced, or obsoleted by other documents at any time. (The
file 1id-abstracts.txt on nic.ddn.mil describes the current status of
each Internet Draft.) It is not appropriate to use Internet Drafts as
reference material or to cite them other than as a "work in
progress".
This draft is known as draft-onions-822-mailproblems-01.
2. Abstract
The internet consists of many hosts and many implementations of each
protocol suite. There are no formal tests or approval mechanisms
associated with membership of the internet, and therefore there are
very varied levels of conformance to the various standards. This
document intends to describe some of the common problems, mistakes
and errors that are made in electronic mail. Most of them are easily
avoidable, and some guidance on what to do in each case is given
here. Some of these guidelines are pragmatic, some are mandated by
other standards, and others are religious.
3. Introduction
There are various documents around the internet that define the way
mail should behave, what is mandatory, what is optional and what is
forbidden. Adherence to these standards across implementations is at
best patchy, and with no overseeing body the only enforcement to the
standards are peer pressure and possible lack of service. This
document started out as an attempt to summarise the various ways the
fairly straight-forward mail protocol can be mis-implemented. This
document tries to point out cases where common usage has extended the
standard (unofficially) and where implementations are wrong. This is
Onions Expires Sep 30, 1995 [Page 1]
INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995
somewhat analagous to defining "correct" english. Is it what is
defined, or is it like y'know what people kinda say?
4. Scope
This document restricts itself to the standards defined in RFC-821
(SMTP), RFC-822, RFC-1123 (Host Requirements), RFC-1521 (MIME) and
RFC-1651 (SMTP Extensions). Currently other documents are not
considered.
5. Issues concerning SMTP
5.1. The RSET Command
RFC-821 is not specific about exactly what the RSET verb resets.
This has apparently not been a problem in the past because of the
simplicity of the protocol. With the publication of extensions to
the SMTP protocol with additional commands and state information,
making a more precise definition desirable. The definition provided
should not constrain any existing RFC-821 implementation since it is
consistent with both the current practice and the only two plausible
interpretations.
RSET is to be interpreted by SMTP servers as clearing state
information present in a session. In particular, it eliminates the
effect of any prior FROM commands, any DATA, and any delivery
addresses. It resets the server's state to "not a mail transaction".
This implies it is in the state after the HELO and before the MAIL
verb.
RSET has been interpreted by some SMTP servers as requiring that a
new HELO command be sent after RSET is acknowledged. Other servers
assume that the previous HELO is not reset. Servers SHOULD accept a
HELO command subsequent to RSET without special comment, overriding a
previous one if necessary. Servers MUST NOT require a HELO command
after a RSET.
Worse still are servers that drop the connection on an RSET verb
being issued. As some clients issue an RSET whenever they feel like
it (at the end of a transfer for instance), such behaviour is very
inefficient.
The description above summarizes the current situation with SMTP
implementations based on a series of experiments. No implementations
have been identified that rejects a second HELO, but it would not be
surprising to find one.
Onions Expires Sep 30, 1995 [Page 2]
INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995
5.2. Duplication of single state verbs.
Whilst some of the SMTP state-inducing verbs may be repeated and
arbitrary number of times (such as RCPT for multi-destination) other
verbs (such as MAIL) may only be issued once per transaction. If a
second occurrence of state-inducing verb is detected, a server MAY
either accept it, overriding earlier information, or may reject it as
an out-of-sequence command with a "503 bad sequence of commands"
code. A client sending multiple of these commands within a mail
transaction MUST be prepared to send a RSET and start over, or to
send QUIT and abandon the session, if 503 is received in this case.
Clients SHOULD, if possible, behave in a way that avoids this
situation.
The issues above do not arise in the normal case of multiple
successful message transmissions in the same session, since each
successful message completion (i.e., server receipt of DATA, the
message, CR LF . CR LF, and then sending a positive completion
reply) results in terminating a mail transaction. Clients SHOULD NOT
send RSET after receipt of a 250 response after DATA and the message;
servers MUST reset their states after sending that 250 response and
MUST NOT require clients to send RSET before the next MAIL FROM
command
5.3. Behavior with unrecognized verbs.
While it is not quite explicit, RFC-821 appears to expect that, if a
verb is not recognized by the receiver, it will reject the command
with a "permanent error", 5yz, code, presumably 500 (Syntax error).
Similarly, it appears to specify that, if the sender receives such a
code, it must either abandon the mail message (sending QUIT or RSET,
presumably) or do something else involving the same or a different
verb; it may not simply ignore the 5yz error code and pretend it was
a 2yz (or 354) code. This specification depends on that behavioral
model.
As defined in RFC-821, existing SMTP servers MUST reply with a return
code of 500 (Syntax error) when any unfamiliar verb is received.
The material above should probably have made it into RFC-1123, but
some of the issues -- particularly the fact that anyone could ever
have believed that anything else (such as simply ignoring 5xx codes)
was permitted--have emerged only in the process of this
investigation. Nonetheless, this clarification is believed to be
consistent with existing usage and implementations of SMTP.
Onions Expires Sep 30, 1995 [Page 3]
INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995
5.4. Behaviour with eight-bit data
RFC-821 together with RFC-822 is unambiguous in this respect. Unless
an extension to RFC-821 is in force for the mail transaction, eight-
bit data may not be sent. Period.
This point just needs emphasising. It is present in the original
documents, but not spelled out.
If an implementation receives eight-bit data there are various things
it may do. These come down to one of the following, and depend on how
charitable you wish to be, or how much political pressure is placed
on you.
1. Bounce the message. This is the correct thing to do. The
message is a protocol violation and should therefore not be
acceptable. Send it back!
2. Strip the 8th bit. This ends up making the message conformant,
but may well loose information. It is an interesting approach as
it lets the mail through - albeit distorted. The problems
usually end up on the recipients desk - whereas it is the
senders fault.
3. Convert the message. If you are very charitable, you may
consider converting the message to 7bit data by making the
message into a MIME message (if it is not already) and then
applying a content-transfer-encoding. This has a down side that
your machine is having to expend considerable effort fixing up
other peoples mistakes.
All but the first action are compromises. The fault is that of the
originator and they should have to bear the pain. With all but case
1, it is either the recipient, or an intermediate machine that has to
suffer.
5.5. Error reports with eight-bit data
Some implementations will return the original message as part of a
delivery report. Care needs to be taken in this case that the reason
for failure was that eight-bit data was present. Otherwise it is
possible to construct an illegal eight-bit message as an error report
to an eight-bit message.
As error reports and messages cannot be easily distinguished in
RFC821, all messages (including error messages) appear as standard
messages, and therefore need to be correct RFC822 messages.
Onions Expires Sep 30, 1995 [Page 4]
INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995
5.6. SMTP MAIL FROM response.
The use of the error 550 (Address Unknown) is not documented in RFC-
821 as a response to the MAIL FROM verb. However this seems to be the
most appropriate response to a client that tries to give an
unrouteable mail origintor. Therefore this code should be taken as a
permanent rejection. It implies that the mail will not be accepted as
the MTA does not know how to route error reports. Some MTAs care
about this, others do not.
5.7. Rejection of SMTP connections due to DNS failure.
There are a number of SMTP implementations that either do, or can be
configured, to reject SMTP connections if the calling host is not
registered in the DNS. This is seen by some as a breaking of the
spirit of RFC-1123, and by others as a useful get-out-of-jail card.
Regardless of whether this is a good idea or a bad one, the fact
remains this is practiced by some sites. Implementors are therefore
encouraged to use back up MX routing in the case of a connection that
succeeds but no data is received before the connection is dropped.
This topic has been debated a number of times on the Internet with
both sides sticking to their views. There is no sense in continuing
to try and standardise this point. What a site will do with any
internet connection from any host eventually comes down to what the
administrator at that site decides. If they don't want to talk to a
given set of hosts, that may be their loss. With the increasing
emphasis on security though, the fact that a site advertises an MX or
A record in the DNS does not imply it will talk to all callers.
5.8. EHLO commands
There are one or two servers that respond badly to EHLO commands.
That is they either set themselves into inconsistent states, or else
drop the connection at once. The RFC is fairly clear that unknown
commands should be rejected but otherwise ignored.
A resilient server MAY detect that the EHLO caused the connection to
drop and immediately retry the connection with a HELO verb in place.
Alternatively it can be treated as a bad connection and subsequent MX
records tried if available. However servers SHOULD NOT drop the
connection in response to an unknown verb.
5.9. SMTP Multi-line responses.
Both SMTP client and server implementations should be prepared to
take multi-line responses at any part of the transaction. In
particular a multi-line HELO/EHLO response should be acceptable.
Onions Expires Sep 30, 1995 [Page 5]
INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995
5.10. SMTP use of other services.
When using SMTP in a restrictive environment, care should be taken
that associated services are allowed. For instance, use of the DNS is
often required for incoming SMTP servers, and IDENT may also be in
use. Care must be taken when constructing firewalls that these
protocols are allowed to penetrate if they are required.
5.11. Error reports
If an error is detected at the MTA level, such as non-delivery,
protocol failure or some such, an error message should be sent to the
parameter passed in the MAIL FROM field. The Reply-To, From and
Sender fields should never be used for such purposes. These fields
are for the sole use of the end-user recipient and their user agent
software. They should be no concern of the MTA.
On a related note, if a list exapander is used, the MAIL FROM
parameter should be set to something other than the mail list name.
Otherwise error reports can loop with exponential explosive force.
The standard technique is when expanding list <name> to originate the
message from <name>-request (or sometimes <name>-owner). This address
should resolve to a single individual who can be charged with
maintaining the list.
6. RFC-822 Issues
6.1. Illegal format RFC-822 messages
Some implementations of RFC-821 check the message for adherence to
RFC-822 minimum requirements as the message is received. These are
that the message contains in the header a From field, a Date field
and a recipient field of some type. However, some user agents use
RFC-821 as a submission protocol and assume that messages will be
made legal RFC-822 as part of the submission process (as some MTA's
already do this). Implementations MAY therefore allow strictly
illegal RFC-822 messages as data and make them legal by addition of
new headers, or MAY reject the message as illegal data.
Some User Agents, particularly those on PC's find it difficult to
determine an accurate time to provide a Date field, and therefore
leave it out. It is harmless enough to insert such a field when
acting as a submission channel, but inserting a Date mid way through
a multi-hop delivery path is mis-leading and should be discouraged.
However, in practice it is difficult to determine the two modes RFC-
821 is used in, so usually a blanket decision concerning all
transfers has to be made. What is really required is a submission
protocol tailored for this sort of behaviour that can take a partial
Onions Expires Sep 30, 1995 [Page 6]
INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995
RFC-822 message and add the appropriate envelope bits.
6.2. RFC-822 addresses
RFC-822 addresses are more complex than might appear at first sight.
The simple use of <user>@<host> is by far the most prevelant, but
when writing a parser, care must be taken with the more unusual
format of addresses. Many implementations deal very badly, or not at
all with the more unusual forms of addresses. Take the following
cases, which all seem to cause one or another implementation
indigestion.
@host,host2:user(_at_)host3 "J.Onions"@nexor.co.uk List:
user(_at_)host(_dot_)domain;
(comment) JOnions (comment) @ (comment) nexor.co.uk (comment)
\"f\"@b.foo
Other useful examples are in Appendix A.3 in RFC-822.
Worse than these are the psuedo-legal addresses that get produced. An
implementation should be able to deal with such horrors as the
following, where "deal with" is to do something other than crash and
burn.
user::something(_at_)foo(_dot_)bar @host.domain:auser
user!fred(_at_)domain(_dot_)ref
Another item to beware of is very large To/Cc lines. Some attempt
list expansion by a macro expansion of the To: field. This can lead
to 1000 or more addresses appearing in the To: line. Whilst this is a
bad idea to do, implementations should be careful of using fixed
buffer sizes - such header lines can be 250 lines long!
6.3. Received Lines
The syntax of the Received: lines in RFC-822 messages is reasonably
straight forward. It requires as a minimum a date stamp following a
semi-colon. Unfortunately some implementations cannot seem to
generate this. This can cause problems when gatewaying to other
systems that also have trace fields. This is seen as a good way to
cause general confusion when tracking messages.
When gatewaying or examining these elements, the invalid elements
should either be discarded or else the current time inserted to make
them legal. The illegal Received: lines can be changed to be Orig-
Received: to ensure the relayed message is now legal.
Note that the received line should contain fully qualified host names
if any are present, rather than just local references. This aids in
tracking the message, makes for better gatewaying and stop confusion.
Onions Expires Sep 30, 1995 [Page 7]
INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995
Note also that the "id" construct takes a msg-id construct which is
of the form "<" addr-spec ">".
6.4. Comments in Received lines
A strict reading of RFC-822 shows that comments are not legal within
Received lines. However so many implementations insert them, and the
host requirements document encourages them to be added under certain
conditions, that it is wise to consider the EBNF to be extended to
allow comment fields at any point in the line. A comment should be
taken as the EBNF <comment> construct from RFC-822. This is basically
some text found between parenthesis, such as:
Received: from foo.bar (actually xyz.foo.bar)
by lancaster.nexor.co.uk id
<msg(_dot_)123456(_at_)nexor(_dot_)co(_dot_)uk> with SMTP
(v1.1) ; Mon, 20 Feb 1995 12:28:53 +0100
6.5. Date fields.
Date fields are usually fairly standard, although there are
implementations that strike out with new and novel formats. However,
when it comes to the area of time zones there is little limitation in
the imagination of implementors. Normally time zones should be
numeric as these are unambiguous. It should be down to the user agent
to display the Date in a ``pretty'' format.
Just say NO to pretty, arbitrary timezones! All UAs should generate
numeric offsets for timezones. How they are displayed is up to the UA
however.
6.6. Resent- fields
RFC-822 allows the pseudo-forwarding of messages by amending the
header of a message to contain new recipients. This is done by adding
headers such as
Resent-To: abc(_at_)domain(_dot_)name Resent-Date: Sun, 1 Jan 1995 02:24
+0000
Resent-From: xyz(_at_)foo(_dot_)bar
It is not clear in RFC-822 if when resending a message a complete set
of headers is required. The standard would seem to imply that they
are but no grammar is present which mandates it. Therefore
implementations vary on how to treat this type of message.
Strict implementations will on detection of a Resent- field, conclude
that this is a resent message, and therefore should be using the
Resent- versions of the fields as opposed to the standard forms. In
Onions Expires Sep 30, 1995 [Page 8]
INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995
this case a message without a Resent-From, a Resent-Date and a
Resent- recipient field is illegal. It is assumed that the message
has been resent but with only a partially correct header.
Other implementations take the view that a Resent- field is a higher
weighted form of the original field. That is, a Resent-Date should be
used in preference to a Date field, but as long as a Date, From and
Recipient field is present with or without the resent- prefix the
message is legal.
The first view treats the resent- as a new overriding SET of headers,
the second as individual replacements for fields. Either case could
be argued, as the original text is unclear.
For pragmatic reasons, and because it seems closer to the intent of
RFC-822 in this case, the Resent- fields should be taken as a set.
However implementations SHOULD allow the individual fields. In
practice this sort of forwarding is not very common, but does arise.
6.7. Separator between RFC-822 Header and Content.
RFC-822 defines a separator of a single blank line to delimit the
header from the body of the message. It seems apparent that some
implementations either have problems generating a blank line or
problems reading the standard.
The definition is a single blank line and hence consists of the
characters CR LF CR LF. Any other sequence is difficult to parse
reliably, so the string CR LF CR LF MUST always be used. Whilst one
can think up possible heuristic parsing techniques to allow sort-of-
blank lines - they are error prone. In particular a line consisting
solely of white space is a valid continuation line, and may be used.
Consider the following:
Subject: Hello<CR><LF> <SPACE><CR><LF> <SPACE>How are you<CR><LF>
This is a legal subject line, spread across three lines. In isolation
it may look like the end of a header and the start of a message
content. If an implementation applies heuristics to the delimiter it
may well get confused by such cases. If the following line started
To: it would be clear this was a header line continuation. If it
started with some non white space text without a colon in it
anywhere, this is probably a mistyped separator. If there is a colon
in the next line, you can start guessing if it looks like a header
field, or a piece of text.
To try and circumvent the problem - use of header continuation lines
consisting solely of white space should be strongly discouraged. You
Onions Expires Sep 30, 1995 [Page 9]
INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995
never know what a helpful mailer might do with it!
7. MIME issues.
MIME since its inception has allowed implementations of MTAs and UAs
to further the cause of havoc and generally increase entropy. The
number of ways that it is possible to get this specification wrong is
truely astounding! In general an MTA can treat badly formatted MIME
as a text/plain format and punt the whole problem to the UA. The UA
will take a number of views:
a) It will crash and burn.
b) It will complain the message is illegal and refuse to show it.
c) It won't care and show you the message, warts and all.
d) It will ignore the message, and you will never even know you
have received the message.
The best approach is to be able to flag an error and then revert to
action c) above. This may upset some naive mail users (who seem to be
predominantly upper management and therefore dangerous to upset!).
7.1. Badly formatted Content-Type: fields
Implementations have been known to produce lines of the form
MIME-version: 1.0 Content-Type: text
That is, a MIME type, without the mandatory subtype. This is illegal
as a MIME header and means the content may be subject to
misinterpretation.
In these cases the most pragmatic case is to treat the message as
text/plain, regardless of what the Content-type might indicate.
However, outright rejection of the message is also an option. (The
author feels a system that rejects every other such message may have
merits in forcing systems to be upgraded.)
7.2. Multiple Content-Type fields
Messages may contain multiple Content-Type fields, sometimes
containing contradictory information. Where this happens this may
again cause contents to be misrepresented, or misprocessed. For
instance:
MIME-version: 1.0 Content-Type: multipart/mixed; boundary="---"
Onions Expires Sep 30, 1995 [Page 10]
INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995
MIME-version: 1.0 Content-Type: text/plain
As for the badly formatted contents type. If two Content-Type fields
are present, and contain the same information, that case MAY be
treated as just one Content-Type field.
7.3. Badly structured multipart messages
Message that contain fields such as
Content-Type: multipart/mixed
have some great potential for causing indigestion in mail systems.
The missing boundary string means that although the message is split
into multiple parts, there is no way a process can reconstruct the
message in general.
It is charitable to believed that these type of messages start out
with good intentions, but loose their boundary markers somewhere in
flight. Whilst an intelligent human can scan the body part and make
an educated guess at what the separator is, this is not generally
possible for a program.
7.4. Wrapped lines
Another interesting little problem is where a UA, or MTA has
helpfully wrapped the text of the field to improve readability. Some
interesting examples are presented here.
Content-Type: multipart/mixed; boundary="message
-separator" Content-Type: multipart/mixed;
boundary="abcdefghijklmno: boundary:fixed01"
The first case is debateably correct input, although few MTA/UAs will
be able to reconstruct the correct separator. The second case is
illegal, ambiguous and awkward to treat well.
Why do people do this! The road to hell is paved with good
intentions. In both cases little should be done to try and
reconstruct the message without human help.
7.5. MIME prologue and Epilogue text
A number of systems and hand constructed messages put text into the
prologue and epilogue of MIME multipart messages. Whilst this is a
neat trick for allowing non-mime UAs to inform the user why the
message appears as garbage, the prologue/epilogue does not really
Onions Expires Sep 30, 1995 [Page 11]
INTERNET DRAFT How to be a Bad EMail Citizen March 27, 1995
exist as part of a message. Therefore when gatewaying or simply
processing such messages, these components may disappear.
Alternatively they may appear as new body parts after transformation.
Therefore whilst you can do it, don't be surprised if it fails to
appear at the other end.
7.6. MIME and UUENCODE
Whilst a type and/or content-transfer-encoding for uuencoded
documents could be defined, there is currently not one. Therefore
uuencoded documents are not on their own valid MIME messages. They
use to be one of the better ways to transfer documents before the
advent of MIME, but now there are better mechanisms. Uuencoded
contents are known to suffer from problems at gateways.
8. Acknowledgements
This document represents a collection of the experiences and hard-won
battle scars from a large community of people. All implementors of
SMTP mail systems will have had some influence on this document.
In particular there are a number of points taken from the work done
in the smtp extensions working group. This document is a summary of
some of the discussions, and other experiences. Some of this text is
taken from an earlier draft of the SMTP working group document.
The following people made useful comments on an earlier version of
this document:
Patrik Faltstrom, Tomas Kullman, Peter Sylvester, Harald Alvestrand,
Steve Dorner
9. Security Considerations
Security considerations are not discussed in this memo.
10. Editor's Address
Julian Onions <j(_dot_)onions(_at_)nexor(_dot_)co(_dot_)uk> Nexor Ltd. PO
Box 132,
Nottingham, England.
Onions Expires Sep 30, 1995 [Page 12]