Hi,
I am wondering whether writing an I-D or BCP is worth the effort here, and
your comments are welcome.
Basically, with the advent of larger emails and the trend toward
sophisticated mail receivers performing DATA pre-response callouts to
process the message before determining the response code, there is a
greater potential for client timeouts, duplicate resends and messages,
and of course, wasted bandwidth and overhead.
Summary of my proposal:
Clients should consider adjusting their DATA termination
state timeout based on the size of the message they are
sending, rather than using a low 5 minutes across the board
for all payload sizes. If the client is not using the
recommended 10-minute timeout [RFC 2821], it should consider
that lengthy receiver processing makes resends and duplicate
messages increasingly likely; thus the client SHOULD adjust
the DATA termination timeout as follows:
Use 5 minutes for 5 megabytes or less.
Use 10 minutes for over 5 megabytes.
Or use a block transfer rate calculation as the transfer
proceeds to determine what the timeout should be when it completes.
Overall, using a constant 5 minutes is TOO low for
large file transfers. It needs to be adjusted.
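A minimal Python sketch of the two options above, assuming "megabyte" means 1,048,576 bytes. The function names and the rate-based heuristic (a 5-minute base plus the time it took to send the payload, capped at the 10-minute RFC 2821 figure) are hypothetical illustrations, not part of any standard.

```python
MB = 1024 * 1024  # assumption: "megabyte" taken as 2**20 bytes


def tiered_dot_timeout(size_bytes: int) -> int:
    """Seconds to wait for the final-dot reply, using the two-tier rule."""
    return 300 if size_bytes <= 5 * MB else 600


def rate_based_dot_timeout(size_bytes: int, bytes_per_second: float,
                           base: int = 300, ceiling: int = 600) -> int:
    """Hypothetical rate-based variant.

    Assumes the receiver may need roughly as long to process the payload as
    it took us to send it, so that transfer time is added to a 5-minute
    base. The result is capped at the 10-minute RFC 2821 recommendation.
    """
    if bytes_per_second <= 0:
        return ceiling  # no usable rate measurement: fall back to the RFC value
    transfer_seconds = size_bytes / bytes_per_second
    return int(min(ceiling, base + transfer_seconds))
```

For example, `tiered_dot_timeout(2 * MB)` gives 300, and `rate_based_dot_timeout(6 * MB, MB)` (6 MB sent at 1 MB/s) gives 306. The cap keeps a very slow link from stretching the wait past the RFC figure.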
Background:
The issue is precisely described in the two-page RFC 1047, "DUPLICATE
MESSAGES AND SMTP" (1988).
INTRODUCTION
....
It may be hard to believe that this problem is the cause of many
duplicate messages. Intuitively, one might expect that the time
spent in the state between the final dot and its accepting 250 reply
is quite small. In practice, however, this period is often quite
long; long enough that timeouts by the sending mailer (or possibly
network failures) are quite common. Observations by the author
suggest that this synchronization problem may be the second leading
cause of duplicate messages on the Internet (second to mail loops).
....
Many mailers delay responding to the final dot because they are doing
sophisticated processing of the message, in an attempt to confirm
that they can deliver the message.
RFC 2821 (and 2821bis) has a 10-minute recommendation:
DATA Termination: 10 minutes.
This is while awaiting the "250 OK" reply. When the receiver gets
the final period terminating the message data, it typically
performs processing to deliver the message to a user mailbox. A
spurious timeout at this point would be very wasteful and would
typically result in delivery of multiple copies of the message,
since it has been successfully sent and the server has accepted
responsibility for delivery. See section 6.1 for additional
discussion.
The reference to 6.1 states:
To avoid receiving duplicate messages as the result of timeouts, a
receiver-SMTP MUST seek to minimize the time required to respond to
the final <CRLF>.<CRLF> end of data indicator. See RFC 1047 [28] for
a discussion of this problem.
which points to RFC 1047.
We ran into the exact issue highlighted in RFC 1047: a customer has
a local set of AVS rule-based processing policies with lengthy callouts
that run after the DATA termination is received and before the response
code is sent. In this case, the callouts hit the 5-minute mark, at which
point the server detected that the client had dropped the connection.
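A client typically drops the connection at this point when its wait for the final-dot reply outlives its DATA termination timer. A minimal Python sketch of that wait over a plain socket (the function name is hypothetical; a real client would also deal with TLS, pipelining, and logging):

```python
import socket
import time
from typing import Optional


def await_dot_reply(sock: socket.socket, timeout: float) -> Optional[int]:
    """Wait up to `timeout` seconds in total for the reply to <CRLF>.<CRLF>.

    Returns the three-digit reply code, or None if the timer expired or the
    server closed the connection before a complete reply arrived.
    """
    deadline = time.monotonic() + timeout
    buf = b""
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        sock.settimeout(remaining)  # per-recv timeout shrinks toward the deadline
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            return None
        if not chunk:
            return None  # server closed the connection
        buf += chunk
        # The last line of a reply has a space (or nothing) after the code;
        # "250-..." lines are continuations of a multiline reply.
        for line in buf.split(b"\r\n")[:-1]:
            if line[:3].isdigit() and (len(line) == 3 or line[3:4] == b" "):
                return int(line[:3])
```

The deadline is tracked across reads so that a server trickling partial lines cannot extend the total wait beyond the chosen timeout.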
However, our SMTP server still went on to send the 250, because it retains
some RFC 821 behavior, as RFC 1047 describes:
RFC-821 (on page 22) states that unless the receiving mailer is
completely unable to process a message it should accept the message
and acknowledge any errors in processing in a separate message or
messages sent back to the originator of the message. As a result,
receiving mailers should be able to acknowledge the final dot as soon
as the message has been safely put in a non-volatile (e.g., disk)
queue for further processing. Fast acceptance of a message does not
violate RFC-821.
In short, our server issues the 250, signals the router to process
the mail, and logs the event for the operator to see.
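RFC 1047's advice amounts to an ordering rule: commit the message to non-volatile storage first, then acknowledge. A minimal Python sketch of that ordering, assuming a simple one-file-per-message spool directory (the function name, file suffixes, and reply text are hypothetical; the router hand-off and logging steps are left out):

```python
import os
import tempfile
from typing import Tuple


def spool_then_ack(message: bytes, spool_dir: str) -> Tuple[str, str]:
    """Write the message to disk, and only then produce the 250 reply.

    Returns (reply_line, queued_path).
    """
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(message)
            f.flush()
            os.fsync(f.fileno())  # data is on stable storage before we reply
        queued_path = tmp_path[: -len(".tmp")] + ".msg"
        os.replace(tmp_path, queued_path)  # atomic rename within one directory
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # On POSIX, also fsync-ing spool_dir would make the rename itself durable.
    return "250 OK message queued\r\n", queued_path
```

The temporary suffix keeps a half-written file from ever being picked up by the router; only the renamed `.msg` file counts as queued.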
The acceptance of this message appears to violate RFC 2821 (quoted here
from 2821bis, which is 99% the same with a few changes):
4.1.1.10. QUIT (QUIT)
...
The receiver MUST NOT intentionally close the transmission channel
until it receives and replies to a QUIT command (even if there was an
error). The sender MUST NOT intentionally close the transmission
channel until it sends a QUIT command and SHOULD wait until it
receives the reply (even if there was an error response to a previous
command). If the connection is closed prematurely due to violations
of the above or system or network failure, the server MUST cancel any
pending transaction, but not undo any previously completed
transaction, and generally MUST act as if the command or transaction
in progress had received a temporary error (i.e., a 4yz response).
The QUIT command may be issued at any time. Any current uncompleted
mail transaction will be aborted.
So we are now debating whether this is good or bad.
The customer received the message. His complaint is that the mail
clients sending large emails keep retrying, with the same thing
happening each time. However, our dupe processor is catching the
retries, so the problem is not new duplicate mail, just processing overhead.
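One simple way such a duplicate check could work (a hypothetical Python sketch, not necessarily how our dupe processor does it) is to key each message on its Message-ID plus a digest of the DATA payload, and suppress repeats seen within a time window:

```python
import hashlib
import time
from typing import Dict, Optional, Tuple


class DuplicateFilter:
    """Suppress repeats of the same (Message-ID, payload) within a time window."""

    def __init__(self, window_seconds: float = 24 * 3600) -> None:
        self.window = window_seconds
        self.first_seen: Dict[Tuple[str, str], float] = {}

    def is_duplicate(self, message_id: str, payload: bytes,
                     now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Forget entries older than the window so the table stays bounded.
        self.first_seen = {k: t for k, t in self.first_seen.items()
                           if now - t < self.window}
        # Hash the DATA payload as received, before local trace headers are added.
        key = (message_id.strip(), hashlib.sha256(payload).hexdigest())
        if key in self.first_seen:
            return True
        self.first_seen[key] = now
        return False
```

Even when such a filter catches every retry, each retry still costs a full transfer plus another round of callouts, which is the processing overhead the customer sees. A production version would expire entries lazily or keep them in a persistent store rather than rebuilding the table on every call.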
If we followed the 2821 QUIT rule to the letter even after a successful
DATA termination was received, delaying router processing until a QUIT
is finally issued and otherwise CANCELING the transaction, then the
customer would never receive the message in the first place. That is
exactly what he pointed out when we found that the server was not
canceling the message and told him we might have to fix that:
Customer comment:
Well, unfortunately, your proposed fix just might prevent
ANY large emails from being received.
I'd rather get "23" of the same email than "0" of them.
I really just want 1, of course.
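The two readings under debate can be written down as a small decision function. In this hypothetical Python sketch, `strict=True` follows the 4.1.1.10 text (a transaction is complete only if the 250 reply to the final dot went out before the connection closed), while `strict=False` models our current behavior of delivering once the final dot was received.

```python
from enum import Enum


class Disposition(Enum):
    DELIVER = "deliver"  # hand the message to the router
    CANCEL = "cancel"    # discard; the client saw no 2yz and will retry


def disposition_on_close(dot_received: bool, reply_sent_before_close: bool,
                         strict: bool = True) -> Disposition:
    """Decide what to do with a message when the client connection drops."""
    if not dot_received:
        return Disposition.CANCEL  # DATA never finished: always cancel
    if reply_sent_before_close:
        return Disposition.DELIVER  # completed transaction: never undo it
    # Dot received, but the client left before our reply went out.
    # The strict reading treats this as a pending transaction (as if 4yz).
    return Disposition.CANCEL if strict else Disposition.DELIVER
```

With `strict=True`, a large message whose callouts always outlast the client ends in `CANCEL` on every attempt, which is the customer's "0 of them" outcome; with `strict=False` it ends in `DELIVER` on every retry, the "23 of them" outcome that the dupe processor then has to absorb.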
Anyway, the callout issue can be adjusted per customer, but I think there
is a conflict when the standard recommends 10 minutes with "buts" in it.
I can understand that in the older days many systems used post-SMTP
processing, but we all know they suffer from major blowback problems. So
the direction is to apply DATA-level callouts to provide dynamic
SMTP-level rejection capabilities. This is a godsend for our customers
and will not be removed. We will provide better notes regarding lengthy
callouts taking more than 5 minutes, but I think we should also teach
our SMTP clients to adjust to today's changing times.
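On the server side, one way to keep a lengthy callout from outliving the client's timer is to give it an explicit time budget, set per customer below the shortest client timeout you expect to see (5 minutes, for example). A minimal Python sketch with a hypothetical function name and pool size; answering a late callout with a 4yz tempfail is one possible policy choice, not something the standards prescribe.

```python
import concurrent.futures as cf
from typing import Callable

# Shared worker pool for policy callouts (the size is an arbitrary assumption).
_CALLOUT_POOL = cf.ThreadPoolExecutor(max_workers=4)


def final_dot_reply(callout: Callable[[bytes], bool], message: bytes,
                    budget_seconds: float) -> str:
    """Run a DATA-level callout, but answer the final dot within a time budget."""
    future = _CALLOUT_POOL.submit(callout, message)
    try:
        accepted = future.result(timeout=budget_seconds)
    except cf.TimeoutError:
        # Over budget: reply now rather than risk the client timing out.
        # The worker thread keeps running; Python cannot cancel it mid-call.
        return "451 4.3.0 Policy check timed out, try again later\r\n"
    return "250 OK\r\n" if accepted else "550 5.7.1 Rejected by policy\r\n"
```

A 4yz keeps the server honest about not having accepted the message, at the cost of one retry; accepting on timeout and rejecting later would bring back the post-SMTP blowback problem.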
Even RFC 1047 concludes that mailers should be aware of this and treat
5 minutes as a minimum rather than a constant across the board,
especially when sending larger email payloads, which will obviously add
processing time at today's AVS-aware receivers:
Finally, some mailers allow remote mailers only a minute or two to
acknowledge the final dot before timing out and trying again. Given
the increasing round-trip times on the Internet, and that some
processing after the final dot is required, the timeout for reply to
the final dot should probably be at least 5 minutes and a timeout of
10 minutes would not be unreasonable.
I can understand how some may think "10 minutes" is too long, so the
proposal is to adjust it based on the size of the file being transferred.
Comments?
--
Sincerely
Hector Santos, CTO
http://www.santronics.com
http://santronics.blogspot.com