Re: SMTP traffic control

2011-10-28 15:57:43
On 2011-10-28 19:28:05 +0000, Rosenwald, Jordan wrote:

> Perhaps I missed something (this has been a long thread), but I'm
> completely missing how this will solve the problem of long,
> unpredictable delays for users. Everything I've read says these are
> return codes for server consumption, not to be returned to users.

You mean client, not server, but otherwise you are correct: They are for
communication between MTAs, not between an MTA and a user.

Currently, when a client gets a 4xx response code, it has no idea when
to retry. RFC 5321 gives some advice:

   The sender MUST delay retrying a particular destination after one
   attempt has failed.  In general, the retry interval SHOULD be at
   least 30 minutes; however, more sophisticated and variable strategies
   will be beneficial when the SMTP client can determine the reason for
   non-delivery.

   Retries continue until the message is transmitted or the sender gives
   up; the give-up time generally needs to be at least 4-5 days.  It MAY
   be appropriate to set a shorter maximum number of retries for non-
   delivery notifications and equivalent error messages than for
   standard messages.  The parameters to the retry algorithm MUST be
   configurable.

Half an hour is ok if the reason for the temporary failure is a problem
which requires human intervention. It is already questionable if the
reason is that the server is too busy (in my experience load spikes
usually last only a few minutes, but YMMV). It is definitely too long for
greylisting: Most servers which use greylisting use a much shorter
initial blocking time - a few minutes seems to be normal, I've even seen
times as short as 5 seconds. So, the client could successfully retry
after a few minutes, but it doesn't know that, and heeding the advice in
the RFC, waits for half an hour. 

Of course people already noticed that and started to ignore the RFC in
this regard: Today MTAs often are configured for much shorter (initial)
retry times. This helps, but it is rather crude, a bit wasteful, and you
risk being blocked for "excessive number of connections" if you overdo
it. So the mail may still sit in the queue for a relatively long and
variable time (because that depends on the server configuration, the
client's initial delay, the client's backoff algorithm, the schedule of
queue runs and the precise moment when the mail was queued).
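
To make that variability concrete, here is a rough sketch. The model is my
own simplification, not taken from any particular MTA: the server rejects
for a fixed blocking time, the client has a list of minimum retry delays,
and retries only happen at queue runs on a fixed interval. The function
name and all parameters are made up for illustration.

```python
def delivery_delay(block_time, retry_delays, queue_run_interval, queued_offset):
    """Seconds between the first (rejected) attempt and the first retry
    that succeeds, or None if the retry schedule runs out first.

    block_time         -- how long the server keeps answering 4xx
    retry_delays       -- client's minimum waits between attempts (backoff)
    queue_run_interval -- queue runs happen at multiples of this
    queued_offset      -- time of the first attempt, counted from the
                          preceding queue run
    """
    first_attempt = queued_offset
    now = first_attempt
    for delay in retry_delays:
        earliest = now + delay
        # The retry happens at the first queue run at or after `earliest`
        # (integer ceiling division, so no float rounding surprises).
        now = -(-earliest // queue_run_interval) * queue_run_interval
        if now - first_attempt >= block_time:
            return now - first_attempt
    return None


# Same server (55 s blocking time), same client config, different moment
# of queueing: the delay differs by a minute.
print(delivery_delay(55, [60, 120], 300, 10))   # 290
print(delivery_delay(55, [60, 120], 300, 250))  # 350
# RFC-style 30 minute retry interval:
print(delivery_delay(55, [1800], 300, 0))       # 1800
```

In all three cases the mail could have gone through after 55 seconds.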

In contrast, consider the case that Hector's server (which uses an
initial blocking time of 55 seconds, as he wrote) can tell the client
"try again in 56 seconds" and that the client can do this. Then the
greylisted mail will get through after 56 seconds (+ probably a bit of
jitter added by the queue runner), and Hector's users will be happy
because they get the information they requested on the phone while they
are still talking instead of half an hour later. It also allows mail
admins to keep the default retry interval at a relatively high value and
avoid pounding servers which are already overloaded.
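
On the client side this could be as simple as the sketch below. Note that
the "retry-after=N" token in the reply text is purely hypothetical: no wire
format has been fixed in this thread, and the function and constant names
are invented for the example. The only point is the logic: use the hint if
one is present, otherwise fall back to the configured default.

```python
import re

# HYPOTHETICAL hint syntax, assumed only for this example:
#   451 4.7.1 Greylisted, retry-after=56
_HINT = re.compile(r"\bretry-after=(\d+)\b", re.IGNORECASE)

# Default retry interval in seconds (the 30 minutes suggested by RFC 5321).
DEFAULT_RETRY = 30 * 60


def next_retry_delay(reply, default=DEFAULT_RETRY):
    """Return how many seconds to wait before retrying after a 4xx reply."""
    code = reply[:3]
    if not (code.isdigit() and code.startswith("4")):
        raise ValueError("not a temporary failure: %r" % reply)
    match = _HINT.search(reply)
    if match is None:
        # Server gave no hint: behave exactly as today.
        return default
    return int(match.group(1))


print(next_retry_delay("451 4.7.1 Greylisted, retry-after=56"))    # 56
print(next_retry_delay("451 4.7.1 Greylisted, please try later"))  # 1800
```

A real client would probably also want to cap the hinted value and add
some jitter, but that is a local policy decision.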

(It must be emphasized that this only works if both the server and the
client use the protocol extension. If the server doesn't provide the
retry hint or the client doesn't use it, the situation is just as it is
now. So to have any noticeable effect it needs to be implemented by at
least some popular MTAs.)

I also see some applications outside of greylisting:

For example, I occasionally have to return a 4xx error to some
recipients of a multi-recipient message, because they couldn't be
processed together in the same transaction. In this case it would be
nice if I could tell the client that it is ok to immediately retry for
those recipients.

Another reason for unexpected mail delays is if you try to open more
connections than the receiving MX allows. The mails on some connections
get rejected with a temporary error and go back into the queue (possibly
several times). They could almost certainly be sent immediately over one
of the existing connections, but the client doesn't know that. If the
server could convey that information to the client, a wasteful delay
could be avoided.

> As best I can tell this proposed idea does nothing for the end user.

It can reduce the delays in case of 4xx errors. End users do care about
that (not always, maybe not even most of the time, but often enough).


   _  | Peter J. Holzer    | Web 2.0 könnte man also auch übersetzen als
|_|_) | Sysadmin WSR       | "Netz der kleinen Geister".
| |   | hjp(_at_)hjp(_dot_)at         | 
__/   | |  -- Oliver Cromm in desd
