Re: SMTP traffic control
2011-10-28 15:57:43

On 2011-10-28 19:28:05 +0000, Rosenwald, Jordan wrote:
> Perhaps I missed something (this has been a long thread), but I'm
> completely missing how this will solve the problem of long,
> unpredictable delays for users. Everything I've read says these are
> return codes for server consumption, not to be returned to users.

You mean client, not server, but otherwise you are correct: They are for communication between MTAs, not between an MTA and a user.

Currently, when a client gets a 4xx response code, it has no idea when to retry. RFC 5321 gives some advice:

    The sender MUST delay retrying a particular destination after one
    attempt has failed. In general, the retry interval SHOULD be at
    least 30 minutes; however, more sophisticated and variable
    strategies will be beneficial when the SMTP client can determine
    the reason for non-delivery.

    Retries continue until the message is transmitted or the sender
    gives up; the give-up time generally needs to be at least 4-5 days.
    It MAY be appropriate to set a shorter maximum number of retries
    for non-delivery notifications and equivalent error messages than
    for standard messages. The parameters to the retry algorithm MUST
    be configurable.

Half an hour is ok if the reason for the temporary failure is a problem which requires human intervention. It is already questionable if the reason is that the server is too busy (in my experience load spikes usually last only a few minutes, but YMMV). It is definitely too long for greylisting: Most servers which use greylisting use a much shorter initial blocking time - a few minutes seems to be normal, I've even seen times as short as 5 seconds. So, the client could successfully retry after a few minutes, but it doesn't know that, and heeding the advice in the RFC, waits for half an hour.

Of course people already noticed that and started to ignore the RFC in this regard: Today MTAs often are configured for much shorter (initial) retry times.
This helps, but it is rather crude, a bit wasteful, and you risk being blocked for "excessive number of connections" if you overdo it. So the mail may still sit in the queue for a relatively long and variable time (because that depends on the server configuration, the client's initial delay, the client's backoff algorithm, the schedule of queue runs and the precise moment when the mail was queued).

In contrast, consider the case that Hector's server (which uses an initial blocking time of 55 seconds, as he wrote) can tell the client "try again in 56 seconds" and that the client can do this. Then the greylisted mail will get through after 56 seconds (+ probably a bit of jitter added by the queue runner), and Hector's users will be happy because they get the information they requested on the phone while they are still talking instead of half an hour later. It also allows mail admins to keep the default retry interval at a relatively high value and avoid pounding servers which are already overloaded.

(It must be emphasized that this only works if both the server and the client use the protocol extension. If the server doesn't provide the retry hint or the client doesn't use it, the situation is just as it is now. So to have any noticeable effect it needs to be implemented by at least some popular MTAs.)

I also see some applications outside of greylisting: For example, I occasionally have to return a 4xx error to some recipients of a multi-recipient message, because they couldn't be processed together in the same transaction. In this case it would be nice if I could tell the client that it is ok to immediately retry for those recipients.

Another reason for unexpected mail delays is if you try to open more connections than the receiving MX allows. The mails on some connections get rejected with a temporary error and go back into the queue (possibly several times).
They could almost certainly be sent immediately over one of the existing connections, but the client doesn't know that. If the server could convey that information to the client, a wasteful delay could be avoided.

> As best I can tell this proposed idea does nothing for the end user.

It can reduce the delays in case of 4xx errors. End users do care about that (not always, maybe not even most of the time, but often enough).

        hp

--
   _  | Peter J. Holzer       | Web 2.0 could thus also be translated as
|_|_) | Sysadmin WSR          | "network of small minds".
| |   | hjp(_at_)hjp(_dot_)at |
__/   | http://www.hjp.at/    |    -- Oliver Cromm in desd
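
For illustration, the client-side behaviour described above (use the server's retry hint when one is given, otherwise fall back to the configured default) might look like the following minimal sketch. The `retry=<seconds>` syntax, the function names and the four-hour cap are assumptions made up for this example; the thread does not define a wire format.

```python
import re
from typing import Optional

# Hypothetical hint syntax: we assume the server appends "retry=<seconds>"
# to the text of a 4xx reply. No such syntax is fixed in the thread.
RETRY_HINT = re.compile(r"\bretry=(\d+)\b")

DEFAULT_RETRY = 30 * 60    # RFC 5321: retry interval "at least 30 minutes"
MAX_HINT = 4 * 60 * 60     # assumed sanity cap on hints accepted from a server


def parse_retry_hint(reply: str) -> Optional[int]:
    """Return the hinted delay in seconds from a 4xx reply, or None."""
    if not reply.startswith("4"):
        return None  # a retry hint only makes sense on a temporary failure
    match = RETRY_HINT.search(reply)
    return int(match.group(1)) if match else None


def next_retry_delay(reply: str, default: int = DEFAULT_RETRY) -> int:
    """Use the server's hint when present, else fall back to the default."""
    hint = parse_retry_hint(reply)
    if hint is None:
        return default  # no extension in use: behave exactly as today
    return min(hint, MAX_HINT)
```

With this, a greylisting reply carrying `retry=56` would be retried after 56 seconds, a reply without a hint waits the usual 30 minutes, and `retry=0` covers the case where some recipients may be retried immediately. A real queue runner would add its own jitter on top.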