
Re: 2821bis consideration - New 2nd attempt Retry Strategy recommendation

2007-11-17 01:42:29

Hi Hector,
At 17:01 16-11-2007, Hector Santos wrote:
Please note that I am not disagreeing with your points. With skepticism, upon customer "wish list" request and the tracking of how that request did not die, I explored GL and found it to be "do-able." With the fine tuning to minimize impact, it can work without disrupting operations.

We sometimes have to agree to disagree. :-)

Once GL was part of the picture, it was fairly obvious now why operators were previously reporting these strange rejects with no explanation and confused observation of the eventual delivery. Hence the variable table was added, not just for the GL operators but for those who were hitting GL systems.

A sender not familiar with Greylisting may find the rejects strange at first and might blame the software.

As for turning on the GL feature for our own support system, that was an even tougher decision since, as a small company, we can't afford any missed sales or customer support emails. But it was turned on and carefully followed. In fact, since we were unsure about false positives, rather than reject at RCPT TO as I believe many do, we implemented it as part of the DATA filter hook system with a dynamic response at that point.

I use Greylisting in the DATA phase. Some people prefer it to be done at RCPT TO to reduce processing overhead.

Anyway, with a web-based GL tool, this gave the operator an easy way to view stats and check the message content of all current GL 1st rejects to help give them (and us) confidence.

Capturing content is not workable for us due to the additional disk space required. It can be turned on for debugging. I have encountered problems with Greylisting. I came across odd cases. I adapted to them as telling the receiver that it's the sender's fault won't solve the problem.

The 5 mins was carefully decided upon, mainly because I don't particularly like the idea of going against 2821 recommendations. But the market overruled that issue. In the end, our default variable table is:


Note: attempt1 is really the 2nd attempt, since the rescheduling code is based off the current count, "msgQueue->nTotalAttempts"

Note that the following is merely a comment. You are better placed than me to decide what's best for your environment.

You are doing more than three retries over an hour period. That's quite high in my opinion. Your backoff period could be incremental. I see that you are using five minutes from attempt 22. Once you go over the first two attempts, you can move from 30 minutes, one hour, then two hours etc. You should take into account the impact frequent reruns can have if the queue gets quite large. Also, consider the receiving hosts. If the SMTP connection to that host failed, I would not retry the other messages addressed to that host within the same time frame.
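The incremental backoff suggested above could be sketched roughly as follows. This is only an illustration: the function name `retry_delay`, the two short initial delays and the eight-hour cap are assumptions made for the example, not values taken from either implementation discussed in this thread.

```python
from datetime import timedelta

# Hypothetical short delays for the first two retries (assumed values).
SHORT_DELAYS = [timedelta(minutes=5), timedelta(minutes=10)]
# Assumed upper bound so the delay does not grow without limit.
MAX_DELAY = timedelta(hours=8)


def retry_delay(attempt: int) -> timedelta:
    """Return the wait before retry number `attempt` (1 = first retry)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    if attempt <= len(SHORT_DELAYS):
        return SHORT_DELAYS[attempt - 1]
    # After the first two retries: 30 minutes, 1 hour, 2 hours, ...
    # The exponent is clamped so very large attempt counts stay cheap.
    steps = min(attempt - len(SHORT_DELAYS) - 1, 10)
    return min(timedelta(minutes=30) * (2 ** steps), MAX_DELAY)


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, retry_delay(n))
```

A queue runner would add the returned delay to the time of the last failed attempt. Holding back other messages for a host whose connection just failed would need separate per-host state, which is left out here.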

Finally, on the GL receiver side, our default is a 55 second block and a 2 day grace period to send the retry.

I have a default block of over five minutes which means that your retry won't get through on the second attempt. But then, not all connections are greylisted.
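To make the interaction between the sender's retry interval and the receiver's block period concrete, here is a minimal sketch of the receiver-side decision. The function `greylist_decision` and its parameters are hypothetical, and the six-minute block in the usage lines is an assumed stand-in for "over five minutes"; real Greylisting implementations differ in what they key on and how they store state.

```python
from typing import Optional

TWO_DAYS = 2 * 24 * 60 * 60  # grace period in seconds, as in the default quoted above


def greylist_decision(first_seen: Optional[float], now: float,
                      block_seconds: float,
                      grace_seconds: float = TWO_DAYS) -> str:
    """Return "defer" (temporary reject) or "accept" for one delivery attempt.

    first_seen is the time of the first attempt for this sender/recipient
    combination, or None if it has not been seen before.
    """
    if first_seen is None:
        return "defer"            # first attempt is always deferred
    age = now - first_seen
    if age < block_seconds:
        return "defer"            # retry came back too early
    if age > grace_seconds:
        return "defer"            # too late, handled like a new first attempt
    return "accept"


if __name__ == "__main__":
    # A retry after 5 minutes against a 55 second block and a 6 minute block.
    print(greylist_decision(0.0, 300.0, block_seconds=55))
    print(greylist_decision(0.0, 300.0, block_seconds=360))
```

With these assumed numbers, a five-minute retry clears the 55 second block but is deferred again by the longer one, so delivery then depends on the sender's next interval.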

I probably should have used 3 days, since our original (non-variable) defaults were once per hour, 72 attempts or 3 days. And if you follow the 2821 recommendation, it suggests 4-5 days. With 30 min intervals that yields an awful 192-240 retries.

It's only a lot of retries over four or five days if you retry too frequently. Some view the period as too long. A two-day give-up period, for example, may result in delivery issues being missed if it falls over a weekend.
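The relationship between the give-up period, the retry interval and the number of attempts is simple division. A small sketch, assuming a fixed interval (the helper name `attempts_before_give_up` is made up for this example):

```python
from datetime import timedelta


def attempts_before_give_up(give_up: timedelta, interval: timedelta) -> int:
    """Number of fixed-interval retries that fit into the give-up period."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    # Dividing one timedelta by another with // yields an integer.
    return give_up // interval


if __name__ == "__main__":
    # Once per hour over three days, as in the original defaults quoted above.
    print(attempts_before_give_up(timedelta(days=3), timedelta(hours=1)))
    # The same helper can be used to compare longer periods and shorter intervals.
    for days in (4, 5):
        print(days, attempts_before_give_up(timedelta(days=days),
                                            timedelta(minutes=30)))
```

An incremental backoff breaks this linear relationship, which is why a four or five day period does not have to mean hundreds of retries.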

Finally, for the 451 code itself, yeah, I didn't think it was ideal, but I do think that given all our choices, the GL author made the right decision. Assuming the author is an operator mostly, reading RFC 2821, he sees three examples of 45x with literals:

I commented on that in my previous email.

But it should not matter from an SMTP technical standpoint because the SMTP sender must use 45z for its retry considerations, regardless of what z is.
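As a rough illustration of that point, a sender can decide whether to requeue a message from the class of the reply code alone. This is a sketch with a hypothetical helper name, relying only on the RFC 2821 convention that 4yz replies are transient failures and 5yz replies are permanent ones.

```python
def should_retry(reply_code: int) -> bool:
    """True if the SMTP reply code is a transient (4yz) failure."""
    if not 200 <= reply_code <= 599:
        raise ValueError("not a valid SMTP reply code: %r" % reply_code)
    # Only the first digit matters: 450, 451 and 452 are all handled alike.
    return reply_code // 100 == 4


if __name__ == "__main__":
    for code in (250, 450, 451, 452, 550):
        print(code, should_retry(code))
```

Keying the retry logic on the reply class rather than on 451 specifically keeps the sender working against receivers that pick a different 45z value for the same condition.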


I will say that I did consider using 451 as a trigger for the altered, shorter 2nd attempt interval. But our outbound mail code handles any 45x response the same way, and I didn't want to change that, because the reply might not be 451 but 450, 452 or some other 45x value.

The 451 reply code is used for other conditions as well.
