Re: 2821bis consideration - New 2nd attempt Retry Strategy recommendation
At 17:01 16-11-2007, Hector Santos wrote:
Please note that I am not disagreeing with your points. I was
skeptical, but after a customer "wish list" request that did not
die, I explored GL and found it to be "do-able." With fine tuning
to minimize impact, it can work without disrupting operations.
We sometimes have to agree to disagree. :-)
Once GL was part of the picture, it became fairly obvious why
operators had been reporting these strange rejects with no
explanation, and why they were confused when the mail was eventually
delivered. Hence the variable table was added, not just for the GL
operators but for those who were hitting GL systems.
A sender not familiar with Greylisting may find the rejects strange
at first and might blame the software.
As for turning on the GL feature for our own support system, that
was an even tougher decision since, as a small company, we can't
afford any missed sales or customer support emails. But it was
turned on and carefully followed. In fact, since we were unsure
about false positives, rather than reject at RCPT TO as I
believe many do, we implemented it as part of the DATA filter hook
system with a dynamic response at that point.
I use Greylisting in the DATA phase. Some people prefer it to be
done at RCPT TO to reduce processing overhead.
Anyway, a web-based GL tool gave the operator an easy way to view
stats and check the message content of all current GL 1st rejects,
to help give them (and us) confidence.
Capturing content is not workable for us due to the additional disk
space required. It can be turned on for debugging. I have
encountered problems with Greylisting, including some odd cases. I
adapted to them, as telling the receiver that it's the sender's fault
won't solve the problem.
The 5 mins was carefully decided upon, mainly because I don't
particularly like the idea of going against 2821 recommendations. But
the market overruled that issue. In the end, our default variable table is:
Note: attempt1 is really the 2nd attempt, since the rescheduling
code is based off the current count, "msgQueue->nTotalAttempts"
Note that the following is merely a comment. You are better placed
than me to decide what's best for your environment.
You are doing more than three retries over an hour period. That's
quite high in my opinion. Your backoff period could be
incremental. I see that you are using five minutes from attempt
22. Once you go over the first two attempts, you can move from 30
minutes, one hour, then two hours etc. You should take into account
the impact frequent reruns can have if the queue gets quite
large. Also, consider the receiving hosts. If the SMTP connection
to that host failed, I would not retry the other messages addressed
to that host within the same time frame.
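
To make the suggestion above concrete, here is a minimal Python sketch of an incremental backoff with a per-host hold. It is only an illustration, not the code of either MTA discussed here: the two early delays, the 8 hour cap, and the names retry_delay and HostBackoff are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical values: two short early retries, then 30 min, 1 h, 2 h, ...
EARLY_DELAYS = [timedelta(minutes=5), timedelta(minutes=15)]
BASE_DELAY = timedelta(minutes=30)
MAX_DELAY = timedelta(hours=8)  # assumed cap so delays do not grow forever


def retry_delay(failed_attempts: int) -> timedelta:
    """Delay before the next attempt, given how many attempts have failed."""
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be at least 1")
    if failed_attempts <= len(EARLY_DELAYS):
        return EARLY_DELAYS[failed_attempts - 1]
    # Past the early attempts: 30 min, then double each time up to the cap.
    doublings = min(failed_attempts - len(EARLY_DELAYS) - 1, 10)
    return min(BASE_DELAY * (2 ** doublings), MAX_DELAY)


class HostBackoff:
    """Remembers hosts whose connection failed so that other queued
    messages for the same host are not retried in the same time frame."""

    def __init__(self) -> None:
        self._hold_until = {}

    def record_failure(self, host: str, now: datetime, delay: timedelta) -> None:
        self._hold_until[host] = now + delay

    def may_connect(self, host: str, now: datetime) -> bool:
        hold = self._hold_until.get(host)
        return hold is None or now >= hold
```

A queue runner would call may_connect() before opening a connection and skip every message for a host that is still on hold, which keeps a large queue from hammering a host that is already failing.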
Finally, on the GL receiver side, our default is a 55 second block
and a 2 day grace period to send the retry.
I have a default block of over five minutes which means that your
retry won't get through on the second attempt. But then, not all
connections are greylisted.
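
The interaction between the receiver's block period and the sender's retry interval can be sketched as follows. This is a simplified model, not the implementation of any particular greylisting package: keying on an (IP, sender, recipient) triplet is an assumption, as are the class and method names. The defaults are the 55 second block and 2 day grace period quoted above.

```python
class Greylist:
    """Toy greylist: temp-fail the first attempt, accept a retry that
    arrives after the block period and before the grace period ends."""

    def __init__(self, block_seconds: float = 55, grace_seconds: float = 2 * 86400):
        self.block_seconds = block_seconds
        self.grace_seconds = grace_seconds
        self._first_seen = {}  # triplet -> time of first attempt (seconds)

    def accept(self, triplet: tuple, now: float) -> bool:
        """Return True to accept, False to answer with a 45x temp-fail."""
        first = self._first_seen.get(triplet)
        if first is None or now - first > self.grace_seconds:
            # Unknown triplet, or the sender retried too late: start over.
            self._first_seen[triplet] = now
            return False
        # Known triplet: accept only once the block period has elapsed.
        return now - first >= self.block_seconds
```

With a block of just over five minutes, for example Greylist(block_seconds=301), a sender that retries exactly 300 seconds after the first reject is rejected again and only gets through on a later attempt.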
I probably should have used 3 days, since our original defaults
(non-variable) were once per hour, 72 attempts or 3 days. And if you
follow the 2821 recommendation, it suggests 4-5 days. With 30 min
intervals that yields an awful lot of retries: 240-300.
It's only a lot of retries over four or five days if you retry too
frequently. Some view the period as too long. A two-day give-up
period, for example, may result in delivery issues being missed if it
falls over a weekend.
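
As a rough illustration of how much the retry count depends on the interval rather than on the give-up period, the arithmetic can be sketched in Python. The function names and the doubling schedule with a cap are assumptions for the example, not settings from either system.

```python
def fixed_interval_attempts(give_up_hours: int, interval_minutes: int) -> int:
    """Number of retries when every retry uses the same interval."""
    return give_up_hours * 60 // interval_minutes


def doubling_attempts(give_up_hours: int, first_minutes: int, cap_minutes: int) -> int:
    """Number of retries when the interval doubles each time, up to a cap."""
    limit = give_up_hours * 60
    elapsed, delay, attempts = 0, first_minutes, 0
    while elapsed + delay <= limit:
        elapsed += delay
        attempts += 1
        delay = min(delay * 2, cap_minutes)
    return attempts


# Once per hour over 3 days gives the 72 attempts mentioned above.
print(fixed_interval_attempts(72, 60))
# Same 4 day give-up period, fixed 30 min interval versus doubling to an 8 h cap.
print(fixed_interval_attempts(96, 30), doubling_attempts(96, 30, 480))
```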
Finally, for the 451 code itself, yeah, I didn't think it was ideal,
but I do think that given all our choices, the GL author made the
right decision. Assuming the author is an operator mostly, reading
RFC 2821, he sees three examples of 45x with literals:
I commented on that in my previous email.
But it should not matter from an SMTP technical standpoint because
the SMTP sender must use 45z for its retry considerations,
regardless of what z is.
I will say that I did consider using 451 as a trigger for the
altered shorter 2nd attempt interval. But our outbound mail code keys
on a 45x response and I didn't want to change that, since the reply
might not be 451 but 450, 452 or some other 45x value.
The 451 reply code is used for other conditions as well.
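
The point that a sender should base its retry decision on the class of the reply rather than on 451 specifically can be sketched like this. The function names are hypothetical; the only fact relied on is that a first digit of 4 marks a transient failure and a first digit of 5 a permanent one.

```python
def should_retry(reply_code: int) -> bool:
    """True for any transient negative reply (4yz), whatever y and z are."""
    return 400 <= reply_code <= 499


def is_45x(reply_code: int) -> bool:
    """True for 450, 451, 452 and any other 45z reply."""
    return 450 <= reply_code <= 459


# A greylisting receiver may answer 450, 451 or 452; the sender queues
# the message for retry in every case and gives up only on a 5yz reply.
for code in (450, 451, 452, 550):
    print(code, "retry" if should_retry(code) else "do not retry")
```

Because 451 is also used for conditions that have nothing to do with Greylisting, a sender that shortened its second attempt only on 451 would both miss greylisters that answer 450 and shorten the interval for unrelated local errors.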