Re: 2821bis consideration - New 2nd attempt Retry Strategy recommendation
2007-11-16 18:23:29
Hi SM,
Please note that I am not disagreeing with your points. I was skeptical,
but after a customer "wish list" request that simply would not die, I
explored GL (greylisting) and found it to be doable. With fine tuning to
minimize impact, it can work without disrupting operations.
Once GL was part of the picture, it became fairly obvious why operators
had been reporting strange, unexplained rejects and were confused to
see the mail eventually delivered anyway. Hence the variable table
was added, not just for the GL operators but for those who were hitting
GL systems.
As for turning on the GL feature for our own support system, that was an
even tougher decision since, as a small company, we can't afford any
missed sales or customer support emails. But it was turned on and
carefully followed. In fact, since we were unsure about false
positives, rather than reject at RCPT TO as I believe many do, we
implemented it as part of the DATA filter hook system with a dynamic
response at that point.
That allowed us to store a copy of the message for review to see how
effective it was and/or more importantly, to see if "good messages" were
lost due to the "good sender" not retrying again.
I can tell ya that the latter was a non-issue, and that was what sold me
on this GL concept. If even a small percentage of "good intention"
systems had shown broken SMTP retry logic, odds are very high I would
have nixed this project and explained to our customers why. This is not
to say there were no individual incidents where a "good intention"
message system did not retry. But that soon became a funny moral reason
for supporters to yell at those: "FIX YOUR SMTP SOFTWARE - YOU ARE
ACTING LIKE A SPAMMER." If I recall, this was mostly an issue with
systems running old PHP scripts with one-shot mail send or notification
logic, and they were failing not because of GL, but because they did
not properly handle multiple response lines. So in most cases it wasn't
GL itself but some other cause, yet they looked at GL as the reason.
Anyway, a web-based GL tool gave the operator an easy way to view stats
and check the message content of all current GL first-attempt rejects,
to help give them (and us) confidence in whether this obscure idea was
working or not. This helped sell it. It's funny, I should note
(remember, these are operators): early on, some suggested that we add a
click button to move the current message into the accepted inbound mail
queue for import. But I explained that this would only be a good idea
if we saw good systems not retrying. I think today they are convinced
of that. Just let it run and forget about it. Don't sit there looking
at the web GL stats and reject table listings and begin to doubt whether
a particular new mail that looks good will eventually come in again and
get delivered. Guaranteed! It will drive you nuts. :)
The 5 mins was carefully decided upon, mainly because I don't
particularly like the idea of going against 2821 recommendations. But
the market overruled that concern. In the end, our default variable
table is:
[Attempts]
Default=60
Attempt1=5
Attempt2=5
Attempt3=15
Attempt5=30
Attempt10=120
Attempt21=5
Attempt22=5
Attempt23=15
Attempt25=30
Attempt30=120
Attempt40=60
Attempt72=60
Note: attempt1 is really the 2nd attempt, since the rescheduling code is
based off the current count, "msgQueue->nTotalAttempts"
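As a rough illustration only (not the actual rescheduling code), a lookup against a table like the one above might be sketched as follows. The post does not say what happens for an attempt number that has no entry, so both plausible readings are shown: carrying forward the nearest lower entry, or falling back to Default. The names ATTEMPTS, DEFAULT_MINUTES, retry_interval and carry_forward are hypothetical.

```python
# Retry table from the post: attempt count -> minutes until the next try.
ATTEMPTS = {
    1: 5, 2: 5, 3: 15, 5: 30, 10: 120,
    21: 5, 22: 5, 23: 15, 25: 30, 30: 120,
    40: 60, 72: 60,
}
DEFAULT_MINUTES = 60  # the "Default=60" entry


def retry_interval(n_total_attempts, carry_forward=True):
    """Return minutes to wait, given the current attempt count.

    ASSUMPTION: the handling of unlisted attempt counts is a guess.
    carry_forward=True reuses the nearest lower listed entry;
    carry_forward=False falls straight back to the default.
    """
    if n_total_attempts in ATTEMPTS:
        return ATTEMPTS[n_total_attempts]
    if carry_forward:
        lower = [k for k in ATTEMPTS if k < n_total_attempts]
        if lower:
            return ATTEMPTS[max(lower)]
    return DEFAULT_MINUTES
```

Under the carry-forward reading, count 4 would wait 15 minutes; under the default reading it would wait 60.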
Finally, on the GL receiver side, our default is a 55 second block and a
2 day grace period to send the retry.
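To make the two receiver-side numbers concrete, here is a minimal sketch of a greylisting check, assuming the sender is tracked by a key such as (client IP, envelope sender, envelope recipient) and that first-seen times are kept in a plain dict. Both of those are assumptions for illustration, as are the names greylist_check, BLOCK_SECONDS and GRACE_SECONDS; the reply text is the one from the greylisting whitepaper quoted further down.

```python
BLOCK_SECONDS = 55                 # retries sooner than this are still rejected
GRACE_SECONDS = 2 * 24 * 60 * 60   # 2 days allowed for the retry to arrive

TEMPFAIL = "451 4.7.1 Please try again later"


def greylist_check(first_seen, key, now):
    """Return the SMTP reply for `key` at time `now` (in seconds).

    first_seen maps a key to the time of its first rejected attempt.
    ASSUMPTION: key layout and storage are hypothetical.
    """
    seen = first_seen.get(key)
    if seen is None or now - seen > GRACE_SECONDS:
        # Unknown sender, or the grace period lapsed: start over.
        first_seen[key] = now
        return TEMPFAIL
    if now - seen < BLOCK_SECONDS:
        # Retried too quickly: still inside the block window.
        return TEMPFAIL
    return "250 OK"
```

A sender that retries 5 minutes after its first reject falls between the 55 second block and the 2 day grace period, so it is accepted.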
I probably should have used 3 days, since our original (non-variable)
defaults were once per hour for 72 attempts, or 3 days. And if you
follow the 2821 recommendation, it suggests 4-5 days. With 30 min
intervals, that yields an awful lot of retries: 192-240.
But I can't recall off hand the reason two days was selected for the
default GL grace period. Maybe I was thinking that if spammers were
using the RFC as a guideline or the GL specs of 4 days, then all they
had to do is wait 3 days to retry.
As for the 451 code itself, yeah, I didn't think it was ideal, but
I do think that given all the choices, the GL author made the right
decision. Assuming the author is mostly an operator, reading RFC 2821,
he sees three examples of 45x with literals:
450 Requested mail action not taken: mailbox unavailable
(e.g., mailbox busy)
451 Requested action aborted: local error in processing
452 Requested action not taken: insufficient system storage
On the possibly erroneous presumption that the literals are set in stone
as the reject reason, 451 is arguably preferable to 450 and 452 among
the three.
But it should not matter from an SMTP technical standpoint because the
SMTP sender must use 45z for its retry considerations, regardless of
what z is.
I will say that I did consider using 451 as the trigger for the altered,
shorter 2nd attempt interval. But our outbound mail code checks for a
45x response, and I didn't want to change that because the reply might
not be 451 but 450, 452 or some other 45x value.
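A small sketch of that idea, keying the sender's decision on the reply class rather than on 451 specifically. The helper names reply_code, is_transient and is_45x are hypothetical, and this is not the actual outbound mail code. Reading only the first three characters of the first line also copes with multi-line replies such as "451-...".

```python
def reply_code(reply):
    """Extract the 3-digit code from the first line of an SMTP reply."""
    first_line = reply.splitlines()[0]
    return int(first_line[:3])


def is_transient(reply):
    """Any 4yz reply is a temporary failure, so the message stays queued."""
    return 400 <= reply_code(reply) <= 499


def is_45x(reply):
    """True for 450, 451, 452 or any other 45x value.

    ASSUMPTION: this is where a shorter 2nd attempt interval could be
    triggered, without depending on the exact third digit.
    """
    return 450 <= reply_code(reply) <= 459
```

With this, "450 4.7.1 Text" and "451 4.7.1 Please try again later" are treated identically by the sender.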
--
Sincerely
Hector Santos, CTO
http://www.santronics.com
http://santronics.blogspot.com
Hector Santos wrote:
SM wrote:
the Greylist specs shows a 4.7.1 extended code:
451 4.7.1 Please try again later
http://projects.puremagic.com/greylisting/whitepaper.html
I believe that the reply code mentioned in that whitepaper is
incorrect. The extended code is correct. I recommend using "450
4.7.1 Text" when the temporary failure is due to a policy decision.
Incorrect in what way? Inappropriate perhaps as an "operator/policy"
statement? Functionally or technically? Compatibility? If it's a
compatibility problem, then it needs to be reconsidered.
As a general rule, I would use 30 minutes as receivers reading RFC
2821 will expect that.
Sure, but all receivers need to be ready for anything, including the
possibility of "more sophisticated and variable strategies" as it was
insightfully stated in 2821. :-)
So I don't think it would be a technical problem.