Re: 2821bis consideration - New 2nd attempt Retry Strategy recommendation


Hi SM,

Please note that I am not disagreeing with your points. I was skeptical at first, but after a customer "wish list" request that would not die, I explored GL (greylisting) and found it to be "do-able." With fine tuning to minimize impact, it can work without disrupting operations.

Once GL was part of the picture, it became fairly obvious why operators had previously been reporting these strange rejects with no explanation, followed by confused observations of the eventual delivery. Hence the variable table was added, not just for the GL operators but for those who were hitting GL systems.

As for turning on the GL feature for our own support system, that was an even tougher decision since, as a small company, we can't afford any missed sales or customer support emails. But it was turned on and carefully followed. In fact, since we were unsure about false positives, rather than reject at RCPT TO as I believe many do, we implemented it as part of the DATA filter hook system with a dynamic response at that point.

That allowed us to store a copy of the message for review to see how effective it was and, more importantly, to see if "good messages" were lost due to the "good sender" not retrying.

I can tell ya that the latter was a non-issue, and that is what sold me on this GL concept. If even a small percentage of "good intention" systems had shown broken SMTP retry logic, odds are very high I would have nixed this project and explained why to our customers. This is not to say there were no individual incidents where a "good intention" message system did not retry. But that soon became a funny moral reason for supporters to yell at those: "FIX YOUR SMTP SOFTWARE - YOU ARE ACTING LIKE A SPAMMER." If I recall, this was mostly an issue with old PHP scripts using one-shot mail send or notification logic, and they were failing not because of GL but because they did not properly handle multiple response lines. So in most cases it wasn't GL itself but some other reason, yet they looked at GL as the reason.
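
To illustrate the multiple response lines point: in SMTP, a continuation line has a hyphen right after the code ("451-text") and only the final line has a space ("451 text"). A script that reads just the first line can get this wrong. Below is a minimal sketch of reading a whole reply; the function name and the sample reply text are mine, made up for illustration, and are not taken from any of the systems mentioned.

```python
def read_reply(lines):
    """Collect one SMTP reply from an iterable of response lines.

    Returns (code, text_lines). Continuation lines look like "451-text";
    the final line looks like "451 text" (or just "451").
    """
    code = None
    texts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if len(line) < 3 or not line[:3].isdigit():
            raise ValueError(f"malformed reply line: {line!r}")
        if code is None:
            code = int(line[:3])
        elif int(line[:3]) != code:
            raise ValueError("reply code changed mid-reply")
        texts.append(line[4:])
        # A hyphen after the code means more lines follow;
        # anything else marks the last line of the reply.
        if line[3:4] != "-":
            return code, texts
    raise ValueError("reply ended before the final line")


# Hypothetical two-line temporary reject
code, texts = read_reply(["451-Greylisted\r\n", "451 Please try again later\r\n"])
```

The caller gets one code and all the text lines, so the retry decision is made once per reply and not once per line.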

Anyway, the web-based GL tool gave the operator an easy way to view stats and check the message content of all current GL first rejects, to help give them (and us) confidence on whether this obscure idea was working or not. This helped sell it. It's funny, I should note (remember, these are operators): early on some suggested that we add a click button to move the current message into the accepted inbound mail queue for import. But I explained that this would only be a good idea if we saw good systems not retrying. I think today they are convinced of that. Just let it run and forget about it. Don't sit there looking at the web GL stats and reject table listings and begin to doubt whether a particular new mail that looks good will eventually come in again and get delivered. Guaranteed! It will drive you nuts. :)

The 5 mins was carefully decided upon, mainly because I don't particularly like the idea of going against 2821 recommendations. But the market overruled that concern. In the end, our default variable table (intervals in minutes) is:


[Attempts]
Default=60
Attempt1=5
Attempt2=5
Attempt3=15
Attempt5=30
Attempt10=120
Attempt21=5
Attempt22=5
Attempt23=15
Attempt25=30
Attempt30=120
Attempt40=60
Attempt72=60

Note: Attempt1 is really the 2nd attempt, since the rescheduling code is based off the current count, "msgQueue->nTotalAttempts".
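
For readers who want to see how such a table could drive rescheduling, here is a rough sketch in Python. The values are copied from the table; the lookup rule is an assumption on my part (each entry applies from its attempt number until the next listed entry, with Default used below the first entry), and the function name is hypothetical. It is not the actual rescheduling code.

```python
# Values copied from the [Attempts] table above; intervals are in minutes.
DEFAULT_MINUTES = 60
ATTEMPTS = {1: 5, 2: 5, 3: 15, 5: 30, 10: 120,
            21: 5, 22: 5, 23: 15, 25: 30, 30: 120, 40: 60, 72: 60}


def next_interval(total_attempts):
    """Minutes to wait before the next try, given attempts made so far.

    ASSUMPTION: an entry applies from its attempt number up to the next
    listed entry (a step function), and Default covers counts below the
    first entry. The real lookup rule is not stated in this post.
    """
    keys = [k for k in ATTEMPTS if k <= total_attempts]
    return ATTEMPTS[max(keys)] if keys else DEFAULT_MINUTES


# Waits after the 1st through 5th attempts under this reading
first_waits = [next_interval(n) for n in range(1, 6)]
```

Under this reading the waits after the first five attempts are 5, 5, 15, 15 and 30 minutes.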

Finally, on the GL receiver side, our default is a 55 second block and a 2 day grace period to send the retry.
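
A minimal sketch of how those two receiver-side defaults could interact, assuming a first-seen timestamp is kept per sender key. The in-memory store, the key format, and the function name are all hypothetical; the Harris whitepaper keys on the (IP, envelope sender, envelope recipient) triple, and I use the same idea here only for illustration.

```python
from datetime import datetime, timedelta

BLOCK = timedelta(seconds=55)  # retries sooner than this are still deferred
GRACE = timedelta(days=2)      # a retry must arrive within this window

first_seen = {}  # hypothetical in-memory store: key -> time of first attempt


def check(key, now):
    """Return an SMTP reply code: 451 to defer, 250 to accept."""
    seen = first_seen.get(key)
    if seen is None or now - seen > GRACE:
        first_seen[key] = now  # new or expired entry: start the clock
        return 451
    if now - seen < BLOCK:
        return 451             # retried too soon, still blocked
    return 250                 # retried inside the window: accept


# Hypothetical sender key and timestamps
key = ("192.0.2.1", "sender@example.com", "rcpt@example.net")
t0 = datetime(2008, 1, 1, 12, 0, 0)
first = check(key, t0)                          # first attempt
retry = check(key, t0 + timedelta(minutes=5))   # retry after 5 minutes
```

With a 5 minute second attempt on the sending side, the retry lands well after the 55 second block and well inside the 2 day window.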

I probably should have used 3 days, since our original (non-variable) defaults were once per hour for 72 attempts, or 3 days. And if you follow the 2821 recommendation, it suggests 4-5 days. With 30 min intervals (48 tries per day), that yields an awful lot of retries: 192-240.

But I can't recall offhand the reason two days was selected for the default GL grace period. Maybe I was thinking that if spammers were using the RFC as a guideline, or the GL specs of 4 days, then all they had to do was wait 3 days to retry.

Finally, for the 451 code itself, yeah, I didn't think it was ideal, but I do think that given all our choices, the GL author made the right decision. Assuming the author is mostly an operator reading RFC 2821, he sees three examples of 45x with literals:


      450 Requested mail action not taken: mailbox unavailable
         (e.g., mailbox busy)
      451 Requested action aborted: local error in processing
      452 Requested action not taken: insufficient system storage

With the possibly erroneous presumption that the literals are set in stone for the reject reason, then among the three, 451 is arguably preferred over 450 and 452.

But it should not matter from an SMTP technical standpoint, because the SMTP sender must use 45z for its retry considerations, regardless of what z is.

I will say that I did consider using 451 as a trigger for the altered, shorter 2nd attempt interval. But our outbound mail code looks for a 45x response, and I didn't want to change it, since the reply might not be 451 but 450, 452 or some other 45x value.
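
The difference between the two triggers can be sketched in a few lines. The function names are hypothetical and this is not our outbound code, just an illustration of keying on the whole 45x range versus one exact code.

```python
def is_45x(code):
    """True for any 45x reply, regardless of the last digit."""
    return 450 <= code <= 459


def is_exactly_451(code):
    """The narrower trigger that was considered and not used."""
    return code == 451


# A greylisting server answering 450 would be missed by the narrow check
codes = [450, 451, 452]
broad = [is_45x(c) for c in codes]
narrow = [is_exactly_451(c) for c in codes]
```

The broad check treats 450, 451 and 452 the same way, which matches the point that the sender should not care what the last digit is.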



--
Sincerely

Hector Santos, CTO
http://www.santronics.com
http://santronics.blogspot.com


Hector Santos wrote:

SM wrote:
the Greylist specs show a 4.7.1 extended code:

      451 4.7.1 Please try again later
      http://projects.puremagic.com/greylisting/whitepaper.html
I believe that the reply code mentioned in that whitepaper is incorrect. The extended code is correct. I recommend using "450 4.7.1 Text" when the temporary failure is due to a policy decision.
Incorrect in what way? Inappropriate perhaps from an "operator/policy" statement? Functionally or technically? Compatibility? If it's a compatibility problem, then it needs to be reconsidered.
As a general rule, I would use 30 minutes as receivers reading RFC 2821 will expect that.
Sure, but all receivers need to be ready for anything, including the possibility of "more sophisticated and variable strategies" as it was insightfully stated in 2821. :-)
So I don't think it would be a technical problem.
