Re: 2821bis/ter and procedures (was: Re: retry question)

On Fri, 8 Aug 2008, ned+ietf-smtp(_at_)mrochek(_dot_)com wrote:


I should also point out that if there are no valid recipients the
client probably should not have bothered to send the message.

The server should probably have rejected the DATA command.


Absolutely, if a rejection was possible at that point. OTOH, if the
554 was a result of content analysis, then that would not be possible.

My take FWIW, is that none should be retried because the data is
rejected (probably based on content analysis e.g. spam or
virus). If the data response were a 450, a+b+c should be retried.


Yep, that's my (revised) assessment as well.

John's example is actually a perfect illustration of the context for my
original question. The server in question was using 450 replies to RCPT in
order to get separate deliveries of the same message for recipients with
different content filter settings.


OK, I have to admit, that's not a case I've heard of before, and it makes this
quite a bit trickier than I first thought: A server deliberately using
temporary failures in order to force the client to split the message up.
(Recipient limits in general can have this effect, but this is not normally the
intent of such settings.) As long as the number of possible content filter
settings is small (2) this actually works acceptably well, but the fact that N
separate policies among the recipients of a given message will require N-1
retries makes this impractical in situations with lots of possible filter
settings.
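
To make the N-1 figure concrete, here is a toy model in Python. It is only a
sketch under assumptions of my own: the server accepts the first recipient
offered plus every other recipient sharing that recipient's filter policy,
answers 450 to the rest, and the client retries all deferred recipients
together in one later transaction. The function name and the policy labels
are made up for illustration.

```python
def transactions_needed(policies):
    """Count the SMTP transactions needed to offer the message to everyone.

    policies maps each recipient to the name of its content filter policy.
    Assumed server behaviour: accept the first recipient and all others with
    the same policy, reply 450 to everybody else.
    """
    pending = dict(policies)
    count = 0
    while pending:
        count += 1
        # The policy of the first pending recipient wins this transaction.
        accepted_policy = next(iter(pending.values()))
        # Everyone with a different policy got a 450 and is tried again later.
        pending = {r: p for r, p in pending.items() if p != accepted_policy}
    return count


def retries_needed(policies):
    """Retries are all the transactions after the first one."""
    return max(transactions_needed(policies) - 1, 0)
```

With two policies (<b> aggressive, <c> lenient) this gives one retry; with N
distinct policies among the recipients it gives N-1, however many recipients
share each policy.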

In any case, I don't think there's anything that prohibits this, and given how
quite a few popular filters are incapable of rendering multiple per-recipient
verdicts simultaneously, this is bound to be something that people are going to
do even if it does cause lots of retries. And that means it really does need to
be accommodated. And that in turn means that <c> really should be retried.

Thinking about it even further, I believe a similar case is also going to come
up in some situations where the new Sieve ereject construct is used and there
are other, over-quota, recipients getting temporary errors. (So much for
backscatter elimination making life easier.)

These two use cases seem likely enough to me to occur in practice that I
tested our client just now to make sure it does the right thing. And lo and
behold it retries <c> in this case. I no longer recall why it was done this
way, but I'm now glad it was.

That is,

  MAIL FROM:<a>
  250 ok
  RCPT TO:<b>
  250 ok

The server notes that user B has aggressive content filtering settings.

  RCPT TO:<c>
  450 try later

The server sees that user C has lenient content filtering settings and
asks the client to try delivering the message to C later.


Because it wants an additional copy for separate evaluation. Yep, got it.

  RCPT TO:<d>
  550 no such user
  DATA
  354 go ahead
  blah blah
  .
  554 ugh

The message fails to pass B's filter settings, but it would have passed
C's settings. The server expects the client to interpret the post-data
rejection as a failure for recipient A only.


I believe you meant to say recipient <b>. <a> is the sender.
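
Spelled out, the client-side rule I'd argue for looks roughly like the Python
below. This is an illustrative sketch, not our actual client code; the names
Action and classify are invented here. The assumption is that the final reply
to DATA applies only to recipients whose RCPT TO got a 2yz reply, while
recipients refused at RCPT time are handled according to their own reply code.

```python
from enum import Enum


class Action(Enum):
    DELIVERED = "delivered"
    RETRY = "retry"
    BOUNCE = "bounce"


def classify(rcpt_replies, data_reply):
    """Decide what to do with each recipient after one transaction.

    rcpt_replies maps recipient -> reply code received for its RCPT TO.
    data_reply is the reply code received after the end-of-data dot.
    """
    result = {}
    for rcpt, code in rcpt_replies.items():
        if 200 <= code < 300:
            # Accepted at RCPT time: the final DATA reply decides.
            if 200 <= data_reply < 300:
                result[rcpt] = Action.DELIVERED
            elif 400 <= data_reply < 500:
                result[rcpt] = Action.RETRY
            else:
                result[rcpt] = Action.BOUNCE
        elif 400 <= code < 500:
            # Deferred at RCPT time: never part of this transaction,
            # so the DATA reply says nothing about it.
            result[rcpt] = Action.RETRY
        else:
            # Permanently refused at RCPT time.
            result[rcpt] = Action.BOUNCE
    return result
```

For the transcript above, classify({"b": 250, "c": 450, "d": 550}, 554) marks
<b> and <d> as bounces and leaves <c> queued for retry.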

I don't recommend this kind of setup because I get the heebie-jeebies when
I contemplate the possible interoperability problems. Richard's and Ned's
comments above confirm my fears.


Well, in this case at least my client code got it right even though I got it
wrong.

There are other places in which retry logic is underspecified, e.g. should
a client retry other MXs immediately or after a cooling-off period?


I agree it's underspecified, but at least in this case you're not talking about
legitimate server behavior causing unnecessary bounces. MX retry logic is
endlessly tunable and quite a lot of it ends up making one thing better
at the expense of another.

Does the answer depend on whether there's one MX record with multiple A
records, or multiple MX records at the same priority, or multiple MX
records with differing priorities? Implementers have come up with
different answers.


Indeed. And unfortunately with the ever-increasing prevalence of various
greylisting and limiting schemes, we're at a point where there probably is no
single "best" answer.

So I think that it would be useful to pin this down more precisely, both
for full clients and queueless clients. (Should message submission clients
handle errors like queueless clients even when they have queues?) However
I'm not very confident that we could get any useful consensus.


In the case of error returns in the protocol, I would hope that some amount of
consensus is possible. In the case of MX retry strategies, I share your doubts.

                                Ned