Re: New issue (was Re: rfc2821bis-01 Issue 18: Usability of 1yz replies)

--On Monday, 23 April, 2007 09:03 -0700
ned+ietf-smtp(_at_)mrochek(_dot_)com wrote:

John C Klensin wrote:

The strawman proposed text would be a provision that a
client SHOULD (or perhaps even MUST) send a RSET or QUIT in
response to a code whose first digit is undefined by either
2821bis or a negotiated extension.

I would agree with that.  Something like:

If an SMTP client receives a reply code that is not defined
by this RFC or a negotiated extension, it SHOULD send a RSET
or QUIT command, and SHOULD treat the unknown reply code as a
5xx code.


I'm with you up to the 5yz code. I think 4yz would be more
appropriate - that way you don't bounce messages unnecessarily
when dealing with a broken server.
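
To make the two options concrete, here is a rough Python sketch, not proposed text. The names `classify_reply` and `next_action` are hypothetical. It assumes first digits 2 through 5 are the defined classes and ignores codes added by negotiated extensions. Sending the RSET or QUIT is not shown.

```python
# Hypothetical sketch: map an SMTP reply code to a client-side action.
# Assumption: first digits 2-5 are the defined classes; anything else
# (including 1yz and non-numeric garbage) counts as "unknown".
DEFINED_FIRST_DIGITS = {
    "2": "success",
    "3": "intermediate",
    "4": "transient",
    "5": "permanent",
}


def classify_reply(code: str, unknown_as: str = "transient") -> str:
    """Return the class of a three-digit reply code.

    Codes that are not three ASCII digits, or whose first digit is not
    defined, fall back to `unknown_as`: "transient" gives the 4yz
    treatment, "permanent" gives the 5yz treatment.
    """
    well_formed = len(code) == 3 and code.isascii() and code.isdigit()
    if well_formed and code[0] in DEFINED_FIRST_DIGITS:
        return DEFINED_FIRST_DIGITS[code[0]]
    return unknown_as


def next_action(code: str, unknown_as: str = "transient") -> str:
    """Decide what happens to the message after this reply."""
    kind = classify_reply(code, unknown_as)
    if kind == "transient":
        return "requeue"   # keep the message and retry later
    if kind == "permanent":
        return "bounce"    # give up and generate a failure DSN
    return "continue"      # success or intermediate: carry on
```

With the default, `next_action("150")` requeues the message; with `unknown_as="permanent"` the same reply produces a bounce. That one parameter is the whole disagreement.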

Hmm.  Do you want to requeue it and keep trying for four or five
days on the theory that the server might spontaneously fix
itself?


Believe it or not, servers do spontaneously "fix" themselves on a regular
basis.  One reason for this is that bugs tend to lurk in infrequently used
code. Error handling code tends to be infrequently used, so when some temporary
failure occurs (e.g., the directory server is rebooted and is unavailable for
some period of time) the SMTP server finds itself using some buggy code that
spits out incomprehensible garbage instead of a proper message. Then the
temporary problem clears and the server no longer engages the buggy code,
"fixing" the problem (until the next time, at least).

Of course there are also permanent server failures. But these actually tend to
be rarer since a server that pukes out incomprehensible garbage on a regular
basis tends to be found and fixed because mail isn't getting through.

Or would it be better to bounce the puppy and hope that
someone will complain? I think I can argue that one either
way.


You certainly can argue that in such cases it would be better to bounce and in
the process report the problem, but the issue with that is you're reporting the
problem to the end user, who is far more likely to  "blame the messenger" for
the problem than they are to ferret out the actual culprit. And it is hardly
surprising that they do this: How many users understand the structure of email
to the point where they can parse a DSN and determine where the problem was?
(The absolute crap that often appears in the "human readable" part of some DSNs
doesn't help matters any...)

So now you have an end user who likely assumes that the problem is with their
mail service. Most of them won't bother to report the problem, which means that
all implementing a bounce policy has done is tarnish your own reputation with
this user and get nothing fixed in the process. And then there's the occasional
user who will actually report the problem - to their service provider, who
these days is unlikely to have either the desire or the time to get in touch
with the admin of the broken machine and work out the problem. So that doesn't
help much either.

Compare this with what happens when the message is retained and retried. For
one thing, if there's another MX, trying it immediately may get the message
delivered. Alternatively, if the problem goes away fairly soon, the message will
make it through, maybe delayed a bit but not so much that a complaint is
likely.
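
As a rough illustration of that retain-and-retry behavior, here is a Python sketch. The function `deliver` and the `attempt` callable are hypothetical stand-ins, not any real MTA's interface. It assumes each attempt reports "delivered", "transient", or "permanent".

```python
from typing import Callable, Iterable


def deliver(mx_hosts: Iterable[str], attempt: Callable[[str], str]) -> str:
    """Try each MX in preference order and report the outcome.

    `attempt(host)` is a hypothetical callable that makes one delivery
    attempt and returns "delivered", "transient", or "permanent".
    """
    for host in mx_hosts:
        result = attempt(host)
        if result == "delivered":
            return "delivered"
        if result == "permanent":
            return "bounce"
        # Transient failure: move straight on to the next MX.
    # Every MX failed transiently (or there were none): keep the
    # message in the queue and try again later.
    return "requeue"
```

If the first MX fails transiently and the second accepts the message, the result is "delivered" with no extra delay; if every MX fails transiently, the message stays queued.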

If the outage lasts for any length of time the user will typically start
getting delay DSNs. But the psychology of a delay DSN is different: It
basically says that we're working to deliver your mail but there's a problem.
The indication of effort, as opposed to "we gave up", especially when it
happens fairly quickly, can sometimes be enough to change a negative user
experience into a positive one.

Even more important, decent service providers monitor their queues fairly
carefully. They do NOT monitor failed DSNs in a similar fashion. And while they
aren't going to care about two messages stuck trying to be sent to the ISP in
Greater Tuna (down no doubt because Dixie Deebury cut off their power ;-), if
otherbigisp.com, which receives significant traffic, is down, they'll look into
it and figure out why thousands of messages to otherbigisp.com are stuck in
their queues.

                                Ned