Re: Internet draft draft-onions-822-mailproblems-00.txt

The draft claims to be informational, but purports to make
requirements of implementations.

Many of the issues in the draft (like a restatement of the
prohibitions against just-send-8) are covered in the upcoming 822 and
SMTP Applicability statements.  The author should look at
and comment on draft-ietf-mailext-smtpas-00.txt.


I agree 100%.

On some more specific points:

5.6. Rejection of SMTP connections due to DNS failure.

   There are a number of SMTP implementations that either do, or can  be
   configured,  to  reject  SMTP  connections if the calling host is not
   registered in the DNS. This is seen by some  as  a  breaking  of  the
   spirit  of  RFC-1123, and by others as a useful get-out-of-jail card.

I'd say this most certainly breaks the spirit, if not the wording of
1123 section 5.2.5.  The fact that such servers decide to refuse to
accept a message before the client can send a HELO command is a mere
technicality.

In any case, there is no reason for SMTP clients to pander
specifically to such sites.  The servers are the ones who decided they
don't want to accept mail.


Agreed.

   Implementors are therefore
   encouraged to use back up MX routing in the case of a connection that
   succeeds but no data is received before the connection is dropped.

This is a good idea outside of the context of servers that try to do
reverse-name lookups.  One should roll over to the next MX if a
connection fails, is dropped unexpectedly, or one gets a 4XX reply
code on MAIL, DATA, or the final '.'.


I'm not sure I agree with this. Care must be taken to ensure that messages
don't loop endlessly or bounce. I've seen too many cases of broken hosts
performing front-end MX services that don't correctly implement the destination
pruning according to RFC974. The usual symptom is that things work just fine
until the final destination host is down, at which point the MX either:

(1) Sends the mail to itself over and over again until the hop count is
    exceeded and the message is bounced.
(2) Sends the mail to itself through an intermediate that sufficiently
    damages the message to cause exponential growth in message size,
    resource exhaustion, or both.
(3) Simply bounces the mail.

Such problems can remain dormant for years before some type of outage
brings them to light. This proposal has the effect of increasing the
number of cases where fallback routing will be used, which will inevitably
result in an increase in this sort of behavior.

In the past I might have argued that these configurations are problems that
need to be fixed and the sooner they are found the better, but I'm sick and
tired of fixing these things right now. At the very least any suggestion that
failures within the dialogue be handled by contacting a fallback MX site and
transferring mail needs to be coupled with a clear pointer to the relevant
sections of RFC974 that describe the restrictions on such fallbacks.
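
To make the pruning restriction concrete, here is a minimal sketch in Python
of the RFC974 rule: a relay that finds itself in the MX list discards its own
record and every record with an equal or greater preference value. The
function name, the tuple layout, and the example host names are made up for
illustration and are not taken from any particular mailer.

```python
def prune_mx(records, local_host):
    """Return the MX targets this host may forward to, most preferred first.

    records:    list of (preference, hostname) tuples from the DNS lookup.
    local_host: this relay's own canonical name (assumed to be known already).
    """
    local = local_host.lower().rstrip(".")
    normalized = [(pref, host.lower().rstrip(".")) for pref, host in records]

    # Preference values under which the local host itself is listed, if any.
    own_prefs = [pref for pref, host in normalized if host == local]
    if own_prefs:
        # We are in the list: drop ourselves and everything that is
        # equally or less preferred (numerically equal or greater).
        cutoff = min(own_prefs)
        normalized = [(p, h) for p, h in normalized if p < cutoff]

    return [host for pref, host in sorted(normalized)]


records = [(10, "final.example."), (20, "relay.example"), (30, "backup.example")]
print(prune_mx(records, "relay.example"))
print(prune_mx(records, "final.example"))
```

An empty result means there is nowhere left to forward the message. The
caller has to treat that as a condition of its own rather than falling back
to sending the mail to itself, which is exactly failure (1) above.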

There is also the added complexity involved in caching errors. For example,
many clients cache various sorts of failures (hopefully on a temporary basis
only) so that they can avoid the expense of waiting a long time for something
that is almost sure to time out. Coupling this with fallbacks to secondary MXes
can have interesting effects: An error that results from an attempt to send a
very large message the server doesn't have space for right now might get cached
and lead to suboptimal handling of many subsequent short messages the server
could have handled without any problems. This issue must be discussed as
well in any text that recommends such strategies.
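
The interaction can be illustrated with a deliberately naive per-host failure
cache, sketched below in Python. Everything here (the class, the ten-minute
lifetime, the host names) is a hypothetical illustration of the problem, not
a description of how any real client caches failures.

```python
import time


class NaiveFailureCache:
    """Per-host failure cache with a fixed lifetime (hypothetical)."""

    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._failed_at = {}

    def record_failure(self, host):
        # Note: no record is kept of *why* the host failed.
        self._failed_at[host] = self.clock()

    def should_skip(self, host):
        failed = self._failed_at.get(host)
        return failed is not None and self.clock() - failed < self.ttl


def choose_host(mx_hosts, cache):
    """Return the first MX host not currently marked as failing, else None."""
    for host in mx_hosts:
        if not cache.should_skip(host):
            return host
    return None


cache = NaiveFailureCache()
hosts = ["primary.example", "backup.example"]

# A very large message draws a 452 from the primary, and that gets cached.
cache.record_failure("primary.example")

# A short message that the primary could have accepted is now diverted.
print(choose_host(hosts, cache))
```

Because the cache remembers only that the host failed, a failure tied to one
message's size ends up steering unrelated small messages to the secondary
until the entry expires.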

One failure mode I have seen is for one of several MX servers to run
short on disk space and return a 452 reply after the final '.'.  Some
SMTP clients, such as older versions of sendmail, don't know to roll
over to the next MX when this happens.


See above. If we're going to run around fixing these old versions to handle
this right we need to take care of their problems handling proper pruning of MX
lists. We also need to think through the implications this has on caching. I
view failure caching as far more important to good SMTP performance than timely
use of fallback servers in the event of a failure mid-dialogue.

   A resilient server MAY detect that the EHLO caused the connection  to

That would be a resilient *client*.

   Alternatively it can be treated as a bad connection and subsequent MX
   records  tried  if  available.

Usually, there won't be a subsequent MX host that does handle EHLO.
In this case, the message will sit in the queue for five days, then
bounce.


First of all, this conclusion is actually incorrect. This is a well known
problem with older versions of Microsoft Mail's SMTP server. There is only one
other server I know of that does it, but given the incredible popularity of
Microsoft Mail it's almost always the one at fault in these cases. And due to
its inability to accept more than one connection at a time, not to mention its
penchant for going offline periodically and not accepting any connections,
almost all of these servers end up getting front-ended with a more capable
system via an MX record. (But what if you are the MX front end for 
a Microsoft Mail server?)

It is also even worse than you suppose. When our ESMTP support first came out
we ran into the problem with Microsoft Mail almost immediately -- it closes the
connection when it receives EHLO. OK, we said, we'll try reconnecting and
sending just a HELO in this case. And this does indeed fix the problem some of
the time.

But there was another, less obvious and more dangerous problem. It appears that
in addition to closing the connection illegally, the issuance of the 5xx
response and the close are not properly synchronized. Sometimes you get
the 5xx status, sometimes you don't. And sometimes it gets stuck in some
buffer somewhere, so that the next new connection gets something like this:

   500 Huh?
   250 All set, fire away

Often as not this is somebody else's SMTP client that has snuck in. (Remember
that it's a single thread, so heavy loads are pretty common.) And of course they
bounce whatever message they were trying to send. If you try reconnecting
to the server right away you end up getting burned even more badly than
before.

The conclusion is inescapable: The only way to fix this problem is to fix the
broken software. Attempts to be clever can and do backfire.

6.2. Received Lines

   The syntax of the Received: lines in RFC-822 messages  is  reasonably
   straight  forward.  It requires as a minimum a date stamp following a
   semi-colon.  Unfortunately  some  implementations  cannot   seem   to
   generate  this.  This  can  cause  problems  when gatewaying to other
   systems that also have trace fields. This is seen as a  good  way  to
   cause general confusion when tracking messages.

   When gatewaying or examining these  elements,  the  invalid  elements
   should  either be discarded or else the current time inserted to make
   them legal. The illegal Received: lines can be changed  to  be  Orig-
   Received: to ensure the relayed message is now legal.

Relay agents should not muck with the existing contents of the
message; this breaks the message/envelope separation.  They should
most certainly not be mucking with existing Received: headers.


Agreed. I think Received: lines should be left alone if at all possible. Who cares
if the syntax is legal or not? The ones with illegal syntax often contain very
useful information!

Gatewaying is distinct from relaying.  What should happen with
Received: headers when gatewaying depends on what kind of system the
message is being gatewayed to/from.


Frankly, I don't think gateways have any business messing with them either.
There's a whole series of very elaborate rules for mapping Received: headers
in RFC1327. I refuse to implement them because:

(1) They violate GOSIP requirements. (Actually I don't think they do, but they
    definitely do violate some vendors' interpretations of GOSIP.)

(2) Comments are lost. Comments are often the most valuable information in
    a Received: header.

7.1. Badly formatted Content-Type: fields

   Implementations have been known to produce lines of the form

      MIME-version: 1.0
      Content-Type: text

   That is, a MIME type, without the mandatory subtype. This is  illegal
   as   a   MIME  header  and  means  the  content  may  be  subject  to
   misinterpretation.

This is a syntactically illegal MIME Content-Type header.  My MIME
readers ignore syntactically illegal Content-Type headers, causing them
to treat the part as the default "text/plain; charset=US-ASCII".  I've
seen others apply "default subtypes".


Default subtypes are almost certainly an artifact of tracking the MIME
specification, which used to allow default subtypes.

It is a legal RFC 1049 Content-Type header field.


Right. My implementation actually supports RFC 1049 format fields.
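
The "ignore it and fall back to the default" behavior described above can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions: the function name is invented, only the type/subtype pair is
checked (parameters such as charset are not parsed, so the US-ASCII default
is not modelled), and RFC 1049 style fields are not handled at all.

```python
import re

# A MIME token: printable ASCII except space and the tspecials
# ( ) < > @ , ; : \ " / [ ] ? =
_TOKEN = r"[!#$%&'*+\-.0-9A-Z^_`a-z{|}~]+"
_TYPE_RE = re.compile(r"^\s*(" + _TOKEN + r")/(" + _TOKEN + r")\s*(?:;|$)")

DEFAULT_TYPE = ("text", "plain")


def effective_type(header_value):
    """Return (type, subtype), or the default if the field is not legal MIME."""
    match = _TYPE_RE.match(header_value or "")
    if match is None:
        # No subtype, stray characters, or an empty field: ignore the header.
        return DEFAULT_TYPE
    return (match.group(1).lower(), match.group(2).lower())


print(effective_type("text"))
print(effective_type("Image/GIF"))
```

With this approach "Content-Type: text" is never given a guessed subtype; it
is simply treated as if the header were absent.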

7.2. Multiple Content-Type fields

   Messages  may  contain  multiple   Content-Type   fields,

Messages are not permitted to contain multiple Content-Type fields; it
is not the case that composers MAY generate multiple Content-Type
fields.


Right.

It is possible for multiple fields of any type to appear in an invalid
message, be that Content-Type, Content-Transfer-Encoding, From,
Sender, Subject, whatever.  Content-Type is not special in this
regard.


Right. Although care must be taken not to ban multiple content- headers
completely. The FTBP proposal uses them...

7.3. Badly structured multipart messages

   Message that contain fields such as

      Content-Type: multipart/mixed

   have some great potential for causing indigestion  in  mail  systems.

They're certainly illegal.  I have no idea what advice the text is
trying to give on the subject.


Well, the typical problem is that if the multipart body doesn't contain at
least one occurrence of the boundary the entire contents are seen as preamble
and completely lost. As such, some people think it might be nice to "recommend"
that this not happen. However, nice as this may sound I think that the only
real solution is to stop producing illegal multipart objects to begin with.
Attempts to cater to such behavior only provide grounds for its continuance.
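
For anyone who hasn't seen the failure, here is a small Python sketch of why
the contents vanish. It is a simplified line-based splitter written for this
illustration (the function name and sample text are invented, and it does
not attempt to cover every detail of the MIME grammar): everything before
the first boundary line is preamble, so if no boundary line ever appears,
everything is preamble and there are no parts.

```python
def split_multipart(body, boundary):
    """Split a multipart body into (preamble, parts) on the given boundary."""
    delimiter = "--" + boundary
    closing = delimiter + "--"
    preamble, parts, current = [], [], None

    for line in body.splitlines():
        stripped = line.rstrip()
        if stripped == closing:
            # Closing delimiter: finish the open part and ignore the epilogue.
            if current is not None:
                parts.append("\n".join(current))
            current = None
            break
        if stripped == delimiter:
            # A new part starts; save the previous one, if there was one.
            if current is not None:
                parts.append("\n".join(current))
            current = []
        elif current is None:
            preamble.append(line)
        else:
            current.append(line)
    else:
        # No closing delimiter was seen: keep whatever part was still open.
        if current is not None:
            parts.append("\n".join(current))

    return "\n".join(preamble), parts


broken = "This is the text the sender meant you to read.\nSecond line."
preamble, parts = split_multipart(broken, "xyz")
print(len(parts), repr(preamble))
```

A reader that displays only the parts shows nothing at all for the broken
message, even though the text is sitting right there in the preamble.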

                                Ned