ietf-asrg

Re: [Asrg] 0. General - anti-harvesting (was Inquiry about CallerID Verification)

2003-11-30 22:09:20

----- Original Message ----- 
From: "Yakov Shafranovich" <research(_at_)solidmatrix(_dot_)com>
To: "ASRG" <asrg(_at_)ietf(_dot_)org>

Whether the RFCs require a valid return address, or not, in spirit or in
letter of the law, is something Dave Crocker and Eric Raymond, and
others, who worked on the original 821 and 2821 RFCs can tell us. But
the fact today is that no one is expected to provide a valid address,
and any system relying on this, will fail in some cases unless the
existing RFCs are changed.

Yakov,

The scattered RFC concepts related to the Return Path (RP) are very clear
about this.  In practice, all commercial vendors and legitimate mail sites
expect it to be valid, because a high percentage of mail
client/server/gateway software is designed entirely around its validity.

That has NOTHING to do with the fact that it could be invalid.  The
software is designed to handle that possibility as well: in that case, the
mail is dropped, logged, or whatever.  The design assumption is that it is
valid, with the no-brainer consideration that it is quite possible the
destination is not reachable.

This is just like snail mail.  An invalid return address on the paper
envelope stamped with "Return To sender" with a silly finger pointing to the
return address, will fail.  It happens.  But the fact that it can be
invalid does not mean it is not required to be valid.  No, it
is naturally assumed to be valid.

From a mail transport system standpoint, this is vitally important.

With delayed validation or mail processing such as a gateway (e.g., UUCP/SLIP
operations), the SMTP receiver must conform to the RFC by adding the
"Return-Path:" line to the TOP of the DATA received for final destination
targets.  Without it, gateway and UUCP operations cannot be properly
supported.

      RFC 1123 - Host Requirements

      5.2.8  DATA Command: RFC-821 Section 4.1.1
      ..
         When the receiver-SMTP makes "final delivery" of a message,
         then it MUST pass the MAIL FROM: address from the SMTP envelope
         with the message, for use if an error notification message must
         be sent later (see Section 5.3.3).  There is an analogous
         requirement when gatewaying from the Internet into a different
         mail environment; see Section 5.3.7.

If you don't do this, you can break gateway operations, including email
list servers, especially when there is a route involved.
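
As a rough illustration of the RFC 1123 requirement quoted above, the final
delivery step could look like the sketch below.  The function name
add_return_path is hypothetical and is not taken from any particular mail
server, and the sketch assumes an ASCII envelope address.

```python
def add_return_path(envelope_from: str, message: bytes) -> bytes:
    """Prepend a Return-Path line carrying the envelope MAIL FROM address.

    Hypothetical helper: a sketch of the final-delivery step, not the
    implementation of any specific server.
    """
    # Accept the address with or without angle brackets; an empty
    # reverse-path ("<>") stays empty and is written back as "<>".
    addr = envelope_from.strip().strip("<>")
    header = f"Return-Path: <{addr}>\r\n".encode("ascii")
    # The line goes at the very TOP of the received DATA.
    return header + message


if __name__ == "__main__":
    raw = b"From: Alice <alice@example.com>\r\n\r\nHello\r\n"
    print(add_return_path("<alice@example.com>", raw).decode("ascii"))
```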

Now the compliant GATEWAY has to support the "Return-Path:" header line in
order to provide a proper bounce path.  The logic in our gateway software
uses this priority order to get the return path:

           RETURN PATH:
           SENDER:
           [ERROR-TO:]        (Mostly supported by ListServ, Wildcat)
           Reply-To:
           FROM:

This follows RFC 822:

           4.4.4.  AUTOMATIC USE OF FROM / SENDER / REPLY-TO

However, the legacy RFC 822 does not mention Return-Path.  This is covered
by RFC 1123, 5.2.8.
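
The priority order above can be sketched as a simple header lookup.  This is
only an illustration under assumptions: the function name
pick_bounce_address is hypothetical, and the "Error-To" spelling is copied
from the list above rather than from any standard.

```python
from email import message_from_string
from email.utils import parseaddr

# Priority order as listed in the post; "Error-To" is the optional,
# non-standard entry (spelling taken from the post).
BOUNCE_PRIORITY = ("Return-Path", "Sender", "Error-To", "Reply-To", "From")


def pick_bounce_address(raw_message: str):
    """Return (header_name, address) for the first usable header, else None."""
    msg = message_from_string(raw_message)
    for name in BOUNCE_PRIORITY:
        value = msg.get(name)  # header lookup is case-insensitive
        if value:
            _, addr = parseaddr(value)
            if addr:
                return name, addr
    return None


if __name__ == "__main__":
    raw = (
        "Return-Path: <bounce@example.org>\n"
        "From: Alice <alice@example.com>\n"
        "\n"
        "body\n"
    )
    print(pick_bounce_address(raw))
```

Note that every header in this list is only available once the DATA stage
has been received.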

Please note that obtaining the Sender, From, etc., means that an MTA must
reach the DATA stage which, again, in my view, is something you want to
AVOID if the goal is to catch illegal access with a preliminary access check.

Again, the point is, it is EXPECTED to be there, it is expected to be valid.

An unreachable Return Path is a different subject.  However, it is the
subject that the RFC needs to address: what is to be expected of an
unreachable return path.

In practice, it is handled as a BOUNCE concept and if a system fails to
deliver the BOUNCE, it is automatically blacklisted (at least in our
software it is).

Since a BOUNCE is a natural requirement of a mail server, the
return path is expected to be valid, with logic to handle the failures.

We understand that changes must be made. However, we need to justify
these changes before imposing them on the entire Internet. For example,
it is significantly more lightweight to verify domain/IP association via
LMAP than do an RCPT TO callback.  Both your proposal and LMAP address
the same problem - forgery of the MAIL FROM address, except LMAP focuses
on verifying the domain, while you are verifying the actual address.
What we need to determine, is why should we go through the burden of
verifying the actual address, when for the purposes of reducing forgery,
verifying the domain is sufficient?

OK, thanks for the clarification.

But as I indicated above, current commercial implementations of mail
systems (clients, servers, gateways) in the marketplace have made it a
pseudo-standard and requirement that a valid return path is provided.
Otherwise, we can't work together.

Unreachable return paths are part of the design, handled differently by
each system.  I can speak for our system: we will retry a system using
our default values of 3 days, once per hour (72 attempts).  I can also tell
you that we have logic that puts a weight on the type of return path
failure.  For example:

        Default Retry    = 72
        Failed MX        = subtract 10
        Failed IP        = subtract 5
        Failed Server    = subtract 2

with the idea to accelerate the bounce (when retry <= 0) depending on where
it failed.

So if an MX is bad, using the above numbers:

        hour 1          retry count 72
        hour 2          retry count 62
        hour 3          retry count 52

and so on; after 8 hours, the bounce will occur.
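
The arithmetic can be checked with a small simulation.  This is a sketch
under assumptions, not our actual code: the names are hypothetical, and it
assumes each failed hourly attempt subtracts the weight for that failure
type, or 1 when the weight is zero.

```python
DEFAULT_RETRY = 72
# Weights from the example above, keyed by where the attempt failed.
WEIGHTS = {"mx": 10, "ip": 5, "server": 2}


def attempts_until_bounce(failure: str, budget: int = DEFAULT_RETRY,
                          weights=WEIGHTS) -> int:
    """Count hourly attempts made before the retry count drops to <= 0."""
    attempts = 0
    while budget > 0:
        attempts += 1
        # Assumption: a zero or missing weight still costs one retry.
        budget -= max(weights.get(failure, 0), 1)
    return attempts


if __name__ == "__main__":
    for kind in ("mx", "ip", "server"):
        print(kind, attempts_until_bounce(kind))
    # With all weights relaxed to zero, the full 72 attempts are made.
    print("relaxed", attempts_until_bounce("mx", weights={}))
```

Under these assumptions a bad MX bounces after 8 attempts, a bad IP after
15, and a failed server after 36.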

However, in practice, this was too tight, so it was relaxed (the weights are
zero).  It will try 72 times.  The exception to the rule is if a server
receiving the mail responds with a code of 55x, which means don't try again.

I am only telling you this because what you need to look at is what
"Unreachable" means, and put it in writing so that developers will have a
guideline for putting constraints on an unreachable return path.  Look at the
CAN-SPAM bill.  It does require a valid return path.  So this should be
enough to put it in the specs.

The specs can say something that reflects the kinds of possible failures
when an attempt is made to connect to an MTA and deliver mail, as
outlined by the SMTP state machine.

Possible failure points:

        connect
        HELO/EHLO
        MAIL FROM:
        RCPT TO:
        DATA
        QUIT
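
One way a spec could tie these failure points to an outcome is sketched
below.  The Stage enum and the classify function are hypothetical names; the
only rule assumed is the standard SMTP reply-code convention that 4yz is a
transient failure and 5yz is a permanent one.

```python
from enum import Enum


class Stage(Enum):
    """Points in the SMTP dialogue where a delivery attempt can fail."""
    CONNECT = "connect"
    HELO = "HELO/EHLO"
    MAIL_FROM = "MAIL FROM:"
    RCPT_TO = "RCPT TO:"
    DATA = "DATA"
    QUIT = "QUIT"


def classify(stage: Stage, code: int):
    """Map a reply code seen at a stage to (stage name, action)."""
    if 200 <= code < 400:
        action = "ok"
    elif 400 <= code < 500:
        action = "retry"    # transient failure: keep the message queued
    elif 500 <= code < 600:
        action = "bounce"   # permanent failure: don't try again
    else:
        action = "unknown"
    return stage.value, action


if __name__ == "__main__":
    print(classify(Stage.RCPT_TO, 550))
    print(classify(Stage.MAIL_FROM, 451))
```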

When you lock into the specs that a DOMAIN and an ADDRESS will be expected
to be valid, you inherently and immediately begin to address a high
percentage of the spammer problem.  It will not solve it 100%, but it will
make the spammers, well, honest spammers, and that's a MAJOR benefit in the
efforts to control and regulate them (which I believe is NOT part of the
functional design for SMTP).  That's an installation desire or an end-user
desire.  The VENDORs will now be able to offer the features the customers
and end users want.

Let's say you have verified either the domain or the address, and the
message in question turns out to be spam. In both cases, you are going
to complain to the ISP of the domain, not the actual user! So why go
through the trouble of verifying the actual email address, when a domain
is sufficient?

Like I said, above, that is none of your business.  You are complicating
what is otherwise a technical problem with a very easy technical solution.
You just need to make it possible.

By attempting to get into "interpretation" of the data, you are going
into fuzzy logic, and that's not the purpose of SMTP.

In my technical opinion, you have a fantastic opportunity to address, to a
very large degree, the anonymous abuse of the servers by simply addressing
the dearth of controls in the specifications.  Forget about spam.  It could
be something else altogether.  Maybe you don't want EXE files or VBS files
sent because the customer is placing scrutiny on this type of data.  In all
cases, you need to control and authenticate access in a way that is
independent of the mail content.

With that said, that does not mean the SMTP server should not offer some
level of control here for "data interpretation."  Like I said before, I
see at most some sort of CRC32 check being added, but that's it.

Compliant clients can say:

            C:  DATA CRC=1BCD233
            S:  354 Begin sending data

or with a fallback:

            C:  DATA CRC=1BCD233
            S:  500 Command not understood
            C:  DATA
            S:  354 Begin sending data

The compliant server can now use the CRC32 to make sure the data integrity
is ok.  We can extend this to address mail content controls.

            C:  DATA CRC=1BCD233 HAS=ADV,ATTS
            S:  452 Sorry,  ADV not accepted,  Attachments not accepted

or a positive warning:

            S:  354 Begin sending data.  Warning:  ADV not accepted,
                Attachments not accepted
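
To make the idea concrete, here is a sketch of how a server might handle
such a DATA line.  The CRC= and HAS= parameters are only the proposal from
this message, not an existing SMTP extension, and all function names here
are hypothetical.

```python
import zlib


def parse_data_command(line: str) -> dict:
    """Split 'DATA CRC=... HAS=...' into its (proposed) parameters."""
    verb, *params = line.strip().split()
    if verb.upper() != "DATA":
        raise ValueError("not a DATA command")
    parsed = {}
    for param in params:
        key, _, value = param.partition("=")
        parsed[key.upper()] = value
    return parsed


def reply_to_data(line: str, refused=frozenset({"ADV", "ATTS"})) -> str:
    """Refuse before the transfer if the client declares unwanted content."""
    params = parse_data_command(line)
    declared = {tag.upper() for tag in params.get("HAS", "").split(",") if tag}
    blocked = sorted(declared & refused)
    if blocked:
        return "452 Sorry, " + ", ".join(f"{tag} not accepted" for tag in blocked)
    return "354 Begin sending data"


def crc_matches(params: dict, body: bytes) -> bool:
    """Compare the declared hex CRC32 with the CRC32 of the received body."""
    if "CRC" not in params:
        return True  # client did not declare a checksum
    return int(params["CRC"], 16) == (zlib.crc32(body) & 0xFFFFFFFF)


if __name__ == "__main__":
    print(reply_to_data("DATA CRC=1BCD233 HAS=ADV,ATTS"))
    print(reply_to_data("DATA CRC=1BCD233"))
```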

In our design, we offer a HOOK into the DATA stage.  That makes it a customer
implementation, leaving it UP to them to define what is SPAM or not.

Anyway,  not reaching the DATA state is the goal I think the effort should
concentrate on.

The benefit of defining the new controls in SMTP is that we can now FEED
traceable information to the customer to allow them to do what they want
with it.

---
Hector Santos, CTO
WINSERVER "Wildcat! Interactive Net Server"
support: http://www.winserver.com
sales: http://www.santronics.com



_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg


