ietf-smtp

Re: CBV systems - was Re: SMTP Extensions - proper reply code for disabled commands

2004-01-14 14:30:18


----- Original Message ----- 
From: "Keith Moore" <moore(_at_)cs(_dot_)utk(_dot_)edu>
To: "Hector Santos" <winserver(_dot_)support(_at_)winserver(_dot_)com>
Cc: "Keith Moore" <moore(_at_)cs(_dot_)utk(_dot_)edu>; <ietf-smtp(_at_)imc(_dot_)org>
Sent: Wednesday, January 14, 2004 9:47 AM
Subject: Re: CBV systems - was Re: SMTP Extensions - proper reply code for disabled commands



The reason I asked is that I used CBV for many years to filter spam
from my mailing lists, and I often found that (otherwise) legitimate
addresses failed to verify if I used MAIL FROM:<>.  (I was using CBV on
the header From address - on the assumption that if someone on the list
couldn't reply to the message by normal means, then it probably wasn't
useful to send that non-subscriber's comments to the list anyway).

Most of the time the verification would fail in the MAIL FROM response,
but occasionally it would fail in the RCPT TO response.  So I ended up
having to send MAIL FROM:<postmaster@my.site> or some such instead in
order to get an accurate verification.

(one could argue that you shouldn't accept mail from misconfigured
systems anyway, but bouncing such messages doesn't do anything useful
IMHO because it doesn't report an error that is likely to be understood
by the person receiving the bounced mail.  That and you have to arrange
for the bounces to have a return-path other than <> in order for them
to even get back to the sender.)

I'm not seeing failures at the level you seem to suggest occurs en masse.
Yes, some systems will behave differently when given a NULL vs. non-NULL
MAIL FROM, but they are relatively few from what I see.

But the goal of the CBV, at least ours, is to check for a BAD address, not
a GOOD address.  By far, most systems are compliant and will validate a
RCPT TO: request.

The good news is that the BAD systems are the ones most likely to provide
the results we are looking for.  By far, they are spoofing fake addresses,
and this is what we are helping to eliminate in this process.

In other words, a CBV is catching the majority of the spoofing spammers.

The bad news is that, in the end, they will become legitimate (under
CAN-SPAM), making a CBV system redundant and more of an auditing/tracking
system.

Here's what I see happening if this is widely adopted:

A0->B0: MAIL FROM:<my.address>
-B0->A1: HELO
-B0->A1: MAIL FROM:<dummy.address@B>
--A1->B1: HELO
--A1->B1: MAIL FROM:<dummy.address@A>
---B1->A2: HELO
---B1->A2: MAIL FROM:<dummy.address@B>
----A2->B2: HELO
----A2->B2: MAIL FROM:<dummy.address@A>
...
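The loop above can be modeled in a few lines.  The following is a
hypothetical simulation, not an SMTP implementation.  Each callout is a
plain function call.  The callout_sender table is an assumed per-server
setting, and max_depth is an arbitrary cutoff standing in for "never
terminates".  It also assumes the original sender is my.address@A.  With
a non-NULL callout sender the verifications chase each other; with the
NULL reverse-path the chain ends after one callout.

```python
def receive_mail_from(server, sender, callout_sender, depth=0, max_depth=10):
    """Model `server` receiving MAIL FROM:<sender> with CBV enabled.

    Returns the number of nested callouts performed. A null reverse-path
    ("") is not verified, so it ends the chain.
    """
    if sender == "":
        return depth  # nothing to call back; other checks still apply
    if depth >= max_depth:
        raise RecursionError("callout loop: verification never completes")
    callback_domain = sender.rsplit("@", 1)[1]
    # The server connects back to the sender's domain and issues its own
    # MAIL FROM, which the other side then tries to verify in turn.
    return receive_mail_from(callback_domain, callout_sender[server],
                             callout_sender, depth + 1, max_depth)


if __name__ == "__main__":
    # Hypothetical per-server callout senders for domains "A" and "B".
    dummy = {"A": "dummy.address@A", "B": "dummy.address@B"}
    null = {"A": "", "B": ""}
    print(receive_mail_from("B", "my.address@A", null))  # one callout, then done
    try:
        receive_mail_from("B", "my.address@A", dummy)
    except RecursionError as err:
        print(err)
```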

Use a NULL address.  That will solve the looping problem with NO change to
the SMTP specification.  Very few spammers use a NULL address because it
triggers further DSN checking that will fail their transaction.  In our
case, a NULL address will bypass the CBV (but not other checks).

If CBV is widely adopted, then we can use some new extensions to help the
process.  But today, you have no choice but to use a NULL address.
However, our system offers session parameter configuration on an optional
per-domain basis, so if the sysadmin needs to verify a specific domain
that requires a non-NULL address, that can be configured.  Offhand, I
believe HOTMAIL.COM may be one such domain, but I don't remember for sure.
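A callout probe that uses a NULL sender by default, with optional
per-domain overrides, might look like the following.  This is a
hypothetical sketch, not WCSAP's or Exim's actual configuration.  The
table name, the example.net override, and verifier.example are
placeholders.  The probe sends QUIT right after RCPT TO, so it never
reaches DATA.

```python
# Hypothetical callout-sender configuration: the NULL reverse-path by default,
# with an optional override for domains that refuse MAIL FROM:<>.
DEFAULT_CALLOUT_SENDER = ""
PER_DOMAIN_CALLOUT_SENDER = {
    "example.net": "postmaster@verifier.example",  # placeholder entry
}


def callout_sender_for(domain: str) -> str:
    """Return the MAIL FROM address to use when calling back `domain`."""
    return PER_DOMAIN_CALLOUT_SENDER.get(domain.lower(), DEFAULT_CALLOUT_SENDER)


def cbv_probe_commands(helo_name: str, address: str) -> list[str]:
    """SMTP commands a callout probe would send to check `address`.

    The probe ends with QUIT after RCPT TO; no DATA is sent, so no
    message is ever delivered.
    """
    domain = address.rsplit("@", 1)[1]
    return [
        f"HELO {helo_name}",
        f"MAIL FROM:<{callout_sender_for(domain)}>",
        f"RCPT TO:<{address}>",
        "QUIT",
    ]


if __name__ == "__main__":
    for line in cbv_probe_commands("mx.verifier.example", "alice@example.org"):
        print(line)
```

Keeping the override table small matches the advice above: NULL
everywhere, non-NULL only where a specific domain is known to need it.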

Thus far, there are 3 systems that I know of that support CBV:

       - Exim
       - Verisign.net (or verisign.com?)
       - Wildcat! WCSAP (ours)


I think the more appropriate question is:

What is the RFC 2821 presumption for the validity of the return path?

Or at what point should RFC 2821 presume the return path to be valid?  When
it is provided, or when it's too late (after the mail is received and
rejected)?

Return-path is intended for (non)delivery notification, not for sender
verification.  It is not reasonable to take return-path as an indicator
of sender identity.

Why not?  I believe the presumption made in RFC 1123 and in RFC 2821 is that
DSNs and error reports are to be returned to an "existing" return path.
Otherwise, what is the point of using a Return-Path: header?

I believe this needs to be made very clear.

SMTP servers _should_ return immediate error conditions (in response to
MAIL FROM, RCPT TO, DATA) whenever possible because, for a wide variety
of reasons, immediate reporting is much more reliable and consumes fewer
resources than bouncing the message.

I agree.  But there are still legacy and legitimate reasons for delaying the
verification/validation/checking at RCPT TO:

The problem I see here with some domains (specifically YAHOO.COM) is that
they delay validating the recipient until the DATA stage.  So the
harvesting reason many people cite for not validating at RCPT TO: is a
red herring IMO in the case of YAHOO.COM.

In our case, the CBV does not work well for a YAHOO.COM-type system because
the CBV never proceeds to the DATA stage.  I personally believe YAHOO's
acceptance of all recipients contributes to the spamming problem and
distributes the cost to other systems, because spoofers will use YAHOO.COM
as the return domain knowing YAHOO.COM will accept it.

But this is not a reason to throw out the baby with the bath water.  The
CBV system works to eliminate a majority of the spoofing spammers.

For similar reasons, SMTP senders
_should_ minimize the relay path length to the receiver's MX rather
than aggregating traffic through upstream relays.  Of course there are
reasons to have both outbound and inbound relays, but these do come at
a cost.

Again I agree.  Keep in mind that this will alter the specifications,
hence creating a deployment issue.

Despite my use of CBV on mailing list traffic, I don't think they are
nearly as good an indicator for traffic in general, for lots of
reasons:

- The presumption that traffic with an invalid return-path is invalid,
while perhaps reasonable for some mailing lists, is less reasonable as
a general rule.

Empirically, I am not finding this to be true.  With over 25 testing sites
processing millions of messages (for our systems, an average of 2500+ per
day), over 80% of the transactions are rejected due to CBV checking, with
100% TRUE negatives (the remotes refuse the return address when the RCPT TO
is issued at their site).

- The verification has lots of overhead, and some potential for looping
if widely adopted.

A technical problem that can be addressed and has been addressed.  A NULL
address is all you need.  Fortunately, most spammers do not use a NULL
address.

- The effectiveness of CBV has decreased considerably over time.  When
I first started using it (not sure when, maybe 7 years ago) it
identified nearly 100% of spam sent to my lists.  When I stopped using
it on a large scale (just over a year ago) it only identified about 25%
of the spam sent to my lists.  It appeared that spammers had realized
that mail from invalid addresses was less successful, so they simply
started sending mail from valid addresses.

Again, today, we are seeing that at least 80% of spammers are spoofers.  I
believe industry research also agrees with numbers at this level, and I
believe that is the main reason why validating return addresses is one of
the mandates of the CAN-SPAM Act.

Now, I will say that I expected the same results, with spammers learning.
Empirically, I have not seen this.  I believe this is because it is MORE in
their interest to remain anonymous than to make fraudulent use of valid
addresses, which is an ECPA federal crime.

More ironically, what I have seen is a 60% increase in HELO drops once we
placed a POLICY STATEMENT in the greeting with the specific words:

        WARNING:  FOR AUTHORIZED USE ONLY!

I believe this is a result of the AOL lawsuit.  So the BIG spammers do look
for these SPECIFIC words and will drop their connections once they see them.

I personally couldn't believe my eyes.  I removed the policy statement and,
behold, they didn't drop.  I put it back, and they began to drop again.

But I have not seen them learn to use valid MAIL FROM addresses.

What I have seen is that some of them will issue MULTIPLE MAIL FROM commands
if dynamically rejected.  I have seen at least 2-3 attempts here before they
drop.


- In my experience, CBV does not provide 100% true negatives - because
of misconfigured systems that reject MAIL FROM:<> and other valid mail.

When I say 100% true negatives, I am talking about 100% rejections at the
remote RCPT TO state.  That is, IMO, a 100% trusted value.

I can give you 6 months' worth of trace logs covering millions of
transactions.  I will be surprised to see more than 0.01% of MAIL FROM:<>
breaks.  I will try to find this count for you today.

  (if you have your SMTP server report 4xx whenever the CBV tempfails,
there are other problems - it is not unusual for a CBV to fail because
either the sender's DNS servers or the sender's SMTP server are too
slow or inaccessible, even though the sender's system is sending
outgoing mail.)

From an SMTP compatibility standpoint, this is one of the main issues.
Fortunately, empirically, what we are seeing is that by far MOST systems
are SMTP compliant and run good systems, and the ones the CBV will catch
are bad systems.  For the false negatives, the 45x reply does tell the
remote "Hey, try again."  Fortunately, the bad systems WILL not; the good
systems will.  That's what I am seeing.

But again, we should not throw out the baby with the bath water.
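The behavior described above amounts to a three-way mapping from the
callout result to the reply given to the connecting client.  This is a
minimal sketch under hypothetical assumptions.  A 2xx reply to the probe's
RCPT TO is treated as accepted and a 5xx as the trusted rejection.
Everything else becomes a 45x "try again", including 4xx, timeouts, DNS
failures, and failed connections (represented here as None).  The reply
texts are placeholders, and the SMTP stage at which the reply is issued
is left to the implementation.

```python
from enum import Enum
from typing import Optional


class CalloutResult(Enum):
    ACCEPTED = "accepted"  # remote gave 2xx to the probe's RCPT TO
    REJECTED = "rejected"  # remote gave a permanent 5xx
    TEMPFAIL = "tempfail"  # 4xx, unexpected reply, timeout, DNS or connect failure


def classify_reply(code: Optional[int]) -> CalloutResult:
    """Classify the remote reply code; None means no reply was obtained."""
    if code is not None and 200 <= code < 300:
        return CalloutResult.ACCEPTED
    if code is not None and 500 <= code < 600:
        return CalloutResult.REJECTED
    return CalloutResult.TEMPFAIL


def reply_to_client(result: CalloutResult) -> str:
    """Reply the verifying server gives the client (texts are placeholders)."""
    if result is CalloutResult.ACCEPTED:
        return "250 Sender OK"
    if result is CalloutResult.REJECTED:
        return "550 Return address rejected by its own domain"
    # Temporary: a compliant sender retries later; per the observation
    # above, bad systems often do not.
    return "451 Unable to verify return address, try again later"
```

For example, reply_to_client(classify_reply(None)) yields the 451 reply,
so a slow or unreachable sender domain is deferred rather than rejected.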

In short - CBV is expensive for the receiver and easily defeated by
spammers.

By becoming legit and trackable?  I see that as a good thing.

  widely adopted it very quickly becomes useless overhead.

Rather useless redundancy.  Again, a good thing.  Not a bad thing.

I do think we need reliable traceability to the sender, but doing CBV
on return-path is not the way to get it.

Then you have a Deployment problem and you will not be able to do it without
CHANGE.

Something has to give.  Either you work with the given specs or you change
the specs, in which case you will need a central authority system, because
SPAMMERS are not going to change otherwise.

CAN-SPAM is a crock.  We should not presume that the US Congress is
qualified to dictate technical standards - heck, they can't even write
nontechnical laws that are in the public interest.

In any case, the mandate is reality:

- Valid Return Addresses
- Topic Identification.

2) Mr. Crocker's ASRG Proposal Guideline Document,
draft-crocker-spam-techconsider-02.txt, emphasizes incremental change and
backward compatibility, minimizing or quelling any desire to fundamentally
alter the SMTP protocol.

I haven't read this document yet, so I can't comment on the extent to
which I think the argument in the document is valid.  But using
return-path as a sender identity conflicts with its purpose in SMTP as
the destination of (non)delivery reports.  The Sender field from RFC
822 is the closest thing to a verifiable sender identity from the
original design, but it is so widely misused today that it is not
salvageable. Basically the requirements for verification are such that
we're going to need a new field.

Whatever is done, we need to address anonymous access.

3 parameters are currently available:

CIP - Connection Address
CDN - Client domain name
RP - Return Path

The ASRG is focusing on validating the sender machine using a combination of
the above.

In the DMP DNS lookup proposal, for example, the format/rule is:

TXT LOOKUP for:

            RIP.in-addr._smtp-client.CDN
            RIP.in-addr._smtp-client.RPD

where RIP is the reverse address of CIP and RPD is the return path domain.
If either lookup returns DMP=ALLOW or DMP=DENY, then you have logic to work
with.
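The lookup names can be built mechanically from CIP, CDN, and RPD.  This
is a minimal sketch under stated assumptions:

- It handles IPv4 only.
- txt_lookup is a hypothetical stand-in for a real DNS TXT resolver; a
  dictionary plays that role in the example.
- TXT strings are matched exactly against DMP=ALLOW and DMP=DENY.
- Each name's result is reported separately, rather than guessing at a
  precedence rule the proposal may define.

The address 208.247.131.8 is the one implied by the winserver.com lookup
shown later in this message.  The RPD winserver.com is an illustrative
assumption.

```python
import ipaddress
from typing import Callable, Dict, List, Optional


def dmp_query_names(cip: str, cdn: str, rpd: str) -> List[str]:
    """Build RIP.in-addr._smtp-client.CDN and RIP.in-addr._smtp-client.RPD."""
    octets = str(ipaddress.IPv4Address(cip)).split(".")  # also validates CIP
    rip = ".".join(reversed(octets))
    return [f"{rip}.in-addr._smtp-client.{domain}" for domain in (cdn, rpd)]


def dmp_results(names: List[str],
                txt_lookup: Callable[[str], List[str]]) -> Dict[str, Optional[str]]:
    """Map each name to "ALLOW", "DENY", or None when no DMP record exists."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        verdict = None
        for txt in txt_lookup(name):
            if txt in ("DMP=ALLOW", "DMP=DENY"):
                verdict = txt.split("=", 1)[1]
                break
        results[name] = verdict
    return results


if __name__ == "__main__":
    names = dmp_query_names("208.247.131.8", "mail.winserver.com", "winserver.com")
    fake_zone = {names[0]: ["DMP=ALLOW"]}  # hypothetical TXT data
    print(dmp_results(names, lambda name: fake_zone.get(name, [])))
```

Each connection costs up to two TXT queries.  When neither name exists,
both queries come back empty, which is the overhead problem discussed
next.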

It looks to me like the ASRG is focusing on DNS lookup solutions for the
red-herring reason that they offer the fastest, lowest-overhead "lookup"
solution.

True, if EVERYONE were using it.  There will be major DNS lookup overhead
when the CDN and RPD records do not exist, and this was empirically proven
in our testing as we looked for every possible idea that could reduce the
need for CBV.  DMP and others, like RMX and SPF, all suffer from the same
fundamental deployment flaw.

DMP works nicely to validate your own local domain, so if you do a TXT
lookup for:

        8.131.247.208.in-addr._smtp-client.mail.winserver.com

you will see DMP=ALLOW, while the wildcard record

        *.in-addr._smtp-client.mail.winserver.com

returns DMP=DENY.

So this alone has helped stop spoofing spammers from using our own domain or
machines to gain entry.

All in all, I am the last person who wants to add anything that isn't
necessary to the product design, and backward compatibility is important.

However, reality is reality.  Anonymous access is a MAJOR industry problem,
and the exploitation of the SMTP protocol is right there in our face.  We
can sit around, do nothing about it, and allow the Mickysofts, AOLs and
others to dictate what the future will be.  Some, like AOL, Earthlink and
RoadRunner, have already begun to violate the RFC by automatically
rejecting dynamic IP machines at the connection level.  Some people have
suggested this is not a violation because it is done at the connection
level.  True, but because the notification is provided at the SMTP level
(a 55x greeting), it is an RFC violation.

The point is, either the bounce address is required or it is not.  If it
is, then we need to make it a stronger part of the specification, and FROM
there we can design a better, optimized system.  But as it stands now, CBV
is the only possible way to get a MAJORITY of the spammers out of the
picture, and in my opinion, if CBV and CAN-SPAM make them legitimate, then
all the better: there is less reason to turn on your CBV for these
domains.  Less overhead, less redundancy.

That's my view.

All I am saying is: let's finally do something about it.  Let's not say
it's impossible to solve.  In my view, it can be solved at a technical
level, and by solved, I mean reducing the spamming issue to one that is
100% related to mail content and out of the SMTP decision-making process.

-- 
Hector Santos, Santronics Software, Inc.
http://www.santronics.com