ietf-smtp

Re: CBV systems - was Re: SMTP Extensions - proper reply code for disabled commands

2004-01-14 15:40:17

The reason I asked is that I used CBV for many years to filter spam
from my mailing lists, and I often found that (otherwise) legitimate
addresses failed to verify if I used MAIL FROM:<>.  (I was using CBV
on the header From address - on the assumption that if someone on
the list couldn't reply to the message by normal means, then it
probably wasn't useful to send that non-subscriber's comments to the
list anyway).

Most of the time the verification would fail in the MAIL FROM
response, but occasionally it would fail in RCPT TO response.  So I
ended up having to send MAIL FROM:<postmaster@my.site> or some such
instead in order to get an accurate verification.

(one could argue that you shouldn't accept mail from misconfigured
systems anyway, but bouncing such messages doesn't do anything
useful IMHO because it doesn't report an error that is likely to be
understood by the person receiving the bounced mail.  That and you
have to arrange for the bounces to have a return-path other than <>
in order for them to even get back to the sender.)

I'm not seeing the same level of result you seem to suggest occurs en
masse. Yes, some systems will behave differently when given a NULL
vs. non-NULL MAIL FROM, but it is relatively few from what I
see.

The situation may have changed in the last year (it would certainly be
good if MTA admins were getting smarter).  Or it might be that we're
testing different user communities.  (one of the odd things about spam
detection is that it's very hard to get a representative sample - just
because a method works well with one sample doesn't mean it will work
well with another.  And a large sample is not necessarily a good
indicator of how well a small sample will behave.)

But the goal of the CBV, at least ours, is to check for a BAD address.


same for mine.
 
The good news is that the BAD systems are the ones most likely
providing the results we are looking for.   By far, they are spoofing
fake addresses and this is what we are helping to eliminate in this
process.

not in my experience.

In other words, a CBV is catching the majority of the spoofing
spammers.

not in my experience.
 
The bad news is that in the end, they will be legitimate (CAN-SPAM)
making a CBV system a redundancy issue and more of an auditing/tracking
system.

Unless CAN-SPAM (both laws and enforcement, which is much harder)
spreads to the whole planet, the spammers will just move offshore
(maybe not the same folks, but there will still be spammers), and
they'll continue to forge addresses (legitimate ones).  So CBV will be
worse than useless.  (which is why I didn't encourage people to use it-
as in the long run it would only increase complexity and overhead and
it wouldn't benefit anyone.)


Here's what I see happening if this is widely adopted:

A0->B0: MAIL FROM:<my.address>
-B0->A1: HELO
-B0->A1: MAIL FROM:<dummy.address@B>
--A1->B1: HELO
--A1->B1: MAIL FROM:<dummy.address@A>
---B1->A2: HELO
---B1->A2: MAIL FROM:<dummy.address@B>
----A2->B2: HELO
----A2->B2: MAIL FROM:<dummy.address@A>
...

Use a NULL address. 

That's fine if systems don't reject the address out-of-hand.  I suspect
some still do.  My verifier tries to use a NULL address but if MAIL FROM
is rejected, it switches to a non-NULL address.  (if the mailer rejects
RCPT TO because FROM was NULL then the sender is out of luck) but I'm
only verifying list traffic and I'm assuming that the other end isn't
doing CBV at SMTP time because up to now it's been a fairly rare
practice.
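
A minimal Python sketch of the fallback behaviour described above: try the
null reverse-path first, and retry MAIL FROM with a real address only if
MAIL FROM itself is refused. This is an illustration, not the verifier
discussed in this thread. The function name `callback_verify`, the fallback
address `postmaster@example.org`, and the three outcome strings are
assumptions. `session` is assumed to be an already-connected object whose
`mail()`, `rcpt()`, and `rset()` methods return a (code, message) pair, in
the style of `smtplib.SMTP`.

```python
def callback_verify(session, address, fallback_sender="postmaster@example.org"):
    """Classify `address` as 'valid', 'invalid', or 'unknown'.

    `session` is a connected, greeted SMTP session (hypothetical duck type:
    mail(sender), rcpt(recipient), rset(), each returning (code, message)).
    """
    # First attempt: null reverse-path, i.e. MAIL FROM:<>
    code, _ = session.mail("")
    if not 200 <= code < 300:
        # Some servers refuse the null sender outright; retry once
        # with a real (assumed) address before giving up.
        session.rset()
        code, _ = session.mail(fallback_sender)
        if not 200 <= code < 300:
            return "unknown"

    # No retry at RCPT time: if RCPT TO is refused because the sender
    # was null, the address is simply treated as failing verification.
    code, _ = session.rcpt(address)
    session.rset()  # never proceed to DATA; this is only a probe

    if 200 <= code < 300:
        return "valid"
    if 500 <= code < 600:
        return "invalid"
    return "unknown"  # 4xx or anything unexpected
```

With the standard library, `session` could be an `smtplib.SMTP` instance
connected to the sender's MX after `helo()`; passing an empty string to
`mail()` is assumed here to produce `MAIL FROM:<>`.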

Very few spammers use a NULL address because it
reinforces further DSN checking that will fail their transaction. 

Of course, systems that do such checking are broken. Nothing says that a
NULL address signals a DSN.


What is the RFC 2821 presumption for the validity of the return
path?

Or at what point should RFC 2821 presume the return path to be
valid? When it is provided or when it's too late (after the mail is
received and rejected)?

Return-path is intended for (non)delivery notification, not for
sender verification.  It is not reasonable to take return-path as an
indicator of sender identity.

Why not?  

Because there are *lots* of cases where you want reports to go to some 
other address than that of the person who sent the message.

I believe the presumption is made in RFC 1123 and in RFC
2821 that DSN and error reporting is to be returned to an "existing"
return path. 

Well, sure the address is supposed to exist (or be null), because we
don't want to waste the resources of the mail system delivering reports
to nonexistent addresses.  But that doesn't mean that it's the identity
of the sender.

SMTP servers _should_ return immediate error conditions (in response
to MAIL FROM, RCPT TO, DATA) whenever possible because, for a wide
variety of reasons, immediate reporting is much more reliable and
consumes less resources than bouncing the message.

I agree.   But there are still legacy and legitimate reasons for
delaying the verification/validation/checking at RCPT TO:

Yes, though we'd do well to discourage it.  Some of the reasons are
good, others are bogus.  IMHO.

In our case, the CBV does not work well for a YAHOO.COM type system
because it is not reaching the DATA state.  I personally believe YAHOO
acceptance of all recipients contributes to the spamming problem and
distributes the cost to other systems because spoofers will use
YAHOO.COM as the return domain knowing YAHOO.COM will accept it.

could be.

But this is not a reason to throw out the baby with the bath water. 
The CBV system works to eliminate  a majority of the spoofing
spammers.

Not in the long term.  At best it's a stopgap measure.  I give it six
months to a year.  

For similar reasons, SMTP senders
_should_ minimize the relay path length to the receiver's MX rather
than aggregating traffic through upstream relays.  Of course there
are reasons to have both outbound and inbound relays, but these do
come at a cost.

Again I agree.   Keep in mind that this will alter the
specifications, hence making it a deployment issue.

I see it as more of a configuration issue.  The protocol doesn't have
to change to make this work.
 
Despite my use of CBV on mailing list traffic, I don't think they
are nearly as good an indicator for traffic in general, for lots of
reasons:

- The presumption that traffic with an invalid return-path is
invalid, while perhaps reasonable for some mailing lists, is less
reasonable as a general rule.

Empirically, I am not finding this to be true.  With over 25 testing
sites processing millions of messages (for our systems an average of
2500+ or so per day),   +80% of the transactions are rejected due to
CBV checking with 100% TRUE negatives (remotes will refuse the return
address when the RCPT TO is issued at their site).

It sounds like you're defining CBV performance in terms of CBV.  The
relevant questions are different.  How many of those messages were
actually valid messages that should have gone to the recipient even
if the return-path was screwed up?  Perhaps the sender mistyped his
domain name into his user agent preferences and was trying to send
mail to his tech support staff.  Perhaps the DNS administrator put the
wrong MX into the DNS zone, or forgot to update the serial number
when editing the zone file, so the mail is still going to the old MX
or A record instead of the new one.  Perhaps the SMTP server was
misconfigured, or perhaps it uses NIS to verify addresses and the NIS
server is down and the MTA isn't able to distinguish temporary NIS errors
from "user does not exist".  I've seen all of these happen many times.

- The verification has lots of overhead, and some potential for
looping if widely adopted.

A technical problem that can be addressed and has been addressed.  A
NULL address is all you need to do. 

My experience is different, though it is somewhat dated.
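
The looping concern and the null-address answer can be illustrated with a
small simulation. This is only a sketch under a stated assumption: both
sites verify every non-null MAIL FROM they receive, and each callback uses
the same reverse-path. The function name `count_callbacks` and the
`max_depth` cutoff are hypothetical, introduced for illustration.

```python
def count_callbacks(verify_sender, max_depth=50):
    """Count nested verification connections triggered by one message.

    verify_sender: the reverse-path each verifier puts in its own
    MAIL FROM; the empty string stands for the null address <>.
    max_depth: artificial cutoff so the non-terminating case returns.
    """
    depth = 0
    sender = "my.address"  # MAIL FROM of the original message
    while sender != "" and depth < max_depth:
        depth += 1              # the receiver opens a callback connection
        sender = verify_sender  # whose MAIL FROM may itself be verified
    return depth


print(count_callbacks(""))                              # null sender
print(count_callbacks("dummy.address@B", max_depth=10))  # non-null sender
```

With a null sender the chain stops after a single callback; with a
non-null sender it only stops at the artificial cutoff. The sketch says
nothing about servers that refuse the null address, which is the case
where my experience differs.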
 
- The effectiveness of CBV has decreased considerably over time. 
When I first started using it (not sure when, maybe 7 years ago) it
identified nearly 100% of spam sent to my lists.  When I stopped
using it on a large scale (just over a year ago) it only identified
about 25% of the spam sent to my lists.  It appeared that spammers
had realized that mail from invalid addresses was less successful,
so they simply started sending mail from valid addresses.

Again, today, we are seeing at least 80% of spammers are spoofers.  I
believe industry research also agrees with this level of numbers and
I believe that is the main reason why it is one of the mandates of the
CAN-SPAM Act: validate return addresses.

Wasted effort.  Spammers will circumvent it.

Now, I will say that I expected the same results; with Spammers
learning. Empirically, I have not seen this. 

I can't say for sure why the effectiveness of CBV dropped.  I can only
say that it did drop.

I believe this is
because it is MORE in their interest to remain anonymous than to try
to make valid but fraudulent use of addresses, which is an ECPA federal
crime.

I do think that spammers have learned to avoid using addresses that
other people are using.  I'm not getting many complaints that I'm
spamming  people anymore (resulting from spammers forging my address). 
But the return addresses on most of the spam I see are verifiable using
SMTP. Whether they can be reliably associated with the spammer, I cannot
say.

More ironically, what I have seen is a 60% increase of HELO drops once
we place a POLICY STATEMENT in the greeting with the specific words:

        WARNING:  FOR AUTHORIZED USE ONLY!

I believe this is a result of the AOL lawsuit.  So the BIG spammers do
look for these SPECIFIC words and will drop their connections once
seen.

I personally couldn't believe my eyes.  I removed the policy statement
and behold, they didn't drop.  Put back, they began to drop again.

At least that's a low-overhead solution :)

But I have not seen them learn the MAIL FROM.

What I have seen is some of them will issue MULTIPLE MAIL FROMs if
dynamically rejected.   I have seen at least 2-3 attempts here before
dropping.

Maybe you're seeing my verifier :)
 
- In my experience, CBV does not provide 100% true negatives -
because of misconfigured systems that reject MAIL FROM:<> and other
valid mail.

When I say 100% true negatives, I am talking about 100% rejections at
the remote RCPT TO state.  That is, IMO, a 100% trusted value.

My experience is different than yours.  
 
I can give you 6 months worth of millions of trace logs. I will be
surprised to see more than 0.01% of MAIL FROM: <> breaks.  I will try
to find this count for you today.

  (if you have your SMTP server report 4xx whenever the CBV tempfails,
there are other problems - it is not unusual for a CBV to fail
because either the sender's DNS servers or the sender's SMTP server
are too slow or inaccessible, even though the sender's system is
sending outgoing mail.)

From an SMTP compatibility standpoint, this is one of the main
issues. Fortunately, empirically, what we are seeing is that by far
MOST systems are SMTP compliant and run good systems, and the ones the
CBV will catch are bad systems; for the false negatives, the 45x
does tell the remote "Hey, try again".  Fortunately, the bad systems
WILL not.  The good systems will. That's what I am seeing.
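
A hedged sketch of the policy being described: map the result of the
callback to the reply the receiving server gives its own client, with a
4xx for anything that is not a definite answer. The function name
`local_reply`, the specific codes 451 and 550, and the reply texts are
assumptions chosen for illustration; the thread only says "45x".

```python
def local_reply(remote_code):
    """Choose the local SMTP reply from the remote RCPT TO reply code.

    remote_code: integer reply code from the callback's RCPT TO, or
    None if the callback could not complete (DNS or connection timeout).
    """
    if remote_code is None or 400 <= remote_code < 500:
        # Inconclusive: ask the sending system to retry later.
        return (451, "Return path could not be verified, try again later")
    if 200 <= remote_code < 300:
        # The remote accepted the return address.
        return (250, "OK")
    # The remote gave a permanent refusal for the return address.
    return (550, "Return path rejected by its own mail system")
```

The tempfail branch is where the caveat above applies: a slow or
unreachable DNS or SMTP server on the sender's side lands there even when
the sender is legitimate, so delivery then depends on the sender retrying.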

But again, we should not throw out the baby with the bath water.

This isn't a baby.  A baby is valuable because of its future potential,
and CBV has a very limited future potential.

In short - CBV is expensive for the receiver and easily defeated by
spammers.

By becoming legit and trackable?  I see that as a good thing.

Just because you can check a return-path with RCPT doesn't mean this
helps you trace the sender.  RCPT tells you nothing about whether
the sender actually owns that address.  We can't assume that spammers
are law-abiding.

  widely adopted it very quickly becomes useless overhead.

Rather useless redundancy.  Again, a good thing.  Not a bad thing.

Incurring overhead for no useful purpose is at best wasteful and 
expensive, and at worst an invitation to denial-of-service attacks.

I do think we need reliable traceability to the sender, but doing
CBV on return-path is not the way to get it.

Then you have a Deployment problem and you will not be able to do it
without CHANGE.

Indeed, this is true.
 
Something has to give.  Either you work with the given specs or you
change the specs, in which case, you will need a central authority
system because SPAMMERS are not going to change otherwise.

It doesn't have to be central.
 
CAN-SPAM is a crock.  We should not presume that the US Congress is
qualified to dictate technical standards - heck, they can't even
write nontechnical laws that are in the public interest.

In any case,  the mandate is reality:

- Valid Return Addresses
- Topic Identification.

When people use the word "reality" what they generally tend to mean is
"according to my prejudice".  CAN-SPAM may influence US-based spammers
but that doesn't mean it will get rid of spam.


2) Mr. Crocker's ASRG Proposal Guideline Document,
draft-crocker-spam-techconsider-02.txt, emphasizes incremental and
backward compatibility, minimizing or quelling any desire to
fundamentally alter the SMTP protocol.

I haven't read this document yet, so I can't comment on the extent
to which I think the argument in the document is valid.  But using
return-path as a sender identity conflicts with its purpose in SMTP
as the destination of (non)delivery reports.  The Sender field from
RFC 822 is the closest thing to a verifiable sender identity from
the original design, but it is so widely misused today that it is
not salvageable. Basically the requirements for verification are
such that we're going to need a new field.

Whatever is done, we need to address anonymous access.

I agree - we need to make sure that anonymous access is provided for.
Of course that doesn't mean people have to accept mail from anonymous
senders.

3 parameters are currently available:

CIP - Connection Address
CDN - Client domain name
RP - Return Path

All of these are insufficient, for various reasons.  We're not going
to solve the spam problem without protocol extensions.

The point is,  either the bounce address is required or not.  If so,
then we need to make it a stronger part of the specification and then
FROM there we can design a better, optimized system.  

What you are proposing to do is to change what the bounce address is 
used for, which isn't the same thing as strengthening it.


All I am saying is let's finally do something about it. 

...whether it makes sense or not?

Let's not say it's impossible to solve. 

It's not impossible to solve.  However, that doesn't mean that the
solution is attractive.  For instance, do we really want to make it
easy for governments to trace every single email message to the sender?
Because it's very hard to build a system that lets spammers be 
identified reliably (and held accountable) that doesn't have the side
effect of enabling mass surveillance by governments.  Or do we really
want to have a single root CA for all email?  That has its own problems.

At the same time, pretending that we can solve the spam problem without
significant changes is just wishful thinking - there's no basis for
believing that it will work.

Keith
