[Asrg] Re: 6. Proposals - Verification via RCPT TO callback


----- Original Message ----- 
From: "Yakov Shafranovich" <research(_at_)solidmatrix(_dot_)com>
To: "Hector Santos" <winserver(_dot_)support(_at_)winserver(_dot_)com>; "ASRG"
<asrg(_at_)ietf(_dot_)org>
Sent: Monday, December 01, 2003 8:45 AM
Subject: 6. Proposals - Verification via RCPT TO callback

Hector,

As you have mentioned, you are going to be deploying
this system at multiple customer sites. What we would like is for
you to come back to the group in a few weeks with the results, and let
us know if any of the issues we have raised made any difference. In
order to have a totally objective analysis, we would recommend running a
bunch of control systems that do not have the method and then comparing
the results.


It's been two weeks <g>

I am putting together a summary report of our gamma testing.

Here are the WCSAP log statistics and breakdown for the Santronics site
only.  I will provide some analysis now, and a more detailed analysis in my
report once I gather logs from the other gamma testers:

Stats for December 1 to 16, 2003

Total Sessions             :   31388
total rbl blocked          :   18887    60.17%
total non-rbl blocked      :   12501
total rbl cbv tested       :   11988
total cbv ready            :   24489
total filter unknowns      :   30415
total filter rejects       :      38
total filter accepts       :     107
total filter test          :     828
total cbv test             :   24344
total tries                :   23248
total rejected             :   23770    75.73%
total sap refused          :    8329    34.21%

total cbv tested           :   24344
total cbv bad              :   21366    87.77%
total cbv bad filter       :      38     0.16%
total cbv bad refused      :    8329    34.21%
total cbv bad ptr          :    2212     9.09%
total cbv bad no mx        :     521     2.14%
total cbv bad no mx host   :       0     0.00%
total cbv bad connect      :    5207    21.39%
total cbv bad welcome      :    2042     8.39%
total cbv bad helo         :       7     0.03%
total cbv bad mail from    :       0     0.00%
total cbv bad rcpt         :       0     0.00%
total cbv bad open relay   :    2565    10.54%

total multiple mail from attempts :     870
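
For anyone checking the percentages, here is a small sketch of the
arithmetic.  The counts are copied from the table above; the choice of
denominator for each line (total sessions for the RBL and overall reject
rates, "total cbv tested" for the CBV lines) is an assumption inferred from
which division reproduces the printed figure, and the helper name pct is
made up for this example.

```python
# Counts copied from the statistics above (December 1 to 16, 2003).
total_sessions = 31388
rbl_blocked = 18887
total_rejected = 23770
cbv_tested = 24344
cbv_bad = 21366
sap_refused = 8329
cbv_bad_open_relay = 2565


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Assumed denominators: sessions for the first two, CBV tests for the rest.
rates = {
    "rbl blocked": pct(rbl_blocked, total_sessions),
    "rejected": pct(total_rejected, total_sessions),
    "cbv bad": pct(cbv_bad, cbv_tested),
    "sap refused": pct(sap_refused, cbv_tested),
    "cbv bad open relay": pct(cbv_bad_open_relay, cbv_tested),
}

for name, value in rates.items():
    print(f"{name:<20}: {value:6.2f}%")
```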

Overall, the system has an average success rate of 85-95%.  In our case,
it's 88% for the month of December.  This rate does not include RBL rejected
sites.

As we were fine-tuning it, the last few days included a learning filter
algorithm to auto-reject, thus skipping the CBV process to reduce redundancy.

RBL is also included, which for the month of December rejected 60%.
Since the RBL reject is optional, a CBV test can also be performed on RBL
rejected sites.  This helps us compare CBV against the blacklisted
sites.   Since I had it off on some days and on for others, the above does
not give you a good comparison. To see the comparison, go to
http://www.winserver.com/public/security.   You will see that between Dec 4
and 15, when RBL rejection was disabled in WCSAP, WCSAP rejected slightly
more than RBL, but I would say they are about even, which somewhat indicates
how well the RBL sites work, or rather the concept of having a low overhead
"central site" to validate a system.    But this also shows that what RBL
does not cover, WCSAP makes up for.

Some additional info:

Approximately 22% failed at the PTR, MX, Connect level.

8% failed to respond at the greeting.  However, this includes early data
where the WCSAP timeout was too low.  A new version installed a few days ago
solved this problem.

Approximately 11% tested as open relay sites.  Manual inspection shows these
are in fact true positives: actual zombies or compromised sites.

The multiple MAIL FROM attempts count is an initial analysis in which I am
going to measure how systems try multiple MAIL FROM commands when they get
rejected.  In this case, 870 systems tried more than one MAIL FROM.  Manual
inspection showed they came from specific sites using specific software.

One of the other tests I am going to run is to analyze the software they use.
I found quite a pattern here, which I will detail in my report.

Of the issues raised:

1) Overhead, Redundancy
2) False Negatives/Positives

Solving the redundancy issue is the trick to this system.   A
learning/training concept is suggested at the implementation level.

One method used in WCSAP is to pair the email domain with the connection IP,
which I call DIP.  An accept/reject DIP filter is automatically created as
the system learns, drastically reducing the redundancy.
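
To illustrate the idea, here is a minimal sketch of such a filter.  It is
not the WCSAP code: the class name DipFilter, the run_cbv callback and the
plain in-memory dictionary are all assumptions made for the example.  It
only shows the caching step, where the verdict of the first CBV test for a
(domain, IP) pair is reused for later sessions from the same pair.

```python
from typing import Callable, Dict, Tuple

DipKey = Tuple[str, str]  # (return-path domain, connecting IP)


class DipFilter:
    """Remembers the CBV verdict for each (domain, IP) pair (hypothetical)."""

    def __init__(self) -> None:
        self._verdicts: Dict[DipKey, bool] = {}

    @staticmethod
    def key(mail_from: str, client_ip: str) -> DipKey:
        # Take whatever follows the last "@" and normalise it.
        domain = mail_from.rpartition("@")[2].strip("<> ").lower()
        return (domain, client_ip)

    def check(self, mail_from: str, client_ip: str,
              run_cbv: Callable[[str], bool]) -> bool:
        """Return True to accept; run the CBV only for unknown pairs."""
        k = self.key(mail_from, client_ip)
        if k not in self._verdicts:
            self._verdicts[k] = run_cbv(mail_from)  # learn the verdict
        return self._verdicts[k]
```

A real deployment would presumably also need to store the filter on disk and
allow manual entries, neither of which is shown here.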

The logic has been in place a day or so, covering the last 960+ sessions,
where 107 connections were accepted and 38 were rejected.

The speed of the system is fast, 1 to 2 seconds when there are no
DNS/connect issues which would add delays.  I added options to set timeouts.
The one timeout required was the greeting as I found some systems (both
legitimate and spammers) have long delays at the greeting.
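
For readers who want to see the shape of the callback itself, here is a
rough sketch using Python's standard smtplib.  It is not how WCSAP is
implemented.  The function name cbv_check, the HELO name, the 30 second
default and the mapping of errors onto the stage names from the table are
assumptions for the example.  The MX and PTR lookups are left out, and
smtplib takes a single timeout for the whole session rather than a separate
one for the greeting.

```python
import smtplib


def cbv_check(mx_host: str, return_path: str,
              helo_name: str = "cbv.example.org",
              timeout: float = 30.0,
              smtp_factory=smtplib.SMTP) -> str:
    """Return "ok" or the name of the stage that failed (hypothetical)."""
    try:
        # Connects and waits for the 220 greeting, subject to the timeout.
        client = smtp_factory(mx_host, 25, timeout=timeout)
    except smtplib.SMTPException:
        return "bad welcome"   # connected, but no usable greeting
    except OSError:
        return "bad connect"   # refused, unreachable, or timed out
    try:
        code, _ = client.helo(helo_name)
        if code != 250:
            return "bad helo"
        code, _ = client.mail("")           # null sender: MAIL FROM:<>
        if code != 250:
            return "bad mail from"
        code, _ = client.rcpt(return_path)  # the address being verified
        if code not in (250, 251):
            return "bad rcpt"
        return "ok"
    except (OSError, smtplib.SMTPException):
        return "bad connect"   # session dropped part way through
    finally:
        try:
            client.quit()      # no DATA is ever sent
        except (OSError, smtplib.SMTPException):
            pass
```

The smtp_factory parameter is only there so the function can be exercised
against a stand-in client without touching the network.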

Overhead would be an issue at the most fundamental level.  Redundancy logic
must be added.  In addition to the learning algorithm, there is interest
in networking the systems using some kind of combined C/S, P2P topology of
CBV systems.   Another idea to reduce CBV overhead is to move the logic to
the RCPT TO stage.  If the server refuses the RCPT TO:, then there is no need
to call the CBV.   This has not been implemented yet, but our statistics show
that at least 75% of the RCPT TO commands are rejected.  Hence, 75% of the
CBV calls can be skipped.
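
The ordering could look something like the sketch below.  As noted, this is
not implemented, so the function name handle_rcpt, the two callbacks and
the returned strings are purely hypothetical.  The only point is that the
cheap local recipient check runs first and the callback is reached only
when the recipient would otherwise be accepted.

```python
from typing import Callable


def handle_rcpt(rcpt_to: str, mail_from: str, client_ip: str,
                is_local_recipient: Callable[[str], bool],
                run_cbv: Callable[[str, str], bool]) -> str:
    """Decide on one RCPT TO, calling the CBV only when needed."""
    if not is_local_recipient(rcpt_to):
        return "reject rcpt"   # refused anyway, so no callback is made
    if run_cbv(mail_from, client_ip):
        return "accept"
    return "reject cbv"
```

With three out of four recipients refused locally, only the fourth would
trigger a callback.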

The false negatives are negligible.  Of the 20 or so gamma systems,
only a few indicated reports by users, at which point a DIP was manually
added. A more detailed analysis of the false negatives/positives will be
provided in my report.

In summary, this system works with CURRENT standards.  A proof of concept is
shown that eliminates the majority of access issues using an SMTP compliance
testing concept.  Overhead can be addressed using scalability measures, and
both overhead and redundancy can be solved with advanced implementation.

Note: One topic I will discuss in my report is the technical engineering
principles we should be adhering to.  Here is a small part of it.

We (SMTP Server developers) have NO business trying to "interpret" what SPAM
is, with the exception of providing administrative policies.   No
consideration is given to mail context, which is how the ideal TRANSPORT
system should work.   In my opinion, it's a MOOT point once you enforce
compliance with SMTP and now CAN-SPAM, which requires a valid return path.
In the near future, customers who are looking for an SMTP system will ask
one basic question (regardless of whether they know what they are talking
about):

                "Is your system CAN-SPAM ready?"

Are you prepared to answer this?  Or are you prepared to argue with them?

If you want SMTP to be ready, two things must be done according to CAN-SPAM:

       1) Refine the specification to REMOVE all ambiguity about technical
          compliance.  This includes dealing with the RETURN PATH and/or
          the VRFY command.

       2) To minimize overhead, add a new ESMTP command which specifies the
          SUBJECT:

That's it!  Fundamental and simple!

        EHLO Spammer.com
        SUBJECT: straight from their subject line
        MAIL FROM:  validaddress(_at_)spammer(_dot_)com
        RCPT TO: <targetuser>
        DATA
        QUIT

Otherwise, systems will be forced to accept the DATA to do post analysis
WHICH is NOT desirable.

The SPEC should make the SUBJECT an administrative policy issue.  If
required, the server will issue a refusal if it is not provided.
Implementations can augment the logic against what is provided in the
MAIL FROM and RCPT TO.

- VALIDATION OF THE RETURN PATH
- TOPIC IDENTIFICATION

and you are CAN-SPAM ready.

Until next time.

-- Hector


