procmail

Re: Spam

2004-02-04 13:23:51
On Wed, Feb 04, 2004 at 01:19:49PM -0600, Jason Crowe wrote:

Bart Schaefer <schaefer(_at_)zanshin(_dot_)com> wrote:

Because SA is big and slow-moving and consequently a target for
spammer evasions, plus difficult or impossible to install in some
environments where procmail is already available (web-hosted
accounts, mostly).

Um, I would SERIOUSLY disagree with your characterization of SA.
Several of the big rule sets are updated weekly - or more.  I don't
know of any Linux environments where it doesn't run (if *I* can
install it, surely someone who actually knows Linux can).  Seems to
work just fine with both Horde & Squirrelmail.

It's very active development-wise, but slow-moving resource-wise.
Procmail recipes are generally a lot faster than using
SpamAssassin. I used procmail exclusively on our mail servers and
never had a load problem. Now that I am using SpamAssassin I need to
upgrade to a faster server.

Yup.  Besides which, I'd developed a good deal of my procmail
stuff before there *was* a SpamAssassin.  And I get fewer false
positives and false negatives.

Here are some timing tests on one hundred spam messages.  I will leave
my two-line prompt in, because it shows the load in the top line.
The transcript is indented; my comments are flush left.

    [217.228.131.218 -> panix5] {dman} [1.63]
     8:20pm [~/Mail] 242[100]> messages myspam
    There are 100 messages in folder myspam.
    
    [217.228.131.218 -> panix5] {dman} [1.63]
     8:20pm [~/Mail] 243[100]> time sh -c "formail -s procmail TEST=y \\
                               HARNESS=mytest DIAGS=y 2>&1" < myspam > foo.log
    35.042u 17.479s 1:36.45 54.4%   0+0k 12+4399io 0pf+1w
    
    [217.228.131.218 -> panix5] {dman} [3.56]
     8:22pm [~/Mail] 244[0]> 



Okay, that took just over a minute-and-a-half, and seems to have sent the
load up a bit briefly on this machine.
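
For readers unfamiliar with that `time` line: assuming the shell is tcsh
with its default `time` format (an assumption; the prompt suggests it but
the post does not say so), the fields are user CPU seconds, system CPU
seconds, elapsed wall-clock time, and CPU percentage.  A small sketch of
how the percentage follows from the other three; the helper names
`to_seconds` and `cpu_percent` are made up for illustration:

```shell
# Convert an elapsed time such as M:SS.ss (or H:MM:SS) to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%.2f\n", s }'
}

# CPU share of the run: (user + system) / elapsed, as a whole percentage.
cpu_percent() {
    awk -v u="$1" -v s="$2" -v e="$3" 'BEGIN { printf "%.0f\n", 100 * (u + s) / e }'
}

# Figures copied from the procmail run above: 35.042u 17.479s 1:36.45
elapsed=$(to_seconds 1:36.45)
cpu_percent 35.042 17.479 "$elapsed"
```

For the run above this prints 54, in line with the 54.4% that `time`
reported: roughly half of the wall-clock time was spent on the CPU.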

I hit the Enter key a couple of times to watch the load come back down:

    
    [217.228.131.218 -> panix5] {dman} [2.48]
     8:23pm [~/Mail] 244[0]> 
    
    [217.228.131.218 -> panix5] {dman} [2.36]
     8:23pm [~/Mail] 244[0]> ls -l foo.log
    -rw-------    1 dman     users      450668 Feb  4 20:22 foo.log
    
That's the file we wrote.


    [217.228.131.218 -> panix5] {dman} [1.82]
     8:23pm [~/Mail] 245[0]> 

I've taken out some irrelevant command lines.

    
    [217.228.131.218 -> panix5] {dman} [1.12]
     8:25pm [~/Mail] 249[0]> time sh -c "formail -s spamc -d spamd 2>&1" \
                                  < myspam > bar.log   
    ^C0.034u 0.112s 24:09.60 0.0%   0+0k 0+27io 0pf+11w
    
Christ.  I interrupted that after *twenty-four minutes*!!!  I thought
it must be stuck.  But no, we were nearly 90% through:

    [217.228.131.218 -> panix5] {dman} [1.15]
     8:50pm [~/Mail] 250[1]> ls -l bar.log
    -rw-------    1 dman     users      579420 Feb  4 20:50 bar.log
    
    [217.228.131.218 -> panix5] {dman} [1.13]
     8:50pm [~/Mail] 252[0]> messages bar.log 
    There are 89 messages in folder bar.log.
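
To put the two runs on the same footing, here is a rough per-message
comparison.  It is only a sketch: the elapsed times are taken from the two
`time` lines, the spamc figure covers the 89 messages that got through
before the interrupt, and the function name `per_message` is made up for
illustration.  It also ignores that the two runs were not doing identical
work (the procmail run had diagnostics switched on, for one).

```shell
# Average wall-clock seconds per message: elapsed seconds / messages handled.
per_message() {
    awk -v e="$1" -v n="$2" 'BEGIN { printf "%.2f\n", e / n }'
}

# 1:36.45 elapsed = 96.45 s for 100 messages (procmail run)
echo "procmail: $(per_message 96.45 100) s/message"

# 24:09.60 elapsed = 1449.60 s for 89 messages (spamc run, interrupted)
echo "spamc:    $(per_message 1449.60 89) s/message"

# Ratio of the two averages, computed from the unrounded figures.
awk 'BEGIN { printf "ratio:    %.1fx\n", (1449.60 / 89) / (96.45 / 100) }'
```

On these numbers procmail averaged a little under one second per message
and spamc a little over sixteen.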
    

Let me add some information:  Panix has added a whole separate host
machine to run these mail processes.  There are no users logged in to
that machine!  Most of its CPUs are used up running spamd.  The load
is often extreme.  I can't log in to that machine, but I can check
its load.  I checked a few times during the twenty-four minutes I was
waiting.  The load was between 110 and 128 the whole time!  A couple of
minutes after I bailed on the job, it was back down to 6 or 7, for the
time being.  Hmm.

So, which way would *you* rather catch spam?

Dallman

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
