Re: recipe needed

Hello people,

  I am breaking my head over a recipe I need.  I am subscribed to a lot
of mailing lists on http://www.egroups.com .  Every message on any of
those lists contains an advertisement, which looks like this:
------------------------ eGroups Sponsor -----------------------~-_>
bla blah blah
----------------------------------------------------------------~-_>

As you have guessed, I don't want to see it, in any of my messages ;-)
How can I do it in the best way?
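
For the sponsor block itself, something like the following might be a
starting point. It is an untested sketch that assumes the two marker lines
always look as in your example and that sed is available on your system:

:0 fbw
* B ?? ^-+ eGroups Sponsor -+~-_>
| sed -e '/^-* eGroups Sponsor -*~-_>/,/^-*~-_>/d'

The f flag makes the recipe a filter, b feeds only the body to sed, and w
makes procmail wait for sed and keep the original message if sed fails. If
the closing marker is ever missing, sed will delete everything from the
opening marker to the end of the body.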


Spam detection can be tricky: talk about investing for profit is off-topic
on the procmail list, but on a list about finance it is probably right on
the money.

I deal with some specific spammers thus:

# Forward mail from this sender to the postmaster of the sending domain.
:0
* ^From: .*silvia_brown@usa\.net
! postmaster@usa.net


or

# No lockfile is needed here: the only action is a forward.
:0
* ^From.*most-wanted\.com
{
        :0
        * ^Message-id:.*bigpond\.com
        ! postmaster@bigpond.com
}

I find once I've implemented a rule like this, they don't bother me again.



Sometimes a site generates lots of erroneous error messages. I give them much 
the same treatment.

:0
* ^From:.*MAILER-DAEMON@telkom\.net
* ^Received: .* invoked for bounce
{
   # The bounce complains about a full mailbox and quotes the list address.
   :0 B
   * disk quota exceeded
   * list-request@redhat\.com
   | mail -s "This error message should be directed to the mailing list manager" \
          postmaster@telkom.net
}

Finally, this is probably nearer what you want. I used it for spam detection
on this list. Not long after I started using it the list itself improved, but
the recipe was doing quite well by then.

:0
* procmail-request@
        {
                # Score the body; file as spam only if the total is positive.
                # $MHp and ${CDATE} are variables set elsewhere in my procmailrc.
                :0B:spam
                * -20^0
                * +3^2 MARKETING|FREE| ad |have( just)* made|FACT|Call NOW|\
                        removed|free subscription|remove.*subject
                * +5^3 \!\!
                * +20^2 We are (terribly )*sorry if you received this message in error
                * +150^2  money|millionaire|ecommerce|e-commerce|substantial earnings|\
                        financial opportunity
                * +150^2  million|billion|casino|banners|income|earnings
                * +150^2  profit|dollar|gambling|porn|George Beecroft
                * +150^2  our remove list|This list will NOT be sold
                * +150^2  harvesting software|phone calls|save money|\
                        submit your URL|house|senate
                * +180^2  (bill|s\.) *1618
                * +20^1  (call|phone) *[-0-9]+
                * -200^2  body|header|procmailrc|procmail
                | $MHp +procmail/spam${CDATE}

                # Everything else from the list goes to the regular folder.
                :0:Procmail
                | $MHp +procmail/Procmail${CDATE}
        }


The idea here is that some words probably don't appear in legitimate mail
but do appear in spam. The more often such a word occurs, the higher the
score. OTOH, there are some words that indicate that, despite all the bad
words, the message probably IS legitimate. After all, on the procmail list
we often talk about spam.

There's a good chance this message will pass my filters as legitimate because 
it talks about procmail and even mentions procmailrc.

To understand what this filter does, read the documentation on scoring
(the procmailsc man page) carefully. At least twice ;-)
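
As a rough illustration of the arithmetic, here is a cut-down recipe. The
weights, the two words and the folder name spam-example are made up for the
example and are not taken from my real filter:

# Condition form:  * w^x regexp
# The 1st match adds w, the 2nd adds w*x, the 3rd adds w*x*x, and so on.
# With x = 0 only the first match counts.
# The recipe delivers only if the final total is positive.
:0 B:
* -20^0
* 150^2 casino
* -200^2 procmail
spam-example

A body that says "casino" twice and never mentions procmail scores
-20 + 150 + 300 = 430, so it lands in spam-example. A body that says
"casino" once and "procmail" once scores -20 + 150 - 200 = -70, so the
recipe does not fire and the message carries on to the next recipe.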

I created my list of bad words by examining genuine spam and choosing words
that I thought likely to be used by spammers and not otherwise. Some messages
I could not find a way of legitimately classifying as spam, and some got fed
into the spam folder even though they weren't spam.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail