Hello people,
I am racking my brains over the recipe I need. I am subscribed to a lot
of mailing lists on http://www.egroups.com. Every message in any of
those lists contains an advertisement, which looks like this:
------------------------ eGroups Sponsor -----------------------~-_>
bla blah blah
----------------------------------------------------------------~-_>
As you have probably guessed, I don't want to see it in any of my messages ;-)
How can I do it in the best way?
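One way to drop just the sponsor block, rather than the whole message, is a body filter. The following is a minimal sketch that has not been tested against real eGroups mail. It assumes the block always starts with a line containing "eGroups Sponsor" and ends at the next line made of dashes followed by "~-_>", as in the sample above, and that sed is on your PATH.

```procmail
# Sketch: delete the eGroups sponsor block from the message body.
# f = filter, b = feed only the body to sed, w = wait for sed and check its exit code.
# The range starts at the "eGroups Sponsor" line and ends at the next line
# consisting of dashes followed by "~-_>"; both lines are deleted too.
:0 fbw
* B ?? eGroups Sponsor
| sed -e '/eGroups Sponsor/,/^-*~-_>/d'
```

If the closing line is ever missing, sed deletes everything from the sponsor line to the end of the body, so check the result on a few messages before relying on it.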
Spam detection can be tricky: talk on the procmail list about investing for
profit is off-topic, but on a list about finance it is probably right on the
money.
I deal with some specific spammers thus:
:0
* ^From: .*silvia_brown@usa\.net
! postmaster@usa.net
or
:0
* ^From.*most-wanted\.com
{
:0
* ^Message-id:.*bigpond\.com
! postmaster@bigpond.com
}
I find once I've implemented a rule like this, they don't bother me again.
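Procmail requires every condition of a recipe to match (an implicit AND), so the nested most-wanted.com / bigpond.com block could also be written as a single flat recipe. In this sketch the dots are escaped so that they match only a literal ".":

```procmail
# Sketch: same effect as the nested block, written flat.
# Both conditions must match before the message is forwarded.
:0
* ^From.*most-wanted\.com
* ^Message-id:.*bigpond\.com
! postmaster@bigpond.com
```

The nested form is still handy when several inner recipes share the same outer test.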
Sometimes a site generates lots of misdirected error messages. I give them much
the same treatment.
:0
* ^From:.*MAILER-DAEMON@telkom\.net
* ^Received:.*invoked for bounce
{
:0B
* disk quota exceeded
* list-request@redhat\.com
| mail -s "This error message should be directed to the mailing list manager" postmaster@telkom.net
}
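Procmail also has a built-in ^FROM_MAILER macro that matches the usual mailer-daemon and postmaster senders. A variant of the bounce recipe could use it in place of the explicit MAILER-DAEMON address. This is only a sketch: it drops the Received: check, and it assumes that telkom.net's bounces carry a From: address that ^FROM_MAILER recognises.

```procmail
# Sketch: ^FROM_MAILER is procmail's built-in regex for bounce senders
# (MAILER-DAEMON, postmaster, ...); the second condition limits it to telkom.net.
:0
* ^FROM_MAILER
* ^From:.*@telkom\.net
{
# B = match the conditions against the body; both must match.
:0B
* disk quota exceeded
* list-request@redhat\.com
| mail -s "This error message should be directed to the mailing list manager" postmaster@telkom.net
}
```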
Finally, this is probably nearer to what you want. I used it for spam detection
on this list; not long after I started using it, the list itself improved, but
the filter was doing quite well.
:0
* procmail-request@
{
:0B:spam
* -20^0
* +3^2 MARKETING|FREE| ad |have( just)* made|FACT|Call NOW|\
removed|free subscription|remove.*subject
* +5^3 \!\!
* +20^2 We are (terribly )?sorry if you received this message in error
* +150^2 money|millionaire|ecommerce|e-commerce|substantial earnings|financial opportunity
* +150^2 million|billion|casino|banners|income|earnings
* +150^2 profit|dollar|gambling|porn|George Beecroft
* +150^2 our remove list|This list will NOT be sold
* +150^2 harvesting software|phone calls|save money|submit your URL|house|senate
* +180^2 (bill|s\.) *1618
* +20^1 (call|phone) *[-0-9]+
* -200^2 body|header|procmailrc|procmail
| $MHp +procmail/spam${CDATE}
:0:Procmail
| $MHp +procmail/Procmail${CDATE}
}
The idea here is that some words probably don't appear in legitimate mail but
do in spam, and more occurrences of some words are worse. On the other hand,
there are some words that indicate that, despite all the bad words, the message
probably IS legitimate. After all, on the procmail list we often talk about spam.
There's a good chance this message will pass my filters as legitimate because
it talks about procmail and even mentions procmailrc.
To understand what this filter does, read the documentation on scoring
(procmailsc(5)) carefully. At least twice ;-)
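As a worked example of the weighted scoring, based on my reading of procmailsc(5): a condition written w^x that matches n times adds the sum below to the score, and the recipe fires only if the final total is positive. The empty condition -20^0 contributes -20 once. Suppose a body contains "free" three times and matches no other condition. The +3^2 line then adds 3 + 6 + 12, and the message is filed as spam:

$$
s(w, x, n) = \sum_{k=0}^{n-1} w\,x^{k}
$$

$$
-20 + (3 + 3\cdot 2 + 3\cdot 2^{2}) = -20 + 21 = 1 > 0
$$

If the same body also mentions "procmail" once, the -200^2 line pulls the total negative. The spam recipe then does not fire, and the message falls through to the Procmail folder:

$$
-20 + 21 - 200 = -199 \le 0
$$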
I create my list of bad words by examining genuine spam and choosing words that
I thought likely to be used by spammers and not otherwise. Some messages I
could not find a way of legitimately classifying as spam, and some got fed into
the spam folder even though they weren't.
_______________________________________________
procmail mailing list
procmail@lists.RWTH-Aachen.DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail