At 06:19 2003-11-22 -0800, S Semple wrote:
> I suppose due to the increase in spam filtering,
> spammers have moved to hiding their words, e.g.
> m.oney or p@armacy
Have you considered grabbing the subject and then piping it through a sed
script to convert symbols into their likely letter counterparts and remove
spurious symbols?
Seems that would be a lot easier, and taking the conversion hit ONCE means
you would then be able to do regular text searches on the output variable,
instead of having to fool around with mangling every possible text arrangement.
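Here is a rough Python stand-in for that sed pass. It is a minimal sketch, not a tested filter. The look-alike table and the function name `normalize_subject` are hypothetical, and which symbols should map to which letters is an assumption you would tune against real spam. The idea is to convert once, then run plain substring searches on the result:

```python
import re

# Hypothetical look-alike table (an assumption): symbol/digit -> likely letter.
LOOKALIKES = str.maketrans({
    "@": "a", "4": "a", "3": "e", "1": "i", "!": "i",
    "|": "i", "0": "o", "$": "s", "5": "s", "7": "t",
})

def normalize_subject(subject: str) -> str:
    """Convert look-alike symbols to letters, then drop the remaining symbols."""
    text = subject.lower().translate(LOOKALIKES)
    text = re.sub(r"[^a-z\s]", "", text)       # spurious symbols: . * ` : - _ etc.
    return re.sub(r"\s+", " ", text).strip()   # tidy whitespace for plain searches

subject = normalize_subject('"Buy V*iag`ra')
print(subject)                 # buy viagra
print("viagra" in subject)     # True
```

Mapping digits to letters also mangles genuine numbers (for example, "3days" becomes "edays"), which is one reason to search the converted copy while leaving the original subject untouched.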
Note that I don't actually do this myself: these messages tend to trip
enough of my other criteria, and the excess of symbols in the subject
triggers yet another condition.
Here are a few snippets from my spam report. Note that this first one
doesn't have symbols in the "keyword" you'd be using, but instead repeats
some characters:
SPAM: +135 Advisory - relayed through backup MX
SPAM: +100 Date is suspicious at 169313 seconds BEFORE reception
SPAM: +45 Advisory - no X-Envelope-To
SPAM: +249 X-Mailer
SPAM: +35 Advisory - MIME - multipart/alternative
SPAM: +80 multipart/alternative without plain text
SPAM: +20 spam type statements (20)
SPAM: +249 Abundance of triggers
SPAM: Advisory - spammishness is 913
SPAM: spammishness exceeds threshold of 249
INFO: SpamFilter v03.06.00 SBS 20030914/2123
>From pharma_on_line(_at_)rock(_dot_)com Wed Nov 19 16:25:45 2003
Subject: Fast generic solution better than VIAAGRRA_ 1 cialis=3days ehiwrcc
Folder: gzip -9fc >> spam.gz 2168
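Repeated letters like VIAAGRRA can be folded back before a keyword search by squeezing each run of the same character down to one (in the shell, `tr -s` does the same squeeze). This is a minimal Python sketch, and both function names are hypothetical. Because squeezing also collapses legitimate double letters, the keyword is squeezed the same way before comparing:

```python
import re

def squeeze_repeats(text: str) -> str:
    """Collapse each run of a repeated character to one: 'VIAAGRRA' -> 'VIAGRA'."""
    return re.sub(r"(.)\1+", r"\1", text)

def contains_keyword(subject: str, keyword: str) -> bool:
    """Lower-case, drop non-letters (keeping spaces), squeeze both sides, substring-match."""
    letters = re.sub(r"[^a-z\s]", "", subject.lower())
    return squeeze_repeats(keyword.lower()) in squeeze_repeats(letters)

print(contains_keyword(
    "Fast generic solution better than VIAAGRRA_ 1 cialis=3days ehiwrcc", "viagra"))  # True
```

This does nothing for a genuine misspelling such as VIE.AGRA, which still fails the substring test after symbol removal.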
This one has separating dots, but still MISSPELLS the keyword:
SPAM: +135 Advisory - relayed through backup MX
SPAM: +25 From/Recipient score 25
SPAM: +100 From service doesn't appear in Received lines
SPAM: +35 Advisory - MIME - multipart/alternative
SPAM: +150 forged Yahoo
SPAM: Advisory - spammishness is 445
SPAM: spammishness exceeds threshold of 249
INFO: SpamFilter v03.06.00 SBS 20030914/2123
>From pnp490(_at_)yahoo(_dot_)com Tue Nov 18 10:01:35 2003
Subject: ePHARMACY Wholesale - LEV.ITRA, VIE.AGRA, Celebrex - INTERNET PRICES.
Folder: gzip -9fc >> spam.gz 2265
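Each report's "spammishness" appears to be the plain sum of its scored lines. For the report just above, 135 + 25 + 100 + 35 + 150 = 445, and the first report's lines add up to 913. The following minimal Python sketch recomputes that total from report text. It assumes this additive reading and the `SPAM: +<term>` line format seen in these excerpts, including terms like `+249+65393` and `+(249*0.75)`. It is a hypothetical illustration, not SpamFilter's own code:

```python
import ast
import operator
import re

THRESHOLD = 249  # from "spammishness exceeds threshold of 249"

# Only + and * appear in the score terms seen in these reports.
_OPS = {ast.Add: operator.add, ast.Mult: operator.mul}
_SCORE_LINE = re.compile(r"^SPAM: \+(\S+)")

def _evaluate(node):
    """Safely evaluate a tiny arithmetic term such as 249+65393 or (249*0.75)."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    raise ValueError(f"unsupported score term: {ast.dump(node)}")

def spammishness(report: str) -> float:
    """Sum every 'SPAM: +<term>' line; advisory lines without '+' are skipped."""
    total = 0
    for line in report.splitlines():
        match = _SCORE_LINE.match(line)
        if match:
            total += _evaluate(ast.parse(match.group(1), mode="eval"))
    return total

report = """SPAM: +135 Advisory - relayed through backup MX
SPAM: +25 From/Recipient score 25
SPAM: +100 From service doesn't appear in Received lines
SPAM: +35 Advisory - MIME - multipart/alternative
SPAM: +150 forged Yahoo
SPAM: Advisory - spammishness is 445"""
score = spammishness(report)
print(score, score > THRESHOLD)   # 445 True
```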
The high subject scoring match on this one is due to the variety of symbols:
SPAM: +125 Single received header for foreign sender
SPAM: +100 Date is suspicious at 55759 seconds AFTER reception
SPAM: +50 Advisory - embedded space on subject
SPAM: +249+65393 Subject Scoring match 65393
SPAM: +(249*0.75) text/html ONLY
SPAM: +40 spam type statements (40)
SPAM: +249 Abundance of triggers
SPAM: Advisory - spammishness is 66392.75
SPAM: spammishness exceeds threshold of 249
INFO: SpamFilter v03.06.00 SBS 20030914/2123
>From rwillskf(_at_)vrflow(_dot_)oulu(_dot_)fi Sat Nov 15 10:21:11 2003
Subject: "Buy V*iag`ra
Chea:p: ; bdknr
Folder: gzip -9fc >> spam.gz 3071
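SpamFilter's actual subject-scoring rule isn't shown here, so the following does not reproduce the 65393 figure. It is only a hypothetical sketch of one way to measure "variety of symbols": count the distinct ASCII punctuation characters in the subject. The function name is illustrative, and the two-line subject above is treated as one string joined by a space, which is also an assumption:

```python
import string

def symbol_variety(subject: str) -> int:
    """Count distinct ASCII punctuation characters in a subject (hypothetical measure)."""
    return len({ch for ch in subject if ch in string.punctuation})

print(symbol_variety('"Buy V*iag`ra Chea:p: ; bdknr'))   # 5
```

By the same count, the VIAAGRRA subject scores 2 (`_` and `=`), so a rule keyed on variety rather than raw symbol count separates these two cases.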
SPAM: +125 Single received header for foreign sender
SPAM: +135 Advisory - relayed through backup MX
SPAM: +100 Date is suspicious at 42962 seconds AFTER reception
SPAM: +25 From/Recipient score 25
SPAM: +100 From service doesn't appear in Received lines
SPAM: +35 Advisory - MIME - multipart/alternative
SPAM: +150 forged Yahoo
SPAM: +249 Abundance of triggers
SPAM: Advisory - spammishness is 919
SPAM: spammishness exceeds threshold of 249
INFO: SpamFilter v03.06.00 SBS 20030914/2123
>From xwxoirs06(_at_)yahoo(_dot_)com Sat Nov 15 07:28:38 2003
Subject: LIVE LONGER with H-uman...G-rowth...H-ormone...halen
Folder: gzip -9fc >> spam.gz 2145
Note that all of those were caught, yet none of the matching involved any
leet-text conversion. If someone wanted to get really serious about dealing
with leet text, I think a good start would be to remove or replace the
symbols, then run a soundex algorithm on the tokens in the subject line.
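A minimal Python sketch of that idea follows. It uses the standard American Soundex coding (first letter plus three digits). The tokenizing step and the names `soundex` and `subject_codes` are illustrative assumptions. Because Soundex ignores vowels, the misspelled VIE.AGRA codes the same as viagra (V260). The flip side is false positives: an innocent word such as "vigor" also codes to V260.

```python
import re

# American Soundex digit groups; vowels, y, h and w get no digit.
_CODES = {ch: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}

def soundex(word: str) -> str:
    """American Soundex of a word, ignoring any non-letters: 'Robert' -> 'R163'."""
    letters = re.sub(r"[^a-z]", "", word.lower())
    if not letters:
        return ""
    prev = _CODES.get(letters[0], "")
    digits = []
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        if ch not in "hw":        # h/w don't separate same-coded letters; vowels do
            prev = code
    return (letters[0].upper() + "".join(digits) + "000")[:4]

def subject_codes(subject: str) -> set:
    """Strip symbols, split into words, and Soundex-code each word."""
    words = re.sub(r"[^a-z\s]", "", subject.lower()).split()
    return {soundex(w) for w in words}

codes = subject_codes("ePHARMACY Wholesale - LEV.ITRA, VIE.AGRA, Celebrex - INTERNET PRICES.")
print(soundex("viagra") in codes)   # True
```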
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail