procmail

Re: Checking for Non dictionary words

2003-11-22 10:30:41
At 06:19 2003-11-22 -0800, S Semple wrote:
> I suppose due to the increase in spam filtering
> spammers have moved to hiding their words eg.
> m.oney or p@armacy

Have you considered grabbing the subject and then piping it through a sed script to convert symbols into their likely letter counterparts and remove spurious symbols?

Seems that would be a lot easier, and taking the conversion hit ONCE means you would then be able to do regular text searches on the output variable, instead of having to write a separate pattern for every possible way a word can be mangled.
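To make the idea concrete, here is a minimal sketch of that one-time conversion, written in Python rather than sed so it is easy to test on its own. The symbol-to-letter table and the name normalize_subject are my own assumptions for illustration, not something taken from a working filter; extend the table to suit the spam you actually see.

```python
import re

# Hypothetical look-alike table (an assumption, not an exhaustive list).
LOOKALIKES = str.maketrans({
    "@": "a",
    "0": "o",
    "1": "l",
    "3": "e",
    "$": "s",
    "!": "i",
})

def normalize_subject(subject):
    """Lowercase the subject, map look-alike symbols to letters,
    and strip symbols so plain text searches can be run on the result."""
    text = subject.lower().translate(LOOKALIKES)
    # Remove symbols wedged between two letters: "m.oney" -> "money".
    text = re.sub(r"(?<=[a-z])[^a-z0-9\s]+(?=[a-z])", "", text)
    # Turn any remaining symbols into spaces, then collapse whitespace.
    text = re.sub(r"[^a-z0-9\s]+", " ", text)
    return " ".join(text.split())

if __name__ == "__main__":
    print(normalize_subject('"Buy V*iag`ra Chea:p: ; bdknr'))
```

Be aware that this is deliberately aggressive: it also rewrites legitimate digits and joins two words that are separated only by a symbol, so the result is only suitable for keyword matching, not for display.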

Note that I don't happen to do this -- these messages tend to trip up on enough other criteria, in addition to an excess of symbols in the subject triggering another condition.
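For illustration only, a check for an excess of symbols could look something like the Python sketch below. The function names and the threshold of three distinct symbols are assumptions made up for this example; they are not the actual condition or scoring used in my filter.

```python
def symbol_stats(subject):
    """Return (total, distinct) counts of characters that are neither
    letters, digits, nor whitespace."""
    symbols = [ch for ch in subject if not ch.isalnum() and not ch.isspace()]
    return len(symbols), len(set(symbols))

def has_symbol_excess(subject, max_distinct=3):
    """True when the subject uses more distinct symbols than max_distinct
    (hypothetical threshold; tune it against your own mail)."""
    _total, distinct = symbol_stats(subject)
    return distinct > max_distinct

if __name__ == "__main__":
    print(symbol_stats('"Buy V*iag`ra Chea:p: ; bdknr'))
    print(has_symbol_excess("Re: Checking for Non dictionary words"))
```

Counting distinct symbols rather than the total keeps ordinary subjects with a single "Re:" or a pair of quotes from tripping the check.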

Here are a few snippets from my spam report - note that this first one doesn't have symbols in the "keyword" you'd be using, but instead repeats some characters:

SPAM: +135 Advisory - relayed through backup MX
SPAM: +100 Date is suspicious at 169313 seconds BEFORE reception
SPAM: +45 Advisory - no X-Envelope-To
SPAM: +249 X-Mailer
SPAM: +35 Advisory - MIME - multipart/alternative
SPAM: +80 multipart/alternative without plain text
SPAM: +20 spam type statements (20)
SPAM: +249 Abundance of triggers
SPAM: Advisory - spammishness is 913
SPAM: spammishness exceeds threshold of 249
INFO: SpamFilter v03.06.00  SBS  20030914/2123
>From pharma_on_line(_at_)rock(_dot_)com  Wed Nov 19 16:25:45 2003
 Subject: Fast generic solution better than VIAAGRRA_ 1 cialis=3days ehiwrcc
  Folder:  gzip -9fc >> spam.gz                    2168


This one has separating dots, but still MISSPELLS the keyword:

SPAM: +135 Advisory - relayed through backup MX
SPAM: +25 From/Recipient score 25
SPAM: +100 From service doesn't appear in Received lines
SPAM: +35 Advisory - MIME - multipart/alternative
SPAM: +150 forged Yahoo
SPAM: Advisory - spammishness is 445
SPAM: spammishness exceeds threshold of 249
INFO: SpamFilter v03.06.00  SBS  20030914/2123
>From pnp490(_at_)yahoo(_dot_)com  Tue Nov 18 10:01:35 2003
 Subject: ePHARMACY Wholesale - LEV.ITRA, VIE.AGRA, Celebrex - INTERNET PRICES.
  Folder:  gzip -9fc >> spam.gz                    2265


The high subject scoring match on this one is due to the variety of symbols:

SPAM: +125 Single received header for foreign sender
SPAM: +100 Date is suspicious at 55759 seconds AFTER reception
SPAM: +50 Advisory - embedded space on subject
SPAM: +249+65393 Subject Scoring match 65393
SPAM: +(249*0.75) text/html ONLY
SPAM: +40 spam type statements (40)
SPAM: +249 Abundance of triggers
SPAM: Advisory - spammishness is 66392.75
SPAM: spammishness exceeds threshold of 249
INFO: SpamFilter v03.06.00  SBS  20030914/2123
>From rwillskf(_at_)vrflow(_dot_)oulu(_dot_)fi  Sat Nov 15 10:21:11 2003
 Subject: "Buy V*iag`ra Chea:p: ; bdknr
  Folder:  gzip -9fc >> spam.gz                    3071


SPAM: +125 Single received header for foreign sender
SPAM: +135 Advisory - relayed through backup MX
SPAM: +100 Date is suspicious at 42962 seconds AFTER reception
SPAM: +25 From/Recipient score 25
SPAM: +100 From service doesn't appear in Received lines
SPAM: +35 Advisory - MIME - multipart/alternative
SPAM: +150 forged Yahoo
SPAM: +249 Abundance of triggers
SPAM: Advisory - spammishness is 919
SPAM: spammishness exceeds threshold of 249
INFO: SpamFilter v03.06.00  SBS  20030914/2123
>From xwxoirs06(_at_)yahoo(_dot_)com  Sat Nov 15 07:28:38 2003
 Subject: LIVE LONGER with H-uman...G-rowth...H-ormone...halen
  Folder:  gzip -9fc >> spam.gz                    2145


Note that all of those managed to be matched - but none used any leet-text conversion. I'm thinking if someone wanted to get really serious about dealing with leet text, a good start would be to remove/replace symbols, then run a soundex algorithm on the tokens in the subject line.
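As a rough sketch of that last suggestion, the Python below strips everything but letters from each subject token and then computes a classic American Soundex code for it. The helper names soundex and subject_codes are hypothetical, and nothing like this is part of my filter; treat it as a starting point under those assumptions.

```python
# Soundex digit for each consonant; vowels, h, w and y have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Return the 4-character Soundex code of word, ignoring any
    non-letter characters; return "" if the word has no letters."""
    letters = [ch for ch in word.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate repeated codes
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeat after it counts
    return (encoded + "000")[:4]

def subject_codes(subject):
    """Soundex codes for every whitespace-separated token with letters."""
    return [c for c in (soundex(tok) for tok in subject.split()) if c]

if __name__ == "__main__":
    watch = {soundex("viagra")}
    subject = "Fast generic solution better than VIAAGRRA_ 1 cialis=3days ehiwrcc"
    print([c for c in subject_codes(subject) if c in watch])
```

With this, "VIAAGRRA_", "VIE.AGRA," and "V*iag`ra" all reduce to the same code as "viagra". Soundex is coarse, though, so unrelated words can share a code; a match is better used as one more scoring input than as a verdict on its own.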

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail