
Re: Date adjustment script

2003-09-12 12:36:49

At 08:39 2003-09-12 -0400, R A Lichtensteiger wrote:
[originally offlist, but we've agreed to pop it back onlist]

> I wrote some recipes like that a while ago and found that the one that
> nails future dates is a good indicator, but that past dates were almost
> always a false indicator.
>
> How does your experience compare?

I score date problems as "spammish", and as a result, a funky date in and of itself isn't enough to identify something as junk, so even false positives aren't a problem - it's all taken in conjunction with other characteristics of the message. This allows me to be a bit more arbitrary about my use of the filter - it doesn't have to be 100% because I'm really not likely to lose legitimate email because of it.

I also score for INVALID date formats (which typically seem to have some bogus text describing what the timezone is) - they seem to almost universally be spam, though I merely score them with a higher spammishness score.
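
A minimal sketch of an invalid-format check, written in Python rather than as a procmail recipe. The pattern and the name `date_format_is_valid` are assumptions made for illustration, not the recipe described above. The pattern is deliberately strict: it accepts only a numeric zone such as -0400, optionally followed by a parenthesised comment, so a Date: header with a spelled-out timezone fails. It would also reject the obsolete zone names (GMT, EST and so on) that RFC 2822 still permits, which may or may not be wanted.

```python
import re

# Strict RFC 2822-style Date: pattern (illustrative, not exhaustive).
_DATE_RE = re.compile(
    r"^(?:(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s+)?"   # optional day name
    r"\d{1,2}\s+"                                   # day of month
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+"
    r"\d{4}\s+"                                     # four-digit year
    r"\d{2}:\d{2}(?::\d{2})?\s+"                    # hh:mm or hh:mm:ss
    r"[+-]\d{4}"                                    # numeric zone only
    r"(?:\s+\([^()]*\))?$"                          # optional (comment)
)


def date_format_is_valid(value):
    """Return True if the Date: header value matches the strict pattern."""
    return _DATE_RE.match(value.strip()) is not None
```

A header such as "Fri, 12 Sep 2003 12:36:49 Pacific Daylight Time" fails this check, while "Fri, 12 Sep 2003 12:36:49 -0700 (PDT)" passes.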

Messages dated less than 200K seconds BEFORE reception tend to be list-delayed mail or come from twits with erratic clocks, but I have an 18H threshold there anyway (yes, that's less than 3D or 5D - but as I said, I'm using it as an indicator, not an absolute). Bugtraq, for instance, frequently has delays of 140K seconds or more (that list strips incoming Received: headers, so it's difficult to determine exactly where the delay was introduced, but it isn't critical because that single characteristic isn't enough to flag a message as spam).

Very LARGE lags in the clock seem to be indicative of spam:

SPAM: +100+100 Date is suspicious at 121651249 seconds {312 00:00:49} BEFORE reception

SPAM: +100+100 Date is suspicious at 121651249 seconds {312 00:00:49} BEFORE reception

Curiously, those two entries are from _SEPARATE_ messages from the same spammer, sent at different times.

I threshold advanced clocks at +2H, since most legit mail with an advanced clock skew seems to be under about 5K seconds (roughly 1.4 hours), which can sometimes be attributed to morons having their machine set to the wrong timezone.

Apart from skews under that low threshold, pretty much any advancement of the clock is a consistent indicator of spam. Just reviewing filtered messages since the beginning of this month, I see that a clock in excess of +2H has been spam in every instance except one: a bugtraq message ("SRT2003-09-11-1120 - setgid man MANPL overflow"). Because the date characteristic is merely contributory, that message was NOT classified as spam. All the others, however, suffered from MULTIPLE spam characteristics, for example:

SPAM: +125 Single received header for foreign sender
SPAM: +135 Advisory - relayed through backup MX
SPAM: +300 Foreign character set encoding (Windows-1250) in body.
SPAM: +100+100 Date is suspicious at 2678343 seconds {030 23:59:03} AFTER reception
SPAM: +75 Advisory - no non-list cleartext recipient matching X-Envelope-To
SPAM: +249+58 Subject Scoring match 58
SPAM: +(249*0.75) text/html ONLY
SPAM: +249 Abundance of triggers
SPAM: Advisory - spammishness is 1577.75
SPAM: spammishness exceeds threshold of 249
INFO: SpamFilter v03.05.00  SBS  20030517/1243
From gold(_at_)web2mail(_dot_)com  Tue Sep  9 22:57:55 2003
 Subject: Do YOU know how to earn lot of money on gold rate change?
  Folder:  gzip -9fc >> spam.gz                                         2440
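
The advisory total in that log can be checked by adding up the individual contributions. The snippet below is only arithmetic on the numbers shown; the variable names are made up for illustration.

```python
# Each entry is one "SPAM: +..." line from the log excerpt above.
contributions = [
    125,          # single received header for foreign sender
    135,          # relayed through backup MX
    300,          # foreign character set encoding in body
    100 + 100,    # suspicious date, AFTER reception
    75,           # no cleartext recipient matching X-Envelope-To
    249 + 58,     # subject scoring match
    249 * 0.75,   # text/html ONLY
    249,          # abundance of triggers
]

THRESHOLD = 249
total = sum(contributions)
print(total, total > THRESHOLD)
```

The sum agrees with the reported spammishness of 1577.75. With the date contribution of 200 removed, the remaining 1377.75 is still well over the threshold of 249.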


If a message is dated more than 18H BEFORE or more than 2H AFTER reception, I add 100 to my spammishness. If it's >72H out, I add an additional 100.
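
The rule just stated can be sketched as follows. This is Python for illustration only, not the procmail recipe itself; the function and constant names are assumptions, and the sign convention (positive skew means the Date: header is ahead of reception) is a choice made for this sketch.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

HOUR = 3600
PAST_LIMIT = 18 * HOUR      # Date: more than 18H BEFORE reception
FUTURE_LIMIT = 2 * HOUR     # Date: more than 2H AFTER reception
EXTREME_LIMIT = 72 * HOUR   # either direction earns the extra 100


def date_skew(date_header, received_at):
    """Seconds by which Date: is ahead of reception (negative = before).

    Assumes date_header carries a numeric zone and received_at is
    timezone-aware.
    """
    sent = parsedate_to_datetime(date_header)
    return int((sent - received_at).total_seconds())


def date_score(skew):
    """Spammishness contributed by the date skew: 0, 100 or 200."""
    score = 0
    if skew > FUTURE_LIMIT or -skew > PAST_LIMIT:
        score += 100
        if abs(skew) > EXTREME_LIMIT:
            score += 100
    return score


received = datetime(2003, 9, 12, 19, 36, 49, tzinfo=timezone.utc)
print(date_skew("Fri, 12 Sep 2003 12:36:49 -0700", received))
print(date_score(2678343), date_score(-140000), date_score(5000))
```

Under these assumptions the 2678343-second AFTER example from the log scores 200, matching its "+100+100" entry; a 140K-second list delay scores 100, and a 5K-second advance scores nothing.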


Overall, what I have has been working wonderfully for me - just 5 spams so far this month have actually gotten past my filters, and three of those were an eBay scam received nearly concurrently (for which my spewhosts filter has been updated; that filter adds a score based on whether the message appears to have passed through a mailserver associated with the domain of the From: address, and is used to flag potential forgeries).

In fact, both of the other two spams I received would now be tagged, because I have expanded some subject keyword filters (adding prostitute and underwear), recently narrowed the advanced clock threshold (from +18H to +2H), and bumped up the scoring for invalid date formats.

I also recently modified the recipes to allow for a list skew of 24H if a LISTNAME variable has been defined, so there's an automatic allowance for delays on discussion lists (which in my system already get a boost to their allowed spammishness threshold). This sharply reduces the number of entries in my logfile when handling lists such as bugtraq (I have a spam report emailed daily, and it includes messages which were spammish but not actually tagged as spam, so I can see how close iffy messages are).
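
One way the list allowance might look, again as a Python sketch rather than the actual recipe. The post does not say how the 24H skew combines with the 18H threshold; this sketch assumes it is added on top of it and applies only to delayed (past) dates. `past_limit` and `is_late` are hypothetical names, and `listname` stands in for the LISTNAME variable.

```python
HOUR = 3600
PAST_LIMIT = 18 * HOUR   # normal allowance for a Date: before reception
LIST_SKEW = 24 * HOUR    # extra allowance when the mail came via a list


def past_limit(listname=None):
    """Allowed delay in seconds; widened when a list name is set.

    Assumption: the list skew is ADDED to the normal limit.
    """
    if listname:
        return PAST_LIMIT + LIST_SKEW
    return PAST_LIMIT


def is_late(seconds_before, listname=None):
    """True if a Date: this many seconds before reception is flagged."""
    return seconds_before > past_limit(listname)


print(is_late(140000), is_late(140000, "bugtraq"))
```

Under the additive assumption the limit for list mail is 42H (151200 seconds), so a 140K-second bugtraq delay is no longer flagged.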

Dates are but one characteristic of my filtering, and they've been useful thus far.

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
