Re: Date adjustment script


At 08:39 2003-09-12 -0400, R A Lichtensteiger wrote:
[originally offlist, but we've agreed to pop it back onlist]

I wrote some recipes like that a while ago and found that the one that
nails future dates is a good indicator, but that past dates were almost
always a false indicator.

How does your experience compare?

I score date problems as "spammish", and as a result, a funky date in and of itself isn't enough to identify something as junk, so even false positives aren't a problem - it's all taken in conjunction with other characteristics of the message. This allows me to be a bit more arbitrary about my use of the filter - it doesn't have to be 100% because I'm really not likely to lose legitimate email because of it.

I also score for INVALID date formats (which typically seem to have some bogus text describing what the timezone is) - they seem to almost universally be spam, though I merely score them with a higher spammishness score.
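
As an illustration of the invalid-format check, here is a minimal sketch in Python rather than procmail. The pattern, the name DATE_RE and the function date_looks_valid are hypothetical and are not taken from the recipes discussed here; the sketch assumes a deliberately strict reading of the RFC 2822 date syntax, accepting only numeric zones and a few traditional abbreviations.

```python
import re

# Hypothetical strict pattern for a Date: header value such as
# "Tue, 9 Sep 2003 22:57:55 -0400". Free-form zone text is rejected.
DATE_RE = re.compile(
    r"""^\s*
        (?:(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s+)?    # optional day of week
        \d{1,2}\s+                                   # day of month
        (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+
        \d{4}\s+                                     # four-digit year
        \d{2}:\d{2}(?::\d{2})?\s+                    # HH:MM or HH:MM:SS
        (?:[+-]\d{4}|UT|GMT|[ECMP][SD]T)             # numeric or classic zone
        (?:\s*\([^()]*\))?                           # optional trailing comment
        \s*$""",
    re.VERBOSE,
)


def date_looks_valid(value: str) -> bool:
    """Return True when the Date: value fits the strict pattern above."""
    return DATE_RE.match(value) is not None
```

A value such as "Tue, 9 Sep 2003 22:57:55 Pacific Daylight Time" fails the check because the zone is free text. Legitimate mail using obsolete forms (two-digit years, for instance) would also fail, which is tolerable only because the result feeds a score rather than a verdict.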

Messages < 200K sec BEFORE reception tend to be list-delayed and twits with erratic clocks, but I have an 18H threshold there anyway (yes, less than 3D or 5D - but as I said, I'm using it as an indicator, not an absolute). Bugtraq for instance seems to frequently have 140Ksec+ delays (that list strips incoming Received: headers, so it's difficult to determine exactly where the delay was inserted, but it isn't critical because the single characteristic isn't enough to flag it as spam).


Very LARGE lags in the clock seem to be indicative of spam:

SPAM: +100+100 Date is suspicious at 121651249 seconds {312 00:00:49} BEFORE reception

Curiously, both of those are from _SEPARATE_ messages from the same spammer and are messages sent at different times.

I threshold advanced clocks at +2H, since it seems most legit mail which has an advanced clock skew is under about 5K seconds (about 1.5 hours), which can sometimes be attributed to morons having their machine set to the wrong timezone.

Excepting the low thresholds, pretty much any advancement of the clock is a consistent indicator of spam. Just reviewing filtered messages since the beginning of this month, I see that a clock in excess of +2H has been spam in every instance except for one, which was a bugtraq message ("SRT2003-09-11-1120 - setgid man MANPL overflow"). Because the date characteristic is merely contributory, that message was NOT classified as spam - however, all the others suffered from MULTIPLE spam characteristics, for example:


SPAM: +125 Single received header for foreign sender
SPAM: +135 Advisory - relayed through backup MX
SPAM: +300 Foreign character set encoding (Windows-1250) in body.

SPAM: +100+100 Date is suspicious at 2678343 seconds {030 23:59:03} AFTER reception

SPAM: +75 Advisory - no non-list cleartext recipient matching X-Envelope-To
SPAM: +249+58 Subject Scoring match 58
SPAM: +(249*0.75) text/html ONLY
SPAM: +249 Abundance of triggers
SPAM: Advisory - spammishness is 1577.75
SPAM: spammishness exceeds threshold of 249
INFO: SpamFilter v03.05.00  SBS  20030517/1243
From gold(_at_)web2mail(_dot_)com  Tue Sep  9 22:57:55 2003
 Subject: Do YOU know how to earn lot of money on gold rate change?
  Folder:  gzip -9fc >> spam.gz                                         2440
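
The arithmetic in that report can be checked by hand. The following Python sketch simply adds up the score lines shown above and compares the total to the reported threshold; the shortened labels and variable names are illustrative only, and nothing here reflects how the actual filter is implemented.

```python
# Score lines copied from the report above; labels are shortened.
scores = [
    ("single received header", 125),
    ("relayed through backup MX", 135),
    ("foreign character set", 300),
    ("suspicious date", 100 + 100),
    ("no matching cleartext recipient", 75),
    ("subject scoring", 249 + 58),
    ("text/html only", 249 * 0.75),
    ("abundance of triggers", 249),
]

THRESHOLD = 249  # threshold quoted in the report

total = sum(points for _, points in scores)
is_spam = total > THRESHOLD

print(total, is_spam)  # prints: 1577.75 True
```

With the two date penalties removed the total is still 1377.75, well above 249, which fits the point that the date test is contributory rather than decisive.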

If a message is dated more than 18H BEFORE or 2H AFTER reception, I add 100 to my spammishness. If it's >72H out, I add an additional 100.
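
A minimal sketch of that rule in Python, for illustration only. The function and constant names are hypothetical, and the sketch assumes strict "greater than" comparisons at each threshold and that the 72H test applies in either direction; the post does not spell out either detail.

```python
from datetime import datetime, timedelta, timezone

LAG_LIMIT = timedelta(hours=18)   # Date: this far BEFORE reception
LEAD_LIMIT = timedelta(hours=2)   # Date: this far AFTER reception
FAR_LIMIT = timedelta(hours=72)   # assumed to apply in either direction
PENALTY = 100


def date_score(sent: datetime, received: datetime) -> int:
    """Return the spammishness contributed by the Date: header skew."""
    skew = sent - received        # positive when Date: is ahead of reception
    score = 0
    if skew > LEAD_LIMIT or -skew > LAG_LIMIT:
        score += PENALTY          # outside the tolerated window
    if abs(skew) > FAR_LIMIT:
        score += PENALTY          # wildly off: second penalty
    return score


received = datetime(2003, 9, 9, 22, 57, 55, tzinfo=timezone.utc)
print(date_score(received + timedelta(seconds=2678343), received))  # prints: 200
```

The 2678343-second example from the report above lands in both branches, which matches its "+100+100" line.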

Overall, what I have has been working wonderfully for me - just 5 spams so far this month have actually gotten past my filters, and three of those were some eBay scam received nearly concurrent to one another (for which my spewhosts filter has been updated - a filter which adds a score based on whether the message appears to have passed through a mailserver associated with the domain of the From: address, used to flag potential forgeries).
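
The idea behind that forgery check can be sketched roughly as follows, in Python. This is not the spewhosts filter itself, whose recipes are not shown in this post; the function name and the host-matching approach are assumptions, and the score to attach to the result is left out because the post does not give one.

```python
import re
from email import message_from_string
from email.utils import parseaddr


def passed_through_sender_domain(raw_message: str) -> bool:
    """True if any Received: header names a host in the From: domain."""
    msg = message_from_string(raw_message)
    _, address = parseaddr(msg.get("From", ""))
    if "@" not in address:
        return False
    domain = address.rpartition("@")[2].lower()
    for received in msg.get_all("Received", []):
        # Compare whole host tokens so "example.com" cannot match
        # "notexample.com".
        for token in re.split(r"[\s()\[\];,]+", received.lower()):
            if token == domain or token.endswith("." + domain):
                return True
    return False
```

A message claiming a From: address at one domain while every Received: header names unrelated hosts would return False and could then be scored as a potential forgery. Received: headers can themselves be forged, so this is only suitable as one contributing signal.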

In fact, of the two other spams I received, both of them would now be tagged because I expanded some subject keyword filters (adding prostitute and underwear), as well as having recently narrowed the advanced clock threshold (from +18H to +2H) and bumped up the scoring for invalid date formats.

I also recently modified the recipes to allow for a list skew of 24H if a LISTNAME variable has been defined, so there's an automatic allowance for delays on discussion lists (which in my system already get a boost to their allowed spammishness threshold), which sharply reduces the number of entries in my logfile when handling lists such as bugtraq (I have a spam report emailed daily, and that includes messages which were spammish, not strictly tagged as spam, so I can see how close iffy messages are).

Dates are but one characteristic of my filtering, and they've been useful thus far.


---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail