procmail

Re: Use scoring to determine header format?

2004-05-18 15:33:23
At 16:34 2004-05-18 -0400, fleet(_at_)teachout(_dot_)org wrote:
> Probably; but I don't know enough about headers and such to identify them.
> This is a real problem for me.  If I knew more about faked domains, IPs,
> etc. then I probably wouldn't be stuck trying to identify a cur by the
> color of its fur or the shape of its tail.

Have you considered trying a canned spam filter? You could run that up front, and then evaluate its return value using procmail.
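
As a rough, untested sketch of that idea: the recipe below assumes SpamAssassin's spamc client is the canned filter and relies on "spamc -c" exiting non-zero when it judges the message to be spam. The variable name FILTER_SAYS_SPAM is made up for illustration.

```
# Sketch only. Assumes spamc is installed and on the PATH.
# HB feeds both header and body to the program in the ? condition;
# "! ?" makes the recipe match when the program's exit code is non-zero.
:0 HB
* ! ? spamc -c >/dev/null
{
  # Hypothetical flag variable for later recipes to test.
  FILTER_SAYS_SPAM=yes
}
```

Later recipes could then test the flag with a condition such as "* FILTER_SAYS_SPAM ?? yes" and file or score the message accordingly.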

> The one-word subject itself could be a beneficial test:

> I'm going to add the below to the recipe file as a comment.  Currently,
> the recipe is identifying not only the ding-aling I was after; but a bunch
> of others as well.

Not too many legit messages have a single-word subject. Some do, but that's why accumulating a SPAMMISHNESS score has proven to be a better approach than declaring a message spam on any individual characteristic taken alone.
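
For illustration, a minimal sketch of how a single-word-subject test might feed a running score using procmail's weighted conditions. This is not the recipe from the earlier message: the weight of 25, the threshold of 100 and the "spam" folder name are arbitrary assumptions, and SPAMMISHNESS is simply used as the accumulator variable.

```
# Sketch only. Each [ 	] below contains one space and one tab.
SPAMMISHNESS=0

# Carry the running score in, then add 25 if the Subject: line
# holds exactly one whitespace-free word.
:0
* $ ${SPAMMISHNESS}^0
* 25^0 ^Subject:[ 	]*[^ 	]+[ 	]*$
{ SPAMMISHNESS=$= }

# ... other characteristic tests add their own weights the same way ...

# File the message only once the combined score exceeds the threshold.
:0:
* -100^0
* $ ${SPAMMISHNESS}^0
spam
```

With these numbers a one-word subject contributes to the total but cannot get a message filed as spam by itself; it takes several characteristics together to cross the threshold.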

I just ran it against my spew received so far this month, and the following are the ONLY hits, and they're all spam:

        Important|hello|Software|information|Document|unknown
        OnlinePharmacyCheap|fake|health|hi

Other matches, against the misc messages which reach my unsorted mailbox (again, since the top of this month):
        software|leads|bill

Several of these were multiple hits per keyword. (software|hello|important) seem to be the popular ones.

That last one on the unsorted messages is actually a legit message - someone responding about a bill I had to send them via Certified post because they hadn't responded to email and phone requests for over two months. But then, I'm not filtering the messages as spam on just this one characteristic - these are just the hits that this recipe would have scored, and except for that last one, they're ALL spew.

        The ratio of positive vs. false positive for this check is 25:2

A run against an arbitrary mailbox containing over 2000 messages netted *0* positives. Several other mailboxes (lists and non-lists, noting that many lists won't trip this because they prepend list IDs to the subject, and much of the traffic is reply messages, etc.) showed similar results: no hits. Run against this list: 158 clear messages, 1 positive ("Nesting").

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail