
new filter: unfounded reply (reference checker)

2003-12-30 11:55:12
Input and corrections welcome.

I submit another simple spammishness test for interested parties. Please refer to previous posts for an explanation of my SPAMMISHNESS constructs.

I've run this against a number of saved messages (both regular mail and mail previously categorized as spam). Except for the occasional new-composition reply (i.e. a reply someone keyed in as a new message, rather than replying from the previous message), this recipe has very good marks for spotting spam. It even bumped up the score on a few messages which had previously slipped by my filters (I experienced an uncharacteristic number of messages which slipped into my inbox in December).

As a reference point, of the 300 messages I have filed away from the procmail list thus far in December, only 3 scored a hit on this recipe. While a 1% false-positive rate might seem alarming, keep in mind that SPAMMISHNESS recipes work on a contributory basis rather than as a simple pass/fail, so the effect of an occasional false positive isn't significant so long as those same messages don't exhibit a number of other spam characteristics.

Of 404 messages in my spam mailbox, 48 had Re: subjects, and 20 of those were flagged by this recipe (the remaining 28 were, in fact, foreign character set and spam disclaimer term messages). On review, it appears that this recipe COULD be useful if the SPAMMISHNESS were offset in the presence of a Re: with the additional header(s).
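
One way to read that last suggestion is as a small negative contribution whenever a reply subject IS backed by at least one threading header. A minimal sketch under that reading, assuming the same SPAMMISHNESS, SPAMNOTES and NL variables used in the recipe below. The -25 value is a placeholder chosen for illustration, not a tuned figure.

```procmail
# Hypothetical companion recipe: credit replies that carry threading headers.
# Each bracket expression is meant to hold a space and a tab.
:0
* ^Subject:[[ 	]+(Re|Fwd)[]: 	]+
* ^(References|In-Reply-To):
{
        # Placeholder offset; adjust to fit the rest of the scoring scheme.
        SPAMVAL="-25"
        SPAMMISHNESS="${SPAMMISHNESS}${SPAMVAL}"
        SPAMNOTES="${SPAMNOTES}SPAM: ${SPAMVAL} reply subject with supporting headers.${NL}"
}
```

Both conditions are plain (unweighted) regexps, so the block runs only when the subject looks like a reply and at least one of the two headers is present.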


# Name: unfounded_reply
#
# Summary:  Messages with Re: or Fwd: subjects which have neither an
# In-Reply-To: nor a References: header are flagged as suspect.
#
# Notes: The score for this recipe isn't anywhere near enough to tag a
# message as spam by itself, or with only one or two other minor problems,
# but is rather intended to push a message along past the spammishness
# threshold should there be other attributes.  Expect numerous false
# positives caused by nimrods who generate NEW messages with "Re: topic"
# subjects, or who reply to messages in a nonstandard fashion.  So long as
# those messages don't have other SPAMMISHNESS attributes, they won't be
# miscategorized at the final evaluation.
#
# Overhead: minimal:
#       no external processes are invoked.
#       body is not scanned
#       regexp is simplistic
#
# Optimizations:
#       One could use maximal on the scoring for the two headers, but then
#       you wouldn't be able to collect statistics on each header.
#
# Improvements:
#       The Subject regexp could be expanded to include support for numeric
#       (Re2:) type replies.  However, these don't seem to appear in spam, so
#       the end effect wouldn't be significant.

:0
* ^Subject:[[   ]+(Re|Fwd)[]:   ]+
* 1^0
* -2^0 ^References:
* -2^0 ^In-Reply-To:
{
        SPAMVAL="+75"
        SPAMMISHNESS="${SPAMMISHNESS}${SPAMVAL}"
        SPAMNOTES="${SPAMNOTES}SPAM: ${SPAMVAL} reply subject without supporting headers.${NL}"
}
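
The numeric variant mentioned under Improvements could be handled by widening the Subject condition. A minimal sketch, assuming replies of the form Re2: or Re[2]: and the same SPAMMISHNESS, SPAMNOTES and NL variables as above. The optional counter group is an illustration only and has not been run against a mail corpus.

```procmail
# Hypothetical variant of unfounded_reply that also accepts Re2: and Re[2]:
# Each bracket expression is meant to hold a space and a tab.
:0
* ^Subject:[[ 	]+(Re|Fwd)(\[?[0-9]+\]?)?[]: 	]+
* 1^0
* -2^0 ^References:
* -2^0 ^In-Reply-To:
{
        SPAMVAL="+75"
        SPAMMISHNESS="${SPAMMISHNESS}${SPAMVAL}"
        SPAMNOTES="${SPAMNOTES}SPAM: ${SPAMVAL} reply subject without supporting headers.${NL}"
}
```

Only the first condition differs from the original: the group (\[?[0-9]+\]?)? allows an optional counter, with or without square brackets, between the Re or Fwd and the colon.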
---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
