new filter: unfounded reply (reference checker)

Input and corrections welcome.

I submit another simple spammishness test for interested parties. Please refer to previous posts for an explanation of my SPAMMISHNESS constructs.

I've run this against a number of saved messages (both regular mail and messages previously categorized as spam). Except for the occasional newly composed reply (i.e. a "reply" someone keyed in as a new message rather than replying to the original message), this recipe has very good marks for spotting spam. It even bumped up the score on a few spam messages that had previously slipped past my filters (I experienced an uncharacteristic number of messages slipping into my inbox in December).

As a reference point, of the 300 messages I have filed away from the procmail list so far in December, only 3 scored a hit on this recipe. While a 1% false-positive rate might seem alarming, keep in mind that SPAMMISHNESS recipes work on a contributory basis rather than as a simple pass/fail, so an occasional false positive isn't significant as long as the same message doesn't exhibit a number of other spam characteristics.

Of 404 messages in my spam mailbox, 48 had Re: subjects, and 20 of those were flagged by this recipe (the remaining 28 were, in fact, foreign character set and spam disclaimer term messages). On review, it appears that this recipe COULD be useful if SPAMMISHNESS were offset (reduced) for messages that have a Re: subject together with the supporting header(s).
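A minimal sketch of that offset, written in the same style as the unfounded_reply recipe in this post. It is hypothetical: the -25 credit is an illustrative value, and it assumes SPAMMISHNESS is later evaluated as a signed arithmetic sum (so a negative contribution works) and that NL is defined elsewhere in the rc file, as the posted recipe's SPAMNOTES line already requires.

```procmail
# HYPOTHETICAL companion recipe (not part of the posted filter):
# credit messages whose Re:/Fwd: subject IS backed by a threading header.
# The bracket expressions contain a space and a literal TAB.
# The -25 value is an assumption; tune it against your own threshold.
:0
* ^Subject:[[ 	]+(Re|Fwd)[]: 	]+
* ^(References|In-Reply-To):
{
        SPAMVAL="-25"
        SPAMMISHNESS="${SPAMMISHNESS}${SPAMVAL}"
        SPAMNOTES="${SPAMNOTES}SPAM: ${SPAMVAL} reply subject with supporting headers.${NL}"
}
```

Because this credit requires at least one threading header and the +75 penalty requires both to be absent, at most one of the two recipes fires for any given Re:/Fwd: message.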



# Name: unfounded_reply
#
# Summary:  Messages with a Re: or Fwd: Subject: prefix which lack both
# In-Reply-To: and References: headers are flagged as suspect.
#
# Notes: The score for this recipe isn't anywhere near enough to tag a
# message as spam by itself, or with only one or two other minor problems,
# but is rather intended to push a message along past the spammishness
# threshold should there be other attributes.  Expect numerous false
# positives caused by nimrods who generate NEW messages with "Re: topic"
# subjects, or who reply to messages in a nonstandard fashion.  So long as
# those messages don't have other SPAMMISHNESS attributes, they won't be
# miscategorized at the final evaluation.
#
# Overhead: minimal:
#       no external processes are invoked.
#       body is not scanned
#       regexp is simplistic
#
# Optimizations:
#       One could use maximal on the scoring for the two headers, but then
#       you wouldn't be able to collect statistics on each header.
#
# Improvements:
#       The Subject regexp could be expanded to include support for numeric
#       (Re2:) type replies.  However, these don't seem to appear in spam, so
#       the end effect wouldn't be significant.

:0
* ^Subject:[[   ]+(Re|Fwd)[]:   ]+
* 1^0
* -2^0 ^References:
* -2^0 ^In-Reply-To:
{
        SPAMVAL="+75"
        SPAMMISHNESS="${SPAMMISHNESS}${SPAMVAL}"

        SPAMNOTES="${SPAMNOTES}SPAM: ${SPAMVAL} reply subject without supporting headers.${NL}"

}
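For readers new to procmail's weighted scoring: in the recipe above, `* 1^0` contributes +1 unconditionally and each threading header that is present contributes -2, so the total is 1 with neither header, -1 with exactly one, and -3 with both. Procmail runs the action block only when the unweighted Subject: condition matches and the total is positive, so the recipe fires only when both headers are missing. The sketch below is a restatement of my own, not the posted recipe: it expresses the same logic with a single negated condition, at the cost of no longer testing the two headers separately.

```procmail
# Sketch: logically equivalent to the weighted unfounded_reply recipe.
# The bracket expressions contain a space and a literal TAB.
# "!" inverts the condition: it matches only if NEITHER header exists.
:0
* ^Subject:[[ 	]+(Re|Fwd)[]: 	]+
* ! ^(References|In-Reply-To):
{
        SPAMVAL="+75"
        SPAMMISHNESS="${SPAMMISHNESS}${SPAMVAL}"
        SPAMNOTES="${SPAMNOTES}SPAM: ${SPAMVAL} reply subject without supporting headers.${NL}"
}
```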
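The numeric-reply improvement mentioned in the header comments could be handled by widening only the Subject: condition. This is a hypothetical sketch: the numbering styles accepted here (`Re2:` and `Re[2]:`) are assumptions about what such subjects look like, and the weighted conditions and action block are unchanged from the posted recipe.

```procmail
# HYPOTHETICAL variant: also accept numbered replies such as "Re2:"
# or "Re[2]:". The bracket expressions contain a space and a literal TAB.
:0
* ^Subject:[[ 	]+(Re([0-9]+|\[[0-9]+\])?|Fwd)[]: 	]+
* 1^0
* -2^0 ^References:
* -2^0 ^In-Reply-To:
{
        SPAMVAL="+75"
        SPAMMISHNESS="${SPAMMISHNESS}${SPAMVAL}"
        SPAMNOTES="${SPAMNOTES}SPAM: ${SPAMVAL} reply subject without supporting headers.${NL}"
}
```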
---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail