At 06:57 2000-09-28 -0400, Colin J. Raven wrote:
> You have something really interesting here Sean.
> A "twitlist" is easily maintainable, and this is an elegant approach IMHO.
Heh, to make it easier, I add users to the twitlist by emailing myself at a
plussed address - the subject is extracted, and the address (or whatever
token, really) from there is appended to the twitlist file. Removal
requires shelling in and editing the file (only because I chose not to write
a script to automate that, which would be easy enough) - it gives me the
incentive to ask "is this person really repentant yet?" If it isn't even
worth the time to shell in and edit the file, I guess not.
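For illustration only, a removal script along those lines might look like the sketch below. The function name `untwit` is made up, and it assumes the list file holds one address (or token) per line, which is what appending the subject line produces.

```shell
#!/bin/sh
# Hypothetical helper: remove one entry from a twitlist-style file.
# Assumes one token per line; the match is literal, whole-line and
# case-insensitive.
untwit() {
    list=$1
    addr=$2
    tmp="$list.tmp.$$"
    # grep -v exits 1 when no lines are left to print; only a status
    # above 1 signals a real error (unreadable file, for example).
    grep -v -i -x -F -e "$addr" "$list" > "$tmp"
    status=$?
    if [ "$status" -gt 1 ]; then
        rm -f "$tmp"
        return "$status"
    fi
    mv "$tmp" "$list"
}
```

Usage would be something like `untwit /path/to/twits.dat 'someone@example.com'`. The sketch does no locking, so a submission arriving in the middle of the rewrite could be lost.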
> 1. Do you have something like: INCLUDERC=$PMDIR/twitrc?
Something like that. Actually, the twits script is included from a
notwit.rc file. Same sort of thing for spam.rc -- the idea is that
invariably, you're going to know someone who trips some rule, and you can
set them up in a whitelist. For instance, one of my spam rules fires when
FROM=TO. But I've a friend or two who use that for self-made BCC:
distribution lists for things (never mind that if they understood mail, and
had an ISP worth a damn, they could use a plussed address for the TO:) -- I
can add their address to the nospam.dat file, and then they don't trip that
rule (or any other spam rules for that matter).
Here's a snippage from my boxes.rc (which itself is included into .procmailrc):
# (snip - a few specific administrative INCLUDERCs precede these)
# Include rules for items to filter BEFORE spam
# (before twits as well - this happens to contain but ONE rule)
INCLUDERC=$PMDIR/prespam.rc
# Filter for TWITS (not spam, but individuals we don't take mail from)
# This is a select group. Takes place BEFORE spam filtering as well as
# some other groups, because twits can exist even in groups normally free
# of spam.
INCLUDERC=$PMDIR/spam/notwit.rc
# Filter items which are clean of spam (outbound/moderated mailing lists,
# digests, and lists known to be subscriber only). When a list starts
# getting spam, it gets moved from the mailclean.rc to a list below the
# spam filters.
#---------------------------------------------------------------------------
# Include rules for spam-free (clean) mailing lists
INCLUDERC=$PMDIR/mailclean.rc
#---------------------------------------------------------------------------
# Include rules for Spam - wrapped in an exception filter.
INCLUDERC=$PMDIR/spam/nospam.rc
# (snip - bulk of INCLUDERCs follow)
In the notwit.rc I have rules for submitting new addresses to both the
twits and notwits databases. The $SUBJECT which is used was extracted in
the .procmailrc using a match (as are a number of other often-used
variables). While I take the processing hit to extract these
(subject/to/from/sender) on EVERY incoming message, I take that hit only
ONCE, and don't have multiple variations of extraction scattered all about:
:0
* ^Subject:[ ]*\/[^ ].*
{
SUBJECT=$MATCH
}
# (notwit.rc begin)
# ==========================================================================
# Define necessary variables.
# Define the directory all this spam filtering is in...
SPAMDIR=$PMDIR/spam
# The top two recipes allow me to mail an ADDITION to either list (I don't
# have removal code here, as I didn't need to develop it - once a spammer,
# always a spammer -- if someone deserves to be removed, then it'll be worth
# the effort of logging in and manually editing the twit file to remove their
# address - if not, then I don't apparently really want to hear from them).
# ==========================================================================
NOTWITLIST=$SPAMDIR/notwit.dat # non-twit database
# Define the address that twit exception submission should go to. I have
# aliases for my accounts (either virtual host, or manual aliases in the
# system aliases file), but if you have plussed address support you can use
# that.
# if plussed, this definition needs DOUBLE escaping.
NOTWITSUB=userid\\+NOTWITplussedaddr@host\\.domain\\.tld
# Is it to my submission address?
# Subject field is address (hey, I'm cheap)
:0
* $ ^TO.*$NOTWITSUB
{
LOG="NOTE: NoTwitSubmit: added $SUBJECT
"
:0:
|echo $SUBJECT >> $NOTWITLIST
}
# ==========================================================================
TWITLIST=$SPAMDIR/twits.dat # twit database
# Define the address that twit submission should go to. I have aliases for
# my accounts (either virtual host, or manual aliases in the system aliases
# file), but if you have plussed address support you can use that.
# if plussed, this definition needs DOUBLE escaping.
TWITSUB=userid\\+TWITplussedaddr@host\\.domain\\.tld
# Is it to my submission address?
# Subject field is address (hey, I'm cheap)
:0
* $ ^TO.*$TWITSUB
{
LOG="SPAM: TwitSubmit: added $SUBJECT
"
:0:
|echo $SUBJECT >> $TWITLIST
}
# ==========================================================================
# If there is a match on any string in the notwitlist anywhere within the
# headers, excepting the subject line (typically, we would expect to find the
# match in one of the from, to, cc, messageid, or received lines), then SKIP
# twit filtering.
# Just like nospam.rc, this filtering will also catch the X-PSE-BYPASS header,
# although it is not actually matched as a header per se. This allows us to reprocess
# messages which were originally caught by the spam filters, and which really
# should remain filtered, but we need to import one or two into the mail stream
# (say for ease of forwarding, or getting an attachment).
:0h
FAILKEY=| ($FORMAIL -ISubject: | $MEGAGREP -i -f $NOTWITLIST)
# If failkey is blank, we didn't match anything in the greenlist
:0
* $FAILKEY ?? ^^^^
{
LOCKFILE=$TEMP/twitsrc$LOCKEXT
INCLUDERC=$PMDIR/spam/twits.rc
LOCKFILE
}
# ==========================================================================
# (notwit.rc end)
There is a virtually identical (except for "twit" vs "spam" in various
variables) copy of this rcfile for nospam.rc.
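As a rough shell sketch of what the FAILKEY pipeline does outside procmail: strip the Subject: field from the headers, then grep what is left against the list file. `$MEGAGREP` is not defined in this post, so plain `grep -i -F` stands in for it here, and the awk filter stands in for `formail -ISubject:`; both substitutions and the function names are assumptions for illustration.

```shell
#!/bin/sh
# Drop the Subject: field, including folded continuation lines, from a
# header block read on stdin (a stand-in for `formail -ISubject:`).
strip_subject() {
    awk '
        tolower($0) ~ /^subject:/ { skipping = 1; next }  # start of the field
        skipping && /^[ \t]/      { next }                # folded continuation
                                  { skipping = 0; print } # any other line
    '
}

# Print every remaining header line that contains an entry from the list
# file ($1). Empty output means "no match", which is the case the
# `FAILKEY ?? ^^^^` condition tests for.
failkey() {
    strip_subject | grep -i -F -f "$1"
}
```

For example, `failkey /path/to/notwit.dat < headers.txt` prints the matching header lines, or nothing at all. One thing worth guarding against with any `grep -f` list: an empty line in the list file matches every input line, so a submission with a blank subject would effectively match all mail.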
The twits.rc (which is a *LOT* less complex than the spam.rc), is:
# (twits.rc begin)
# (Revision history omitted)
# This file filters out messages coming from twits (by address).
# The sendmail access.txt database is my current favoured method of dealing
# with the true twit. There, I don't incur the processing overhead of
# even an attempted local delivery and this filtering. The sender also gets
# a bounce AT THE TIME OF THE SMTP TRANSACTION.
# Because of the differing matching performed between the twit and spam
# filtering, there are two distinct databases used. See spam.rc for
# spam handling.
# ==========================================================================
# Define necessary variables.
# Define the directory all this spam filtering is in...
SPAMDIR=$PMDIR/spam
# Define the version of this filter, so we can emit a message to the log.
# when I tweak these rules, I tweak the version. Simple.
TWITVER="
INFO: TwitFilter v02.00.00 PSE 2000.03.16 06:24:00
"
# ==========================================================================
# I realize this here is a spam filter (and is still present in the spam
# filters as well, in case twits were skipped for some reason), but this is
# singularly so important that we shouldn't skip it.
# From: header blank or not even present!
# Anybody mailing and not identifying a from, MUST be spamming.
:0
* ! ^From:[ ].+
{
LOG="SPAM: No From:$TWITVER"
:0:
|gzip -9fc>>$MAILDIR/twits.gz
}
# ==========================================================================
NEUTLIST=$SPAMDIR/neutral.dat # neutral twit database
# If a SPECIFIC ADDRESS from the neutrallist appears anywhere within the
# headers, minus subject, and addressees, toss it. Most people would
# probably choose to roll this together with the regular twit filtering,
# but I'm retentive here: these are messages of the type that "you elected
# to subscribe to our service, so we mail these notices out periodically".
# sort of spam really, but I don't want it in the spam database should I
# pump it through a process that might otherwise blacklist the submitter
# domain or email address...
:0h
FAILKEY=| ($FORMAIL -ISubject: -ITo: -ICc: -IResent-To: -IResent-Cc: | $MEGAGREP -i -f $NEUTLIST)
# If failkey is nonblank, we matched something.
:0
* ! $FAILKEY ?? ^^^^
{
LOG="SPAM: Neutral spam - ads from mailing lists and such [$FAILKEY].$TWITVER"
:0:
|gzip -9fc>>$MAILDIR/twits.gz
}
# We wouldn't autoreply to these...
# ==========================================================================
# If a SPECIFIC ADDRESS from the twitlist appears anywhere within the
# headers, minus subject, and addressees, toss it.
# Because twit filtering occurs before lists, even those which are filtered
# before spam checking, this allows us to catch morons who post spam or
# other drivel to mailing lists.
#FAILKEY=| ($FORMAIL -ISubject: -ITo: -ICc: -IResent-To: -IResent-Cc: | $MEGAGREP -i -f $TWITLIST)
# for *MY* purposes, it works just fine to zot all messages containing
# references to these emails. Remember, they're a select group -- by
# inclusion of To: and Cc: headers, if I'm on a list, and one of these
# twits writes an initial message, I don't see it (from), *AND* if people
# reply on-list with cc/to: the twit, I won't see them, *AND* if
# in-reference-to type headers are present on replies (even if not copied
# to the individual), I shouldn't see those. I thereby avoid most, if not
# all of the veritable sh*tstorm twits usually generate.
:0h
FAILKEY=| ($FORMAIL -ISubject: | $MEGAGREP -i -f $TWITLIST)
# If failkey is nonblank, we matched something.
:0
* ! $FAILKEY ?? ^^^^
{
LOG="SPAM: Match against twitlist [$FAILKEY].$TWITVER"
:0:
|gzip -9fc>>$MAILDIR/twits.gz
}
# The following recipe can be used to auto-reply a twit message
# to the sender (provided that it is a valid address).
#
# To enable this, add the 'c' flag to the preceding recipe (otherwise, this
# won't execute at all). Arguably, if the preceding recipe sends the
# message to /dev/null then you can simply concatenate the action lines of
# this recipe with the above recipe, replacing the /dev/null action, and
# eliminating the need for a 'c' on that recipe.
#
# Note that I don't have this enabled - when used for spam.rc, spammers
# simply don't read this stuff (if their mail is even valid), and most of
# the twits I run into are on mailing lists, so why should I waste my
# bandwidth sending it (and probably getting a bounced-back return)?
#
:0 Aw
| ( $FORMAIL -rt -I "Precedence: notification" -I "From: $MAILBOT" ;\
cat $AUTOREPLY/twit.msg ) | $SENDMAIL -t
# ==========================================================================
# (twits.rc end)
Another simplified application of greppage. I have about six or seven of
these lists all in one RC, but each archiving to a different (gzipped)
mailbox on the server, using a different datafile, and adding a different
MB header (used to simplify the filtering in my PC mail client - then all
it cares about is what the MB header is).
# Friends
#
:0
* $? $FORMAIL -xFrom: | $FGREP -i -f $PMDIR/friends.dat
{
:0c:
| $FORMAIL -b -A"X-my-MB: FRIENDS" >> $DEFAULT
:0:
|gzip -9fc>>$MAILDIR/friends.gz
}
The delivery mechanism is specific to how I do things - I don't store them
into a mailbox on the server, but rather _archive_ them with GZIP, so gobs
of email don't consume nearly as much disk space.
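Appending works because a gzip file may hold several compressed members back to back, and decompressing it yields their contents joined in order. A small sketch of that behaviour, with made-up file names and fake message content:

```shell
#!/bin/sh
# Append each message as its own gzip member to a single archive,
# the same way a `| gzip -9fc >> file.gz` action does.
archive_msg() {
    gzip -9fc >> "$1"
}

demo_dir=$(mktemp -d)
printf 'From a@example.com Thu Sep 28 06:57:00 2000\nfirst message\n\n' \
    | archive_msg "$demo_dir/friends.gz"
printf 'From b@example.com Thu Sep 28 07:02:00 2000\nsecond message\n\n' \
    | archive_msg "$demo_dir/friends.gz"

# Reading the archive back yields both messages, in order, as one
# mbox-style stream.
gzip -dc "$demo_dir/friends.gz"
```

To browse such an archive with an ordinary mail client, decompress it to a temporary mbox first, for example `gzip -dc friends.gz > /tmp/friends.mbox`.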
I guess this demonstrates how long I've been using this sort of filter:
with the centralized from/to/subject/etc snarfing, the match line really
should be:
* $? echo $FROM | $FGREP -i -f $PMDIR/friends.dat
(unless someone sees a reason it shouldn't be this way)
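For anyone trying the exit-status form by hand: a `?` condition succeeds when the command exits 0, and grep exits 0 only when at least one line matched. A minimal shell sketch of the same test follows; the function name `is_listed` is hypothetical, and `printf` is used in place of `echo` so that odd characters in the sender text are passed through untouched.

```shell
#!/bin/sh
# Succeeds (exit status 0) when the sender text in $1 contains any entry
# from the list file in $2; the match is literal and case-insensitive.
is_listed() {
    printf '%s\n' "$1" | grep -q -i -F -f "$2"
}
```

Something like `is_listed "$FROM" /path/to/friends.dat && echo friend` shows the result from a shell prompt, assuming `FROM` holds the extracted sender.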
> 2. Do you use this also as a "plonk"?
You'll have to describe what it is you mean by "plonk".
---
Please DO NOT carbon me on list replies. I'll get my copy from the list.
Sean B. Straw / Professional Software Engineering
Post Box 2395 / San Rafael, CA 94912-2395
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail