procmail

Re: Flood control?

2004-02-26 08:31:10
Professional Software Engineering <PSE-L(_at_)mail(_dot_)professional(_dot_)org>
wrote:
Someone I know got hit with a flood of messages from a
misbehaving autoreply bot.

Thinking about trapping based on CONTENT: Is the body of the message the same
in each case? Or does it vary, include the original message or something
similar? (more on content below)

Nothing about the autoreply met any of the standard
characteristics for vacation messages either, so that filter
wasn't tripped.

Again, thinking of CONTENT: Other than that, scanning the body for "out of the
office", "try again in", "on vacation" etc. with scoring comes to mind for a
procmail-only solution.
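
To make the scoring idea concrete, here is a rough sketch in Python (not procmail, just to show the logic). The phrase patterns beyond the three quoted above, the weights, and the threshold are all made-up values for illustration and would need tuning against real autoreplies.

```python
import re

# Hypothetical phrase weights; tune against real autoreply samples.
PHRASE_WEIGHTS = {
    r"out of (the )?office": 3,
    r"on vacation": 3,
    r"try again in": 2,
    r"auto(matic)?[- ]?reply": 2,   # assumed extra pattern, not from the thread
}
THRESHOLD = 4  # assumed cutoff, not from the thread


def autoreply_score(body: str) -> int:
    """Sum the weights of every phrase pattern found in the body."""
    text = body.lower()
    return sum(w for pat, w in PHRASE_WEIGHTS.items() if re.search(pat, text))


def looks_like_autoreply(body: str) -> bool:
    """True when the combined phrase score reaches the threshold."""
    return autoreply_score(body) >= THRESHOLD
```

A single weak phrase stays under the threshold, so an ordinary post that happens to say "try again in a minute" would not be trapped on its own.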

I'm thinking something along the lines of taking the sender's
address and incrementing a counter associated with that
address -- if the counter exceeds some limit, that sender's
messages get redirected elsewhere (and as a result, don't get
processed through the list processor, so they don't get
further messages).

Thinking of rate-limiting POSTERS: It'd be nice to set anyone sending X
messages over a given period to "moderated" status on the list itself, no? This
would be a list function though. I like to moderate new subscribers until they
prove themselves, but there's not always that luxury.

There's the little matter of resetting the counter in a
sensible fashion (if you merely purged the counter file on a
weekly basis, then if there was a problem right around the
purge time, it could go for nearly twice the threshold before
being detected).

Just track the last 10 (50? 100?) posted messages to the list, and if many/most
are from the same user, throttle them back?
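
Something like the following Python sketch shows the "last N posters" check in memory. The class name, the window of 10, and reading "many/most" as "more than half the window" are my assumptions for illustration only.

```python
from collections import Counter, deque


class RecentPosters:
    """Remember the senders of the last `window` messages."""

    def __init__(self, window: int = 10, max_share: float = 0.5):
        self.recent = deque(maxlen=window)  # oldest entry falls off automatically
        self.max_share = max_share          # assumed meaning of "many/most"

    def record(self, sender: str) -> bool:
        """Add a sender; return True if that sender should be throttled."""
        key = sender.lower()
        self.recent.append(key)
        count = Counter(self.recent)[key]
        return count > self.max_share * self.recent.maxlen
```

Because the window is counted in messages rather than in time, a flooding sender is caught within a handful of posts, and ordinary traffic pushes old entries out without any separate purge step.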

While the legitimate message traffic to the list processor account
isn't tremendously high, I think it'd be good to have a system which
doesn't have a high processing overhead -- I had thought that tossing
one liners with a datecode and address would be one way, then grepping
the list.  A cron-invoked cleanup script could trim and rewrite the log
file based on a cutoff date.
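
For reference, the datecode-plus-address log described above could look roughly like this in Python. The function names, the "epoch-seconds address" line format, and the file layout are assumptions made for the sketch, not anything specified in the thread.

```python
import os
import time
from typing import Optional


def log_sender(path: str, sender: str, now: Optional[float] = None) -> None:
    """Append one 'epoch-seconds address' line per incoming message."""
    now = time.time() if now is None else now
    with open(path, "a") as fh:
        fh.write(f"{int(now)} {sender.lower()}\n")


def count_recent(path: str, sender: str, window: int,
                 now: Optional[float] = None) -> int:
    """Count this sender's entries newer than `window` seconds (the 'grep')."""
    now = time.time() if now is None else now
    cutoff = now - window
    count = 0
    with open(path) as fh:
        for line in fh:
            stamp, _, addr = line.strip().partition(" ")
            if addr == sender.lower() and int(stamp) >= cutoff:
                count += 1
    return count


def trim_log(path: str, window: int, now: Optional[float] = None) -> None:
    """Cron-style cleanup: rewrite the log, keeping only entries past the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - window
    with open(path) as fh:
        kept = [ln for ln in fh if int(ln.split(" ", 1)[0]) >= cutoff]
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        fh.writelines(kept)
    os.replace(tmp, path)  # rename over the old log
```

Since `count_recent` compares each entry against a rolling cutoff, the count does not depend on when the cleanup last ran, so the "nearly twice the threshold" problem of a fixed weekly purge does not arise. The sketch does no locking, so concurrent deliveries would still need to be serialized.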

Or just tail the last X entries on every update?

1. Extract the sender
2. Grep the "last X posters" file for that sender
3. If not found, tail the last X-1 to a temp file, otherwise queue for
moderation (?). Just don't send an automated message!!!
4. Append this sender
5. Rotate

A cron job run every (short interval) could drop the least recent entry (tail
last X-1 to temp, rotate), ensuring someone talking to themselves on a
low-traffic list could still do so, slowly.

(A really ugly FIFO!) Locking issues abound.
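
Here is one way the file-based steps and the locking might fit together, sketched in Python. It assumes a Unix system (it uses `fcntl.flock`), the `.lock` and `.tmp` file names are invented, and it assumes the sender is recorded even when already present, which the steps above leave open.

```python
import fcntl
import os


def check_and_record(path: str, sender: str, keep: int = 10) -> bool:
    """Return True if `sender` is already among the last `keep` posters.

    The whole read-modify-write runs under an exclusive lock on a
    separate lock file, so two deliveries cannot interleave.
    """
    sender = sender.lower()                        # step 1 (already extracted)
    with open(path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)           # blocks until we own the lock
        try:
            with open(path) as fh:
                recent = fh.read().split()
        except FileNotFoundError:
            recent = []
        seen = sender in recent                    # step 2: the "grep"
        recent = (recent + [sender])[-keep:]       # steps 3-4: tail, then append
        tmp = path + ".tmp"
        with open(tmp, "w") as fh:
            fh.write("\n".join(recent) + "\n")
        os.replace(tmp, path)                      # step 5: rotate into place
    return seen                                    # lock is released on close
```

The caller would treat a `True` result as "queue for moderation" rather than sending any automated reply.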

[...]
Checking for a
flood would be as simple as taking the sender address and
doing a wc on the datafile of that name - you're not running
through a bunch of lines which don't have anything to do with
the sender.

Though per-user rotation might be an issue if not done regularly (they'd flood
the list for at least one rotation window). You could use the short-interval
purging of last X entries here too of course, but you now have a BUNCH of files
to attend to as opposed to one. Does this gain you anything?
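
For comparison, a per-sender-file version might look like the Python sketch below. Hashing the address into a filename is my own assumption (it sidesteps characters such as "/" in addresses), as are the function names and the one-timestamp-per-line format.

```python
import hashlib
import os
import time


def sender_file(directory: str, sender: str) -> str:
    """Map an address to a safe filename (hashing avoids '/' and similar)."""
    digest = hashlib.sha1(sender.lower().encode("utf-8")).hexdigest()
    return os.path.join(directory, digest)


def record_and_count(directory: str, sender: str, now=None) -> int:
    """Append one timestamp line to the sender's file and return its line count."""
    now = time.time() if now is None else now
    path = sender_file(directory, sender)
    with open(path, "a") as fh:
        fh.write(f"{int(now)}\n")
    with open(path) as fh:
        return sum(1 for _ in fh)  # equivalent of `wc -l`


def purge_old(directory: str, max_age: int, now=None) -> None:
    """Cron-style pass over every file: drop stale lines, delete empty files."""
    now = time.time() if now is None else now
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        with open(path) as fh:
            kept = [ln for ln in fh if int(ln) >= now - max_age]
        if kept:
            with open(path, "w") as fh:
                fh.writelines(kept)
        else:
            os.remove(path)
```

The flood check itself is cheap, but `purge_old` has to open every sender's file on each run, which is the bookkeeping cost of many files versus one.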

There's also the possibility of a simple database, which I
could implement, but if this is to be a fairly portable
solution, I think that should be avoided -- having a
dependency upon a helper app and a specific db implementation
gets messy when you go to share code.

I expect someone on this list will come up with an elegant, procmail-pure
solution.

Thinking more based on CONTENT though...

This almost sounds like a local razor candidate (cache md5sums of body) if
content is identical. Otherwise, trap on the Subject: line (highly dependent on
WHICH autoreply bot though) or bayes-train on auto-responder messages.
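
A minimal sketch of the body-checksum cache, in Python. The whitespace and case normalization, the in-memory counter (a real setup would need something persistent), and the repeat limit are all assumptions for illustration. This is not how razor itself works, just the local-cache idea.

```python
import hashlib
from collections import Counter

seen_bodies = Counter()  # in-memory stand-in for a persistent cache


def body_digest(body: str) -> str:
    """MD5 of the body with whitespace collapsed and case folded."""
    normalized = " ".join(body.lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def is_repeat(body: str, limit: int = 3) -> bool:
    """Record the body; True once the same digest was seen more than `limit` times."""
    digest = body_digest(body)
    seen_bodies[digest] += 1
    return seen_bodies[digest] > limit
```

This only helps when the bodies really are identical after normalization. An autoreply that quotes the original message would produce a different digest each time.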

I've been tinkering with ifile, which is bayes-based but can be used to
categorize using user-defined categories. You basically send it a message, and
it will return a result based on the database, with each database able to hold
MULTIPLE categories. So for lists I moderate, I'm thinking of:

spam (pre-trained with existing spam store)
flames
autoresponders
top-posters
excessive quoters
html
normal

and other offenses. The trick would be training it, though it'd catch on
quickly enough with regular maintenance. I did a quick pass training it on
several mailing list folders, and it can discriminate pretty well between those
(even using just Subject: To: and From: headers). I haven't played much with
multiple categories WITHIN the same list yet (probably much more of a
challenge).
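
To illustrate the multi-category idea without depending on ifile, here is a toy naive Bayes classifier in Python. It is not ifile and does not use ifile's interface or database format. The class name, the tokenizer, and the add-one smoothing are my own choices for the sketch.

```python
import math
import re
from collections import Counter, defaultdict


class TinyBayes:
    """Multinomial naive Bayes over word tokens, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()              # category -> training messages
        self.vocab = set()

    @staticmethod
    def tokens(text: str):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, category: str, text: str) -> None:
        toks = self.tokens(text)
        self.word_counts[category].update(toks)
        self.doc_counts[category] += 1
        self.vocab.update(toks)

    def classify(self, text: str):
        """Return the category with the highest log-probability (None if untrained)."""
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for cat, docs in self.doc_counts.items():
            counts = self.word_counts[cat]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(docs / total_docs)
            for tok in self.tokens(text):
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best, best_score = cat, score
        return best
```

Training is just repeated calls such as `nb.train("autoresponders", headers)` and `nb.train("normal", headers)`, one per sample message, after which `nb.classify(headers)` names the best-matching category.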

- Bob


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
