
Re: Flood control?

2004-02-26 20:03:01
At 10:20 2004-02-26 -0500, Bob George wrote:

Thinking about trapping based on CONTENT: Is the body of the message the same
in each case? Or does it vary, include the original message or something
similar? (more on content below).

In this particular case it was, but on the off chance that someone replies with a copy of the majordomo command processing results, I think it's best to expect that it may not be. Therefore, something based chiefly on mail volume from a given sender seems the best approach: it leaves much less to chance.
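As a rough illustration of a volume-based check, here is a minimal Python sketch that counts messages per sender inside a sliding time window. The class name, the threshold and the window length are all hypothetical, and nothing here is taken from Seneschal or majordomo; it only shows the general shape of the idea.

```python
from collections import defaultdict, deque


class SenderRateTracker:
    """Counts messages per sender inside a sliding time window (illustrative only)."""

    def __init__(self, max_messages, window_seconds):
        self.max_messages = max_messages      # hypothetical threshold
        self.window_seconds = window_seconds  # hypothetical window length
        self._seen = defaultdict(deque)       # sender -> arrival timestamps

    def record(self, sender, now):
        """Record one message; return True if the sender is now over the limit."""
        stamps = self._seen[sender.lower()]
        stamps.append(now)
        # Drop timestamps that have aged out of the window.
        while stamps and stamps[0] <= now - self.window_seconds:
            stamps.popleft()
        return len(stamps) > self.max_messages
```

Because old entries age out one at a time, there is no single purge moment around which a flood could run to nearly twice the threshold before being noticed.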

> Nothing about the autoreply met any of the standard
> characteristics for vacation messages either, so that filter
> wasn't tripped.

Again, thinking of CONTENT: Other than that, scanning the body for "out of the
office", "try again in", "on vacation" etc. with scoring comes to mind for a
procmail-only solution.

That's exactly the problem (as I cited in the paragraph you replied to) - although there's a vacation checker in place, the message wasn't a vacation-type message. Particularly bothersome was that the sender hadn't included any loop protections in the message they sent, AND did not, say, cache the majordomo address to record "I've already sent a notice to that address", which would have stopped the problem dead in its tracks. The list processor doesn't have this luxury - it is there to respond to queries sent to it.
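For reference, the phrase-scoring idea quoted above might look something like the following Python sketch. The three phrases are the ones mentioned in the quoted text; the weights, the threshold and the function names are made up for illustration. As noted, a check of this kind would not have caught the autoreply in this incident.

```python
import re

# Hypothetical weights: the phrases come from the quoted suggestion,
# the numbers are invented purely for illustration.
PHRASE_WEIGHTS = {
    r"out\s+of\s+the\s+office": 3,
    r"try\s+again\s+in": 2,
    r"on\s+vacation": 3,
}


def autoreply_score(body):
    """Sum the weights of every phrase that appears in the body."""
    score = 0
    for pattern, weight in PHRASE_WEIGHTS.items():
        if re.search(pattern, body, re.IGNORECASE):
            score += weight
    return score


def looks_like_autoreply(body, threshold=3):
    """True when the accumulated score reaches the (hypothetical) threshold."""
    return autoreply_score(body) >= threshold
```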

It wasn't a PYLM message either, which is another potential cause for this sort of flood problem which would be difficult to peg. We've had a few of those too, but nothing nearly so floodworthy.

This incident generated just shy of *7000* exchanges in 2.5 hours: one hour of free running before the unfortunate admin checked his email and noted he was being hammered, and 1.5 hours between when he sent up a flare and when I happened to be around to see it (he's in Australia, I'm in the US, and the list host is in Europe). He didn't have my pager details.

I set up a sendmail access db entry to refuse messages from the problem sender (domain), and purged matching entries from the mailqueue (to spare the admin the grief of as yet undelivered joy).

Thinking of rate-limiting POSTERS: It'd be nice to set anyone sending X
messages over a given period to "moderated" status on the list itself, no? This would be a list function though. I like to moderate new subscribers until they prove themselves, but there's not always that luxury.

Since these filters are implemented in a majordomo frontend (which I call Seneschal), that is certainly a possibility. Seneschal checks for digest-subject messages, loops (where detectable - such as external servers that ignore the SMTP envelope and attempt delivery to the To: field present on the list message), receipt requests, and a host of other things. Some are responded to with canned help text (such as why requesting a receipt on a list message is a bad idea), and others are just bounced off to the listadmin for review.

> There's the little matter of resetting the counter in a
> sensible fashion (if you merely purged the counter file on a
> weekly basis, then if there was a problem right around the
> purge time, it could go for nearly twice the threshold before
> being detected).

Just track the last 10 (50? 100?) posted messages to the list, and if many/most are from the same user, throttle them back?

Possibly. However, on regular discussion lists, it's not uncommon for someone to sit down at their terminal and fire off a flurry of individual replies, which would spike their activity. Or, for a handful of people to handle the bulk of technical replies...

Or, in the case of a list command processor, for someone to be s*bscribed to multiple lists on a server and send commands associated with each of those (either the s*bscription command itself, or archive retrieval, etc).

Or just tail the last X entries on every update?

1. Extract the sender
2. Grep the "last X posters" file for that sender
3. If not found, tail the last X-1 to a temp file, otherwise queue for
moderation (?). Just don't send an automated message!!!
4. Append this sender
5. Rotate
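
One possible reading of the steps above, sketched in Python rather than with grep and tail. The function names, the one-address-per-line file layout and the "accept"/"moderate" return values are assumptions for illustration. The sketch does no locking, so concurrent deliveries could still trample each other.

```python
import os
import tempfile
from email import message_from_string
from email.utils import parseaddr


def extract_sender(raw_message):
    """Step 1: pull the bare address out of the From: header."""
    msg = message_from_string(raw_message)
    return parseaddr(msg.get("From", ""))[1].lower()


def check_and_record(sender, path, keep):
    """Return "moderate" if sender is among the last `keep` posters, else "accept"."""
    try:
        with open(path) as fh:
            recent = [line.strip() for line in fh if line.strip()]
    except FileNotFoundError:
        recent = []
    recent = recent[-keep:]
    # Step 2: look for the sender in the "last X posters" list.
    if sender in recent:
        return "moderate"  # step 3: hold for review, send no automated reply
    # Steps 3-4: keep the last X-1 entries and append this sender.
    updated = recent[-(keep - 1):] if keep > 1 else []
    updated.append(sender)
    # Step 5: write a temp file in the same directory, then rename it into place.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as fh:
        fh.write("\n".join(updated) + "\n")
    os.replace(tmp, path)
    return "accept"
```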

Constant rotation of the file is undesirable.

A cron job every (short interval) could drop the least recent entries (tail
last X-1 to temp, rotate), ensuring someone talking to themselves on a
low-traffic list could still do so, slowly.

(A really ugly FIFO!) Locking issues abound.

Procmail includes "lockfile", a program for use in shell scripts which creates lockfiles in the same form that procmail itself is compiled to use.
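
For a script that isn't written in shell, the same general idea (a lock that exists as a file, created exclusively) can be sketched in Python as below. This is NOT procmail's lockfile program, and it is not claimed to interoperate with it or to be safe over NFS. The function name, retry count and sleep interval are hypothetical.

```python
import contextlib
import os
import time


@contextlib.contextmanager
def lock_file(path, retries=5, sleep_seconds=1.0):
    """Hold an exclusive-create lock file for the duration of a with-block."""
    for attempt in range(retries + 1):
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if attempt == retries:
                raise TimeoutError(f"could not acquire {path}")
            time.sleep(sleep_seconds)
        else:
            os.close(fd)
            break
    try:
        yield
    finally:
        # Release the lock by removing the file.
        os.remove(path)
```

Usage would be along the lines of `with lock_file("posters.lock"):` wrapped around whatever reads and rewrites the data file.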

> Checking for a
> flood would be as simple as taking the sender address and
> doing a wc on the datafile of that name - you're not running
> through a bunch of lines which don't have anything to do with
> the sender.

(This, BTW, is the filename hash which Dallman uses for his greenlist or similar).

Though per-user rotation might be an issue if not done regularly (they'd flood
the list for at least one rotation window). You could use the short-interval
purging of last X entries here too of course, but you now have a BUNCH of files
to attend to as opposed to one. Does this gain you anything?

Speed and simplicity in the CHECKING stage, which is performed for each message through the processor (or, conceivably, list). But yeah, given the size of the mailing list's subscriber base, this ends up being an ugly solution. OTOH, while there may be MANY members (26K+ in this case), only a few are regular participants, or will have been sending commands to the list processor, and the rotation function would be DELETING the files which contain only entries beyond the cutoff date, so the fileset would not likely be huge, unlike a more persistent greenlist, which grows and grows...
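
A minimal Python sketch of the per-sender datafile idea, under some loud assumptions: one file per sender, named by a SHA-256 hash of the lowercased address (the actual hash scheme isn't specified here), one timestamp per line, and a dedicated directory containing nothing else. All function names are hypothetical.

```python
import hashlib
import os
import time


def sender_path(directory, sender):
    """Map a sender address to its datafile (hash choice is an assumption)."""
    digest = hashlib.sha256(sender.lower().encode("utf-8")).hexdigest()
    return os.path.join(directory, digest)


def record_and_count(directory, sender, now=None):
    """Append one timestamp for this sender and return the line count (the 'wc')."""
    now = time.time() if now is None else now
    path = sender_path(directory, sender)
    with open(path, "a") as fh:
        fh.write(f"{now:.0f}\n")
    with open(path) as fh:
        return sum(1 for _ in fh)


def purge_stale(directory, cutoff):
    """Delete datafiles whose entries are ALL older than cutoff; return how many."""
    removed = 0
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        with open(path) as fh:
            stamps = [float(line) for line in fh if line.strip()]
        if not stamps or max(stamps) < cutoff:
            os.remove(path)
            removed += 1
    return removed
```

Note that a file with a mix of old and recent entries is kept whole in this sketch, so the count can include entries older than the cutoff. Trimming individual lines would be a further step.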

Thinking more based on CONTENT though...

This almost sounds like a local razor candidate (cache md5sums of body) if

Which unfortunately relies upon the body being identical. In this case, it was, but the first time some nimrod blows replies containing text from the previous exchange (or regular list-relayed message - this problem isn't limited to the list command processor, though that's where it blew up on someone), then the system will have failed to do its job.

content is identical. Otherwise, trap on the Subject: line (highly dependent on WHICH autoreply bot though) or bayes-train on auto-responder messages.

This particular instance had a subject which was clearly _supposed_ to have included a timestamp, but the bot on the far end didn't expand a variable token. So, the subjects _were_ static, but they weren't intended to be so.
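
To make the body-checksum idea and its limitation concrete, here is a small Python sketch. The class name, the whitespace normalization and the in-memory dictionary are assumptions for illustration; a real filter would need a persistent cache and some way of expiring entries.

```python
import hashlib


class BodyDigestCache:
    """Counts how often an identical (whitespace-normalized) body has been seen."""

    def __init__(self, limit):
        self.limit = limit  # hypothetical number of repeats to tolerate
        self._counts = {}   # md5 hex digest -> times seen

    def seen_too_often(self, body):
        """Record this body; return True once it has been seen more than `limit` times."""
        normalized = " ".join(body.split()).encode("utf-8")
        digest = hashlib.md5(normalized).hexdigest()
        self._counts[digest] = self._counts.get(digest, 0) + 1
        return self._counts[digest] > self.limit
```

A body that quotes anything from the previous exchange hashes differently each time, so it would never reach the limit, which is exactly the weakness described above.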

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
