Re: Flood control?

At 10:20 2004-02-26 -0500, Bob George wrote:

Thinking about trapping based on CONTENT: Is the body of the message the same
in each case? Or does it vary, include the original message or something
similar? (more on content below).

In this particular case, it was, but on the off chance that someone would
reply with a copy of the majordomo command processing results, I think it's
best to expect that it may not. Therefore, something based chiefly on mail
volume from a given sender seems the best approach: it leaves much less to
chance.
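
A minimal sketch of volume-based throttling, assuming the processor can hand
you the envelope sender and a Unix timestamp for each message. The class name
and the defaults (20 messages per hour) are hypothetical. They are not
anything Seneschal actually uses.

```python
from collections import defaultdict, deque


class SenderRateLimiter:
    """Sliding-window count of messages per sender address (hypothetical helper)."""

    def __init__(self, max_msgs=20, window_secs=3600):
        # Hypothetical defaults: more than 20 messages in an hour is a flood.
        self.max_msgs = max_msgs
        self.window_secs = window_secs
        self.seen = defaultdict(deque)  # sender -> timestamps, oldest first

    def is_flood(self, sender, now):
        """Record one message from `sender` at time `now`; True if over the limit."""
        times = self.seen[sender.strip().lower()]
        # Forget timestamps that have slid out of the window.
        while times and now - times[0] >= self.window_secs:
            times.popleft()
        times.append(now)
        return len(times) > self.max_msgs
```

Old timestamps age out continuously, so there is no periodic reset for a flood
to straddle. With `max_msgs=3`, the fourth message inside the window is the
first one flagged.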

> Nothing about the autoreply met any of the standard
> characteristics for vacation messages either, so that filter
> wasn't tripped.

Again, thinking of CONTENT: Other than that, scanning the body for "out of the
office", "try again in", "on vacation" etc. with scoring comes to mind for a
procmail-only solution.
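
The phrase-scoring idea, sketched in Python rather than as procmail recipes.
The patterns, weights and cutoff are all made-up examples. In this sketch each
pattern counts once per message, however often it appears.

```python
import re

# Hypothetical patterns and weights; real ones would be tuned against
# the autoreplies actually seen on the list.
AUTOREPLY_PATTERNS = {
    r"out of (the )?office": 50,
    r"on vacation": 40,
    r"try again in": 30,
    r"auto(matic)?[- ]?repl(y|ied)": 40,
}
THRESHOLD = 60  # hypothetical cutoff


def autoreply_score(body):
    """Add up the weight of each pattern that appears anywhere in the body."""
    text = body.lower()
    return sum(weight for pattern, weight in AUTOREPLY_PATTERNS.items()
               if re.search(pattern, text))


def looks_like_autoreply(body):
    return autoreply_score(body) >= THRESHOLD
```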

That's exactly the problem (as I noted in the paragraph you replied to):
although there's a vacation checker in place, the message wasn't a
vacation-type message. Particularly bothersome was that the sender hadn't
included any loop protection in the message they sent, AND did not, say,
cache the majordomo address to say "I've already sent a notice to that
address", which would have stopped the problem dead in its tracks. The list
processor doesn't have this luxury: it is there to respond to queries sent
to it.
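
This sketches what the bot on the far end could have done. It refuses to
answer mail that is itself automated, and it remembers addresses it has
already notified. The `Auto-Submitted` check follows RFC 3834. The
`Precedence` and `X-Loop` checks are common conventions rather than
standards. The class name is hypothetical, and the in-memory cache is a
simplification: a real responder would persist it.

```python
from email.message import EmailMessage


class CourteousResponder:
    """Hypothetical autoresponder that refuses to feed a mail loop."""

    def __init__(self, own_address):
        self.own_address = own_address.strip().lower()
        self.already_notified = set()  # in-memory only; a real bot would persist this

    def should_reply(self, msg: EmailMessage, envelope_sender: str) -> bool:
        sender = envelope_sender.strip().strip("<>").lower()
        if not sender or sender == self.own_address:
            return False  # null sender (a bounce) or our own address
        auto = str(msg.get("Auto-Submitted", "no")).split(";")[0].strip().lower()
        if auto != "no":
            return False  # RFC 3834: don't answer auto-generated/auto-replied mail
        if str(msg.get("Precedence", "")).strip().lower() in ("bulk", "list", "junk"):
            return False  # list or bulk traffic
        if str(msg.get("X-Loop", "")).strip().lower() == self.own_address:
            return False  # our own mail has come back around
        if sender in self.already_notified:
            return False  # "I've already sent a notice to that address"
        self.already_notified.add(sender)
        return True
```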

It wasn't a PYLM message either, which is another potential cause of this
sort of flood, and one that would be difficult to peg. We've had a few of
those too, but nothing nearly so floodworthy.

This incident generated just shy of *7000* exchanges in 2.5 hours: one hour
of free running before the unfortunate admin checked his email and noticed
he was being hammered, and 1.5 hours between when he sent up a flare and
when I happened to be around to see it (he's in Australia, I'm in the US,
and the list host is in Europe). He didn't have my pager details.

I set up a sendmail access db entry to refuse messages from the problem
sender (domain), and purged matching entries from the mail queue (to spare
the admin the grief of as-yet-undelivered joy).

Thinking of rate-limiting POSTERS: It'd be nice to set anyone sending X
messages over a given period to "moderated" status on the list itself, no?
This would be a list function, though. I like to moderate new subscribers
until they prove themselves, but there's not always that luxury.

Since these filters are implemented in a majordomo frontend (which I call
Seneschal), that is certainly a possibility. Seneschal checks for
digest-subject messages, loops (where detectable, such as external servers
that ignore the SMTP envelope and attempt delivery to the To: field present
on the list message), receipt requests, and a host of other things. Some are
answered with canned help text (such as why requesting a receipt on a list
message is a bad idea), and others are simply bounced to the list admin for
review.
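
Seneschal's code isn't shown here. This is a hypothetical Python dispatcher in
the same spirit. It sorts a message into `help` (answer with canned text),
`admin` (forward for review) or `pass`. `Return-Receipt-To` and
`Disposition-Notification-To` are real receipt-request headers. The "digest"
subject test is a simplified assumption, as is the loop test. The loop test
assumes the list stamps its own outgoing mail with `X-Loop`.

```python
import re
from email.message import EmailMessage

RECEIPT_HEADERS = ("Return-Receipt-To", "Disposition-Notification-To")
DIGEST_SUBJECT = re.compile(r"\bdigest\b", re.IGNORECASE)  # hypothetical test


def classify(msg: EmailMessage, list_address: str) -> str:
    """Return 'help' (canned reply), 'admin' (forward for review) or 'pass'."""
    if any(msg.get(h) for h in RECEIPT_HEADERS):
        return "help"   # explain why receipts on list mail are a bad idea
    if DIGEST_SUBJECT.search(str(msg.get("Subject", ""))):
        return "admin"  # likely a reply to a digest with the subject unedited
    # Loop check, assuming the list stamps its own outgoing mail with X-Loop.
    if str(msg.get("X-Loop", "")).strip().lower() == list_address.strip().lower():
        return "admin"
    return "pass"
```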

> There's the little matter of resetting the counter in a
> sensible fashion (if you merely purged the counter file on a
> weekly basis, then if there was a problem right around the
> purge time, it could go for nearly twice the threshold before
> being detected).

Just track the last 10 (50? 100?) posted messages to the list, and if
many/most are from the same user, throttle them back?
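
The "last N messages" idea as an in-memory sketch. The class name, the
window size and the 50% share are hypothetical. A legitimate burst of
replies from one person can trip a pure share test like this one.

```python
from collections import Counter, deque


class RecentPosters:
    """Track the last `size` senders; flag one holding more than `max_share`."""

    def __init__(self, size=50, max_share=0.5):
        # Hypothetical defaults: a window of 50 posts, flag above 50%.
        self.window = deque(maxlen=size)
        self.counts = Counter()
        self.max_share = max_share

    def record(self, sender):
        sender = sender.strip().lower()
        if len(self.window) == self.window.maxlen:
            # The oldest entry is about to fall off the full deque.
            self.counts[self.window[0]] -= 1
        self.window.append(sender)
        self.counts[sender] += 1
        # Divide by the full window size so the first few posts on a
        # quiet list don't flag their sender.
        return self.counts[sender] / self.window.maxlen > self.max_share
```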

Possibly. However, on regular discussion lists it's not uncommon for someone
to sit down at their terminal and fire off a flurry of individual replies,
which would spike their activity. Or for a handful of people to handle the
bulk of technical replies...

Or, in the case of a list command processor, for someone to be s*bscribed
to multiple lists on a server and send commands associated with each of
those (the s*bscription command itself, archive retrieval, etc.).

Or just tail the last X entries on every update?

1. Extract the sender
2. Grep the "last X posters" file for that sender
3. If not found, tail the last X-1 to a temp file, otherwise queue for
moderation (?). Just don't send an automated message!!!
4. Append this sender
5. Rotate
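
The five steps above, sketched in Python. The function name is hypothetical,
and so is the default of 50. It reads step 3 as "moderate and stop" when the
sender is already present, so the file is only rewritten for new senders.
It does no locking, so concurrent runs could race.

```python
import os
import tempfile


def check_and_record(posters_file, sender, x=50):
    """Return True if `sender` should be held for moderation.

    x is the (hypothetical) number of recent posters remembered.
    """
    sender = sender.strip().lower()      # 1. the extracted sender
    try:
        with open(posters_file) as f:
            recent = [line.strip() for line in f if line.strip()]
    except FileNotFoundError:
        recent = []
    if sender in recent:                 # 2. "grep" found them
        return True                      # 3. moderate; never autoreply
    keep = recent[-(x - 1):] if x > 1 else []   # 3. tail the last X-1
    keep.append(sender)                  # 4. append this sender
    # 5. rotate: write a temp file beside the original, then rename over it
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(posters_file)))
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(keep) + "\n")
    os.replace(tmp, posters_file)
    return False
```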


Constant rotation of the file is undesirable.

A cron job every (short interval) could drop the least recent entries (tail
last X-1 to temp, rotate), ensuring someone talking to themselves on a
low-traffic list could still do so, slowly.
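
The cron-side aging step might look like this. It is a hypothetical helper
that drops the single oldest line per run, using the same tail-to-temp,
then-rename rotation. Like the rest of this FIFO, it needs a lock in real use.

```python
import os
import tempfile


def drop_oldest_entry(posters_file):
    """Cron-side aging: keep all but the oldest line. Returns lines left."""
    try:
        with open(posters_file) as f:
            lines = f.readlines()
    except FileNotFoundError:
        return 0
    if not lines:
        return 0
    # Tail everything but the first (oldest) line into a temp file, then rename.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(posters_file)))
    with os.fdopen(fd, "w") as f:
        f.writelines(lines[1:])
    os.replace(tmp, posters_file)
    return len(lines) - 1
```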

(A really ugly FIFO!) Locking issues abound.

Procmail includes "lockfile", a program for use in shell scripts that
creates lockfiles in the same form procmail was compiled to use.
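
If part of the pipeline is a Python helper rather than a shell script, an
exclusive-create lock file is one way to get similar mutual exclusion. The
function name, lock path and retry values are hypothetical. This sketch never
breaks a stale lock left by a crashed process. Whether it interlocks cleanly
with `lockfile` on the same path is an assumption worth testing on your build.

```python
import contextlib
import os
import tempfile
import time

# Hypothetical lock location; any path all the cooperating scripts agree on works.
DEFAULT_LOCK = os.path.join(tempfile.gettempdir(), "last-posters.lock")


@contextlib.contextmanager
def dot_lock(lock_path=DEFAULT_LOCK, retries=10, sleep_secs=1.0):
    """Hold an exclusive lock file for the duration of a with-block."""
    for _ in range(retries):
        try:
            # O_CREAT|O_EXCL is atomic: only one process can create the file.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            time.sleep(sleep_secs)
            continue
        os.close(fd)
        break
    else:
        raise TimeoutError(f"could not acquire {lock_path}")
    try:
        yield lock_path
    finally:
        os.unlink(lock_path)  # release
```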

> Checking for a
> flood would be as simple as taking the sender address and
> doing a wc on the datafile of that name - you're not running
> through a bunch of lines which don't have anything to do with
> the sender.

(This, BTW, is the filename hash which Dallman uses for his greenlist or
similar.)
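
A sketch of the per-sender datafile check. Each message appends one timestamp
line to a file named after the sender, and the flood check is just a line
count, like `wc -l`. Dallman's actual hashing scheme isn't described in this
thread, so the md5-of-lowercased-address filename is an assumption. The
function names and directory are hypothetical too.

```python
import hashlib
import os
import tempfile
import time

# Hypothetical location for the per-sender count files.
DATA_DIR = os.path.join(tempfile.gettempdir(), "flood-counts")


def sender_file(sender, data_dir=DATA_DIR):
    """Map an address to a fixed-length, filesystem-safe name (md5 hex digest)."""
    digest = hashlib.md5(sender.strip().lower().encode("utf-8")).hexdigest()
    return os.path.join(data_dir, digest)


def record_and_count(sender, data_dir=DATA_DIR, now=None):
    """Append one timestamp line for this sender; return the line count, like `wc -l`."""
    os.makedirs(data_dir, exist_ok=True)
    path = sender_file(sender, data_dir)
    stamp = int(time.time() if now is None else now)
    with open(path, "a") as f:
        f.write(f"{stamp}\n")
    with open(path) as f:
        return sum(1 for _ in f)
```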

Though per-user rotation might be an issue if not done regularly (they'd flood
the list for at least one rotation window). You could use the short-interval
purging of last X entries here too, of course, but you now have a BUNCH of
files to attend to as opposed to one. Does this gain you anything?

Speed and simplicity in the CHECKING stage, which is performed for each
message through the processor (or, conceivably, the list). But yeah, given
the size of the mailing list's subscribership, this ends up being an ugly
solution. OTOH, while there may be MANY members (26K+ in this case), only a
few are regular participants or will have been sending commands to the list
processor, and the rotation function would be DELETING the files which
contain only entries beyond the cutoff date. So the fileset would not likely
be huge, unlike a more persistent greenlist, which grows and grows...
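
A sketch of that rotation pass. It assumes each per-sender file holds one
Unix timestamp per line, which is a hypothetical format. Expired lines are
dropped, and a file left with no fresh entries is deleted outright. The
function name is hypothetical. A real version would take the same lock the
writers use.

```python
import os
import tempfile
import time


def rotate_sender_files(data_dir, max_age_secs, now=None):
    """Drop expired timestamp lines; delete files left with none. Returns files removed."""
    now = time.time() if now is None else now
    cutoff = now - max_age_secs
    removed = 0
    for name in os.listdir(data_dir):
        path = os.path.join(data_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path) as f:
            fresh = [line for line in f if line.strip() and int(line) >= cutoff]
        if not fresh:
            os.remove(path)  # every entry is past the cutoff: delete the file
            removed += 1
        else:
            # Rewrite with only the fresh entries, via temp file and rename.
            fd, tmp = tempfile.mkstemp(dir=data_dir)
            with os.fdopen(fd, "w") as f:
                f.writelines(fresh)
            os.replace(tmp, path)
    return removed
```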

Thinking more based on CONTENT though...

This almost sounds like a local razor candidate (cache md5sums of body) if

Which unfortunately relies upon the body being identical. In this case, it
was, but the first time some nimrod sends replies containing text from the
previous exchange (or a regular list-relayed message - this problem isn't
limited to the list command processor, though that's where it blew up on
someone), the system will have failed to do its job.

content is identical. Otherwise, trap on the Subject: line (highly
dependent on WHICH autoreply bot, though) or bayes-train on auto-responder
messages.
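
A minimal "local razor" sketch: hash each body and count repeats. The class
name and threshold are hypothetical. Collapsing whitespace only forgives
trivial reflowing. As noted above, any real change to the body, such as
quoted text or an expanded timestamp, yields a new digest, so it only catches
identical content.

```python
import hashlib
from collections import Counter


class BodyDigestCache:
    """Count how often an identical (whitespace-normalized) body has been seen."""

    def __init__(self, threshold=5):
        self.threshold = threshold  # hypothetical repeat count that signals a flood
        self.seen = Counter()

    @staticmethod
    def digest(body):
        # Collapse runs of whitespace so re-wrapping doesn't change the hash;
        # any change to the actual words still does.
        normalized = " ".join(body.split())
        return hashlib.md5(normalized.encode("utf-8")).hexdigest()

    def is_repeat_flood(self, body):
        key = self.digest(body)
        self.seen[key] += 1
        return self.seen[key] >= self.threshold
```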

This particular instance had a subject which was clearly _supposed_ to have
included a timestamp, but the bot on the far end didn't expand a variable
token. So the subjects _were_ static, but they weren't intended to be.


---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail