Re: Flood control?
At 10:20 2004-02-26 -0500, Bob George wrote:
Thinking about trapping based on CONTENT: Is the body of the message the same
in each case? Or does it vary, include the original message or something
similar? (more on content below).
In this particular case, it was, but on the off chance that someone would
reply with a copy of the majordomo command process results, I think it's
best to expect that it may not. Therefore, something based chiefly on mail
volume from a given sender seems the best approach: it leaves much less to
> Nothing about the autoreply met any of the standard
> characteristics for vacation messages either, so that filter
> wasn't tripped.
Again, thinking of CONTENT: Other than that, scanning the body for "out of the
office", "try again in", "on vacation" etc. with scoring comes to mind for a
That's exactly the problem (as I cited in the paragraph you replied to) -
although there's a vacation checker in place, the message wasn't a vacation
type message. Particularly bothersome was that the sender hadn't included
any loop protections in the message they sent, AND did not, say, cache the
majordomo address to record "I've already sent a notice to that address",
which would have stopped the problem dead in its tracks. The list
processor doesn't have this luxury - it is there to respond to queries sent
It wasn't a PYLM message either, which is another potential cause for this
sort of flood problem which would be difficult to peg. We've had a few of
those too, but nothing nearly so floodworthy.
This incident generated just shy of *7000* exchanges in 2.5 hours: one hour
of free running before the unfortunate admin checked his email and noted he
was being hammered, and 1.5 hours between when he sent up a flare and when
I happened to be around to see it (he's in Australia, I'm in the US, and
the list host is in Europe). He didn't have my pager details.
I set up a sendmail access db entry to refuse messages from the problem
sender (domain), and purged matching entries from the mailqueue (to spare
the admin the grief of as yet undelivered joy).
Thinking of rate-limiting POSTERS: It'd be nice to set anyone sending X
messages over a given period to "moderated" status on the list itself, no?
This would be a list function though. I like to moderate new subscribers
until they prove themselves, but there's not always that luxury.
Since these filters are implemented in a majordomo frontend (which I call
Seneschal), that is certainly a possibility. Seneschal checks for
digest-subject messages, loops (where detectable - such as external servers
that ignore the SMTP envelope and attempt delivery to the To: field present
on the list message), receipt requests, and a host of other things. Some
are responded to with canned help text (such as why requesting a receipt on
a list message is a bad idea), and others are just bounced off to the
listadmin for review.
> There's the little matter of resetting the counter in a
> sensible fashion (if you merely purged the counter file on a
> weekly basis, then if there was a problem right around the
> purge time, it could go for nearly twice the threshold before
> being detected).
Just track the last 10 (50? 100?) posted messages to the list, and if
many/most are from the same user, throttle them back?
Possibly. However on regular discussion lists, it's not uncommon for
someone to sit down at their terminal and fire off a flurry of individual
replies, which would spike their activity. Or, for a handful of people to
handle the bulk of technical replies...
Or, in the case of a list command processor, for someone to be s*bscribed
to multiple lists on a server and send commands associated with each of
those (either the s*bscription command itself, or archive retrieval, etc).
Or just tail the last X entries on every update?
1. Extract the sender
2. Grep the "last X posters" file for that sender
3. If not found, tail the last X-1 to a temp file, otherwise queue for
moderation (?). Just don't send an automated message!!!
4. Append this sender
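The four steps above could be sketched roughly as follows. This is an illustration only: WINDOW, MAX_SHARE and check_poster are hypothetical names, the "found" test is loosened to "already fills MAX_SHARE of the last X slots" (an assumption, so that a normal second post isn't held), the file is trimmed on every update, and there is no locking at all.

```python
import os
from email.utils import parseaddr

WINDOW = 10      # "X": how many recent posters to remember (hypothetical, > 1)
MAX_SHARE = 5    # hold a sender who already fills this many slots (hypothetical)


def check_poster(from_header, path):
    """Return "moderate" or "pass", then record the sender in the file."""
    sender = parseaddr(from_header)[1].lower()      # 1. extract the sender
    try:
        with open(path) as fh:
            recent = [line.strip() for line in fh if line.strip()]
    except FileNotFoundError:
        recent = []
    hits = recent[-WINDOW:].count(sender)           # 2. "grep" the file
    verdict = "moderate" if hits >= MAX_SHARE else "pass"
    recent = recent[-(WINDOW - 1):] + [sender]      # 3. keep last X-1, 4. append
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        fh.write("\n".join(recent) + "\n")
    os.replace(tmp, path)                           # swap the new file in
    return verdict
```

Note that it only returns a verdict; what happens to a held message (and making sure nothing automated goes back to the sender) is left to the caller.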
Constant rotation of the file is undesirable.
A cron job every (short interval) could drop the least recent entries (tail
last X-1 to temp, rotate), ensuring someone talking to themselves on a
low-traffic list could still do so, slowly.
(A really ugly FIFO!) Locking issues abound.
Procmail includes "lockfile", a program to produce lockfiles in the same
form that procmail is compiled for, in shell scripts.
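For anyone doing the bookkeeping outside of a shell script, the same general idea (a lock file that either gets created exclusively or doesn't) can be sketched like this. It is not the lockfile program and makes no attempt to match procmail's compiled-in locking conventions; dotlock and its parameters are hypothetical names.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def dotlock(path, retries=8, sleep_seconds=1.0):
    """Hold path as an exclusive lock file for the duration of a with-block."""
    attempts = max(1, retries)
    for attempt in range(attempts):
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            os.close(fd)
            break
        except FileExistsError:
            if attempt == attempts - 1:
                raise TimeoutError("could not acquire " + path)
            time.sleep(sleep_seconds)
    try:
        yield
    finally:
        os.remove(path)   # release the lock even if the body raised
```

Usage would be along the lines of "with dotlock(datafile + '.lock'): update the datafile". Stale locks left behind by a crashed process are not handled here.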
> Checking for a
> flood would be as simple as taking the sender address and
> doing a wc on the datafile of that name - you're not running
> through a bunch of lines which don't have anything to do with
> the sender.
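A rough sketch of that per-sender datafile check, assuming one timestamp line per message. The SHA-1 naming scheme and the function names are assumptions made for illustration; the thread doesn't specify how the filename is derived.

```python
import hashlib
import os
import time


def sender_file(spool_dir, sender):
    # Hypothetical naming scheme: hashing keeps odd characters in an
    # address out of the filename.
    digest = hashlib.sha1(sender.lower().encode("utf-8")).hexdigest()
    return os.path.join(spool_dir, digest)


def record_and_count(spool_dir, sender, now=None):
    """Append one timestamp line for sender and return the line count."""
    now = time.time() if now is None else now
    path = sender_file(spool_dir, sender)
    with open(path, "a") as fh:
        fh.write("%d\n" % now)
    with open(path) as fh:
        return sum(1 for _ in fh)   # the equivalent of "wc -l" on that file
```

The caller compares the returned count against whatever threshold it considers a flood; only that one sender's file is ever read.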
(This BTW, is the filename hash which Dallman uses for his greenlist or
Though per-user rotation might be an issue if not done regularly (they'd flood
the list for at least one rotation window). You could use the short-interval
purging of last X entries here too of course, but you now have a BUNCH of
files to attend to as opposed to one. Does this gain you anything?
Speed and simplicity in the CHECKING stage, which is performed for each
message through the processor (or, conceivably, list). But yea, for the
size of the mailing list subscribership, this ends up being an ugly
solution. OTOH, while there may be MANY members (26K+ in this case), only
a few are regular participants, or will have been sending commands to the
list processor, and the rotation function would be DELETING the files which
contain only entries beyond the cutoff date, so the fileset would not
likely be huge, unlike a more persistent greenlist, which grows and grows...
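The rotation pass could look something like the following. It is a sketch that assumes each per-sender file holds one integer timestamp per line; purge_stale is a hypothetical name, and it would have to run under whatever locking the checker itself uses.

```python
import os


def purge_stale(spool_dir, cutoff):
    """Drop entries older than cutoff; delete files left with none.
    Returns the number of files deleted."""
    removed = 0
    for name in os.listdir(spool_dir):
        path = os.path.join(spool_dir, name)
        with open(path) as fh:
            stamps = [int(line) for line in fh if line.strip()]
        fresh = [s for s in stamps if s >= cutoff]
        if not fresh:
            os.remove(path)          # only stale entries: the file goes away
            removed += 1
        elif len(fresh) < len(stamps):
            tmp = path + ".tmp"
            with open(tmp, "w") as fh:
                fh.write("".join("%d\n" % s for s in fresh))
            os.replace(tmp, path)    # keep just the recent entries
    return removed
```

Since files for quiet senders get deleted outright, the directory only ever holds senders who were active inside the window.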
Thinking more based on CONTENT though...
This almost sounds like a local razor candidate (cache md5sums of body) if
Which unfortunately relies upon the body being identical. In this case, it
was, but the first time some nimrod blows replies containing text from the
previous exchange (or regular list-relayed message - this problem isn't
limited to the list command processor, though that's where it blew up on
someone), then the system will have failed to do its job.
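To make the limitation concrete, here is a minimal sketch of a body-digest cache. BodyCache and its limit are hypothetical, and the only normalisation assumed is line endings and trailing whitespace.

```python
import hashlib


class BodyCache:
    """Count how often an identical message body has been seen."""

    def __init__(self, limit=3):
        self.limit = limit    # hypothetical: repeats tolerated before tripping
        self.seen = {}        # md5 hex digest -> count

    def is_repeat(self, body):
        # Normalise line endings and trailing blanks only; any other
        # difference in the text produces a different digest.
        canon = "\n".join(line.rstrip() for line in body.splitlines())
        digest = hashlib.md5(canon.encode("utf-8")).hexdigest()
        self.seen[digest] = self.seen.get(digest, 0) + 1
        return self.seen[digest] > self.limit
```

A bot that sends the same text every time trips this after a few rounds, but one that quotes the previous exchange back produces a fresh digest on every message and is never caught.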
content is identical. Otherwise, trap on the Subject: line (highly
dependent on WHICH autoreply bot though) or bayes-train on auto-responder
This particular instance had a subject which was clearly _supposed_ to have
included a timestamp, but the bot on the far end didn't expand a variable
token. So, the subjects _were_ static, but they weren't intended to be so.
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.
procmail mailing list