At 18:02 2006-12-30 +0100, M. Fioretti wrote:
> that file will be emptied every two weeks by a cron job.
Er, that seems unfortunate - threads you'd deprecated the day before the
cron job runs won't have any effect on future messages, since their IDs
will have been purged by an arbitrary two-week cycle.
If the new entries are appended to the file, you might try using 'tail'
instead, so you're discarding only the topmost (oldest) entries beyond some
keeper threshold, and the file still doesn't grow to insane
proportions. Your script could even use 'wc' to determine whether the file is
beyond a threshold necessitating such trimming in the first place.
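Something along these lines would do the wc/tail trimming. It's only a rough sketch: the function name trim_cache and its two arguments are made up for illustration, it assumes one ID per line with the newest entries at the bottom, and it does no locking of its own (wrap it in whatever lock your MUA and procmail agree on).

```shell
#!/bin/sh
# trim_cache FILE KEEP  (hypothetical helper, not from the original thread)
# If FILE holds more than KEEP lines, keep only the last (newest) KEEP lines.
trim_cache() {
    file=$1
    keep=$2
    [ -f "$file" ] || return 0
    # The arithmetic expansion strips any padding some wc versions emit.
    lines=$(( $(wc -l < "$file") ))
    if [ "$lines" -gt "$keep" ]; then
        tmp="$file.tmp.$$"
        # Write the survivors to a scratch file, then copy them back in place
        # so the original file keeps its ownership and permissions.
        tail -n "$keep" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    fi
}
```

A cron entry (or procmail itself) could then call something like trim_cache "$HOME/.irrelevant_threads" 5000, where both the path and the 5000 are placeholders to tune.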
If you manipulate the data file, you should probably use locking strategies
to allow for your MUA to append, your cron to shorten, and your procmail to
read and append. Logically, you can eliminate the cron entirely: as the
file is intended to affect how procmail handles new messages, you can have
procmail decide when to resize it.
> I am trying to write a recipe that does this:
> if (new message has In-Reply-To header with a Message-Id contained in
> (with "grep -qFf" or similar systems which don't assume the
> existence of databases, perl modules and so on)
> append the Message-Id of the new message to .irrelevant_threads
> save the new message only in $MAILDIR/irrelevant_list_threads
> how would you write this recipe? Frankly, I've never tried something
> so complex (for me of course) so I'd really appreciate your help here!
formail messageid cache comes to mind, but you'd need to diddle with header
names. I know I've written some recipes like this in the past (in response
to queries on this list), but they're not in my archive of test recipes, so
apparently I didn't tinker with them on my host.
Any possibility that the method you're using to have your MUA add the ID to
the file could be modified to use formail? If you did this, then formail
could manage your cache filesize automatically for you AND the replies to
those replies would also be ignored (which are threads you're not
interested in, right?). Otherwise, you need to be using References: as
well, not just In-Reply-To:.
While I don't have a ready-made solution kicking around, I do have a recipe
which I wrote for someone to deal with redelivery of a fscked up mailbox
(some wannabe sysadm lunched mail delivery for their users and needed to
recover as much as they could). That involved grabbing the local SMTP ID
from the local mailhost and using that in place of the messageid which
formail cached. Here, we can use it to grab In-Reply-To tokens and do the
same thing. Basically, we take that token and pass it along in a COPY of
the headers of the message to formail to rewrite it and pass it along into
formail to cache it.
# get In-Reply-To messageid, and if that header isn't found, use References:
# then check to see if it is in the ignore cache or in the mua_ignore cache.
# formail stores cache with NUL terminations, and, best as I assume, your
# MUA is using NL, which is why the two grep invocations differ. Maximal
# matching is used so that if the first lookup succeeds, the second check is
# skipped. If you set your MUA to invoke formail to cache ignored threads,
# then you can use ONE file and can do away with the separate checks, which
# will be a LOT more efficient.
# if we have a match in the MUA id file or current cache, ADD the messageid
# of THIS message to the cache
# Do nothing here - we just set $MATCH one of two ways
:0
* 9876543210^0 ^In-Reply-To: \/<[^>]*>
* 9876543210^0 ^References: \/<[^>]*>
{ }

:0 A: ignore.cache.lock
* 9876543210^0 ? grep -qzF "$MATCH" ignore.cache
* 9876543210^0 ? grep -qF "$MATCH" path_to_mua_idfile
{
  # lockfile above already (which locked for the greps as well)
  # pass a COPY of just the headers to formail
  :0 hc
  | formail -D 40000 ignore.cache

  # if the preceding conditions matched, then file this message
  # away as irrelevant (folder name taken from your description).
  :0:
  irrelevant_list_threads
}
Note that we're not using the formail operation to _check_ the id database,
just to update it. Normally, if you use formail to check, it's still
_adding_ the current ID, and you'd use the return value to determine
whether the id was in there already. That's fine for dealing with
duplicate checking, not so useful for what you're trying to do.
I haven't subjected the above to extensive testing. One obvious issue will
be with MUAs which do in-reply-to MULTIPLE messages (treating the header
more like References:). The extraction of the id in In-Reply-To: (and
References:) is formed specifically to grab just the FIRST ID - this isn't
ideal (some iterative code would be necessary to get all of them), but it's
better than expecting to match multiple IDs on a single line in a single
lookup (which won't happen).
In fact, References: is another header you should be checking, and which I
added code above to handle. In my quick test of the above recipe with just
In-Reply-To:, I noted that two messages in a recent procmail list
discussion were not pulled aside - neither had an In-Reply-To, but instead
had References:. Updating the recipe to what you see above resulted in all
the messages in that (particular) discussion being identified.
Ultimately, it'd be a LOT easier to simply write a C/C++ program to take
the In-Reply-To: and References:, combine them removing any dupes, then
scan a cache file, and if found, insert the Message-Id to that cache file
(after perhaps making sure it's not already there), returning a true/false
as to whether the message is related to an ignored thread. Whenever some
bozo writes a message out of the blue as a reply (rather than composing it
as a reply), or where people use crappy software, you can expect to have
messages that don't get caught no matter what you do. That's life.
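Short of writing it in C, the same logic can be roughed out in plain sh. Treat this as a sketch under assumptions rather than a finished tool: check_thread is a made-up name, the cache is assumed to be a plain one-ID-per-line file (NOT formail's NUL-terminated cache), the message arrives on stdin, your grep needs to support -o, and there's no locking.

```shell
#!/bin/sh
# check_thread CACHEFILE < message   (hypothetical helper)
# Exit 0 if any ID in In-Reply-To:/References: is in CACHEFILE, adding this
# message's own Message-Id to the cache if it is not there yet; exit 1 otherwise.
check_thread() {
    cache=$1
    # Header only: stop reading at the first blank line.
    header=$(sed '/^$/q')
    # Unfold continuation lines so each header field sits on one line.
    unfolded=$(printf '%s\n' "$header" | awk '
        /^[ \t]/ { line = line " " $0; next }
        NR > 1   { print line }
                 { line = $0 }
        END      { print line }')
    # Every <id> from both headers, duplicates removed.
    parents=$(printf '%s\n' "$unfolded" |
        grep -i -E '^(In-Reply-To|References):' |
        grep -o '<[^>]*>' | sort -u)
    self=$(printf '%s\n' "$unfolded" |
        grep -i '^Message-Id:' | grep -o '<[^>]*>' | head -n 1)
    [ -n "$parents" ] && [ -f "$cache" ] || return 1
    found=no
    while IFS= read -r id; do
        # -F: fixed string, -x: whole line must match
        if grep -q -F -x -e "$id" "$cache"; then
            found=yes
            break
        fi
    done <<EOF
$parents
EOF
    [ "$found" = yes ] || return 1
    if [ -n "$self" ] && ! grep -q -F -x -e "$self" "$cache"; then
        printf '%s\n' "$self" >> "$cache"
    fi
    return 0
}
```

Saved as a script, something like this could stand in for the grep lookups in a procmail "* ?" condition, since procmail feeds the message on stdin and only looks at the exit code.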
Oh, did I mention that except for your MUA-managed file, the above setup
automatically deals with keeping the cache file limited to a reasonable
size? This is yet another reason to consider tweaking your MUA to emit to
the cache file via formail (or forwarding the message to a trivial procmail
recipe that invokes formail). Chances are, even if the overhead of adding
an initial message to the list from the MUA was a lot more CPU intensive
(and this really isn't that big of a hit), you'll still improve mail
processing runtimes (only one grep needed) and eliminate the cron (along
with the arbitrary cutoffs which it introduces).
> 0) I am aware that this will _also_ hide "new" threads made replying
> to the last received message and changing the subject, and that's
> fine with me
... probably because only morons piggyback new threads onto old ones, and
who wants to read the ramblings of a moron... <g>
> 1) I _have_ already found and read
> and the corresponding thread in the procmail list archives, but I'll
> confess I'm confused. Does it _really_ have to be so complicated?
Procmail doesn't have internal database features. There isn't an external
program all packaged up to do what YOU are trying to do (heck, even your
MUA doesn't do it...)
> recipe "flow diagram" above is just one check and two consecutive
The first condition is one to match the specific list(s) the guy's filter
is supposed to operate on. He obviously did it as a separate rule so as to
keep the logic of the second rule (the "meat") simpler, otherwise adding
additional lists or posters could break the second rule. Technically, he
should have used maximal matching (score like 9876543210^0 instead of 1^0)
for the list id, which would allow him to have a lot of lists and it'd stop
checking the instant it matched to one.
Then, provided that the prior condition matched, he grabs the messageid
and truncates it, and provided the From: address in the post matches an
address of the guy he's killfiling, he echoes the messageid token
to a cache file.
The second indented rule checks for a filtercache file, and then greps the
ENTIRE HEADER against the lines in the filtercache file. Ugh.
The last level of indentation adds the current messageid into the
filtercache if and only if it isn't already in there.
He's got another bug in the script (probably from debugging it): the
verbose=on is commented out, but the verbose=off at the bottom is still
live. So under certain conditions (whenever the list matches), logging
verbosity will get turned off after this recipe runs.
A better way (short of implementing a PUSH/POP mechanism) would be to
preserve the existing verbosity level, then restore it: save $VERBOSE into
a scratch variable before the recipe and assign it back to VERBOSE afterward.
> actions if the check succeeds. Maybe I'm naive, but I was expecting
> the recipe to be more or less the same length (3/4 lines). What am
> I missing?
I don't understand. You think the whole thing should be accomplished
in 3 or 4 lines?
I'd say you owe me some beer, but as of yet, nobody has made good on that.
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.