At 14:08 2006-12-31 +0100, M. Fioretti wrote:
> The result of the very first run is below. The "extraneous locallock
> file" warning is my fault, because I did not change the lock file name
> according to the cache names I changed, right?
No, see below.
> Apart from that, there is the fact that, when a message has both I-R-T
> and References headers it gives a duplicate entry. This should be
> solved, I think, piping the output of sed to uniq.
That's not strictly necessary - you add the overhead of an additional call
to uniq (which in many cases won't apply - I've certainly not seen a
plethora of In-Reply-To: *AND* References: in the same messages) in trade
for reducing the number of lookups when you hit the grep operation. If
there are duplicates, it won't cause a problem - if they're in the db,
it'll match on the first one and that's all you need it to do. OTOH,
there's no problem if you do add uniq to the REFSNL processor; it's just
extra CPU cycles, and yes, when the duplicate strings are NOT in the cache,
it'll speed up the grep somewhat. Your call. I optimized the id list in
favour of eliminating bogus matches (like an empty pattern, i.e. a bare
newline, which would match all lines).
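If you do go the dedupe route, keep in mind that a bare uniq only collapses
adjacent duplicate lines, so the list needs sorting first (or just use
sort -u). A minimal shell sketch, assuming REFS already holds the
space-separated ids; the message-ids here are made up for illustration:

```shell
# Hypothetical ids; the first one appears twice, as it would when a message
# carries the same id in both In-Reply-To: and References:.
REFS='<a@example.invalid> <b@example.invalid> <a@example.invalid>'

# One id per line, then sort and drop duplicates in a single step.
REFSNL=$(printf '%s\n' "$REFS" | tr -s ' ' '\n' | sort -u)

printf '%s\n' "$REFSNL"
```

The order of the ids changes, which doesn't matter to grep -F, since each
line is just an independent pattern.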
> procmail: Assigning "INCLUDERC=/home/marco/.procmail_irrelevant_threads"
I can't speak for others, but often a .procmail/ directory is where
individual rcfiles get placed - so you have ~/.procmailrc and ~/.procmail/
and all your included rcfiles are within the subdir, where they're not
hidden from directory views and not cluttering your home directory. It's a
lot easier to manage your procmail stuff if the bulk of it is segregated
from non-procmail files.
> procmail: Match on "."
FTR, I didn't explain the condition that this is associated with, but all
that does is say "if REFSNL isn't EMPTY", since messages without
references needn't be processed by the rule.
> procmail: Executing "grep,-qF,<45972349(_dot_)7040708(_at_)tacocat(_dot_)net>
> <45972349(_dot_)7040708(_at_)tacocat(_dot_)net>,/home/marco/.procmail_ignore.cache"
> grep: /home/marco/.procmail_ignore.cache: No such file or directory
> procmail: Non-zero exitcode (2) from "grep"
Note that this isn't a problem - it's still _no_match_, so there's no need
for a separate check on whether the cache file is there or not.
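To see why the missing file is harmless, here's a small shell sketch of what
the condition boils down to: any non-zero exit from grep, whether "no match"
or "couldn't open the file", fails the condition. The path and id below are
placeholders, not anything from the real setup:

```shell
# Hypothetical path, chosen only because it should not exist.
missing_cache="/tmp/no-such-dir-$$/ignore.cache"

# Capture the exit status without aborting if the shell runs with -e.
status=0
grep -qF '<someid@example.invalid>' "$missing_cache" 2>/dev/null || status=$?

# Non-zero either way, so a "? grep ..." condition would not match.
echo "grep exit status: $status"
```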
> procmail: Extraneous locallockfile ignored
That's my fault, and I should know better. Despite the fact that I want
the lockfile to ensure that my READ access to the files (during grep)
isn't affected by an update (from mutt, or another concurrent procmail
invocation), the lockfile won't do that, because a local lockfile only
takes effect when the recipe has a delivery-type action - there isn't one
at this level, only a braced action. Some reorg of the recipe (which
results in a streamlined action anyway) solves this (retaining the bracing
would require use of the LOCKFILE pseudo-variable, which is unwieldy).
I've got a rewrite at the bottom of this message.
Although I expect you might be coming around to the idea of just using
formail to manage the id cache from your MUA as well, in case you don't,
note that there's a further rcfile simplification: condense the two grep
operations to one line and eliminate the scoring:
* ? grep -qF "$REFSNL" ignore*.cache
grep will only end up searching those cache files which exist
(ignore.cache, ignore.mua.cache - the latter of which is renamed here to
make it easier to match with a single focused wildcard name). So, besides
simplifying the rcfile makeup, in the event that the ignore.cache doesn't
exist (which, well, should really only happen when you haven't yet tagged
anything), you don't end up with TWO invocations of grep - it only runs
once either way.
Note that providing two separate filenames on the commandline will cause
grep to bail if one of them doesn't actually exist. The wildcard gets
around that, because grep is only seeing the filenames which do exist.
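The wildcard behaviour is easy to try outside of procmail. This is only a
sketch under a few assumptions: a plain POSIX-style shell does the
expansion, the temp directories and ids are invented, and the cache is faked
with a single NUL-terminated id rather than written by formail:

```shell
# Scratch directory standing in for the maildir; only the MUA cache exists.
workdir=$(mktemp -d)
printf '<tagged@example.invalid>\0' > "$workdir/ignore.mua.cache"

# Two ids, one per line, as REFSNL would hold them.
REFSNL='<other@example.invalid>
<tagged@example.invalid>'

# The shell expands the wildcard to the one file that exists.
glob_status=0
grep -qF "$REFSNL" "$workdir"/ignore*.cache || glob_status=$?

# With no cache files at all, the pattern is passed through literally and
# grep fails on it - which is still just "no match" to procmail.
emptydir=$(mktemp -d)
none_status=0
grep -qF "$REFSNL" "$emptydir"/ignore*.cache 2>/dev/null || none_status=$?

echo "one cache present: $glob_status, no caches: $none_status"
rm -rf "$workdir" "$emptydir"
```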
So, you have the log output from one run, apparently for a message that
wasn't matching against something in your MUA hitlist. Seems like you'd
want to see it in action.
Here's a further revision:
# simple recipe to ignore threads based on prior cache of threads to ignore.
# 20061230, SBS
# get In-Reply-To messageid, check to see if it is in the ignore cache or
# in the mua_ignore cache. formail stores cache with ascii-z terminations,
# but grep will still match the binary file.
# if we have a match in the MUA id file or current cache, ADD the messageid
# of THIS message to the cache, so that replies to it will also be ignored.
# ensure these are blank, not set to something you might have used them for
# previously
REFS=
REFSNL=
:0
* In-Reply-To:.*\/[^ ].*
{
# Assign the results to REFS
REFS=${MATCH}
}
:0
* ^References:.*\/[^ ].*
{
# Append the results to REFS
# no consideration as to whether REFS was null or not.
REFS="${REFS} ${MATCH}"
}
# by doing this ONLY if REFS contains non-whitespace, we spare
# ourselves the overhead of the pipe chain invocation when it isn't
# needed (i.e. messages with no references). Arguably, REFS shouldn't
# be set at all if the headers are empty, but this check is cheap to perform
:0
* REFS ?? [^ ]
{
REFSNL=`echo "$REFS" | tr -s " " "\n\n" | \
sed -e '/^\([^<].*\|.*[^>]\|\)$/ d'`
}
:0hc:ignore.cache$LOCKEXT
* REFSNL ?? .
* ? grep -qF "$REFSNL" ignore*.cache
| formail -D 40000 ignore.cache
# if the preceding conditions matched, then file this message
# away as irrelevant.
:0A:
irrelevant.threads
That condition for invocation of REFSNL shows as follows in the verbose log:
procmail: No match on "In-Reply-To:.*\/[^ ].*"
procmail: No match on "^References:.*\/[^ ].*"
procmail: No match on "[^ ]"
procmail: No match on "."
Basically, no references, no real work done. Since there's a fair
number of originating (i.e. not followup) messages, this is a good thing.
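For contrast with the no-references log above, this is roughly what the
REFSNL step yields when there are references. A sketch only: the ids are
invented, and the single sed expression from the recipe is split into three
plain deletes (line starts with something other than <, line ends with
something other than >, line is empty), which should select the same lines
without relying on the \| alternation:

```shell
# Hypothetical header contents: three ids plus some stray comment text.
REFS='<irt@example.invalid> (reply text) <ref1@example.invalid>  <ref2@example.invalid>'

# Split on runs of spaces, then drop anything that isn't a whole <...> token.
REFSNL=$(printf '%s\n' "$REFS" | tr -s ' ' '\n' | \
    sed -e '/^[^<]/d' -e '/[^>]$/d' -e '/^$/d')

printf '%s\n' "$REFSNL"
```

Only the three bracketed ids survive, one per line, which is the form
grep -F wants for a multi-pattern lookup.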
A few things to ponder:
1. If prior to this recipe, you had list identification rules, which set a
variable for the listname but didn't actually deliver, you could employ
that in determining the filename for the irrelevant thread file - i.e.
having list-specific files.
2. Since cc'd messages will bear the same headers as the listbound copy, if
you're cc'd on a thread which you're ignoring, you'll be ditching it
here. You may want to add some logic prior to this ruleset which takes
direct cleartext addressed correspondence and delivers it accordingly.
3. If at a later date, you reprocess your mailbox or irrelevant.threads
files, actions will be different, as the cache will be in a different state
(the same holds true if you were using a cron cycled killfile). If the
same cache is in effect, then things will still be filtered - but as ids
disappear from the cache, so too will their effect on other replies. Just
something to keep in mind.
4. Based on the messageid cache mechanism for ignoring, there's no reason
that someone using a non-shell mailer can't set up a forward rule and
forward key messages to ignore to themselves with a special key to trigger
a recipe to grab the messageid from the header and add it to the
cache. Much like so:
# grab messageid from body (i.e. forwarded with headers context)
:0b:ignore.cache$LOCKEXT
* From: expression-matching-thyself
* Subject: ignore THIS keyword
| formail -D 40000 ignore.cache
This should present the basic mechanism for accomplishing the task, though
you should add whatever you see fit to ensure this isn't something that can
be arbitrarily manipulated by some passerby on the net.
Now, after all that, let me wish you a new year with fewer trolls and
nonsensical threads needing to be filtered out <g>
So, where's my beer? I accept drop shipments from the UK, home of porters
and cream stouts. <g>
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.