At 14:08 2006-12-30 -0800, Professional Software Engineering wrote:
> * 9876543210^0 ? grep -Z "$MATCH" ignore.cache
Bah, serves me right for looking at other recipes and trusting they're
doing what they claim. Get rid of that -Z: it makes grep emit an ASCII NUL
(zero byte) after each filename in its output; it isn't for using NUL as the
line terminator on the input file (and even for that it wouldn't be
necessary here, since we merely need grep to confirm that the string matched
something in the file, not what exactly). I realized this as soon as I
looked at the man page after posting, but wanted to post my correction as a
followup to my own post on the list, and it took forever to come through...
Looking back, this seems like an appropriate approach:
grep -qF "$REFS" ignore.cache
Now, the trick is getting the list of ids into a single buffer, separated
by newlines. If you're not averse to invoking a shell and sed, you can
manage this easily enough, though it's overhead that every incoming message
will incur.
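A quick way to check the property this relies on, outside procmail: with -F, grep treats each newline-separated line of its pattern argument as a separate fixed string and succeeds if any of them occurs. This is only a sketch; the helper name any_ref_in_file and the sample ids are hypothetical. The empty-pattern guard is there because an empty fixed string matches every line.

```shell
#!/bin/sh
# any_ref_in_file (hypothetical helper): succeed if any line of $1, taken as
# a fixed string, occurs anywhere in file $2. With -F, a pattern containing
# newlines is a list of alternatives, so one grep checks every id at once.
any_ref_in_file() {
    # An empty fixed string matches every line, so treat "no ids" as no match.
    [ -n "$1" ] || return 1
    # -- stops a pattern that begins with "-" from being read as an option.
    grep -qF -- "$1" "$2"
}

# Usage:
#   any_ref_in_file "$REFSNL" ignore.cache && echo "thread is ignored"
```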
As for matching multiple messageid elements (from References:, for
instance): if the cache file were newline delimited, you could pass it to
grep as a pattern file with -f, but then you couldn't use formail to manage
the cache. Also, in my experience, grep can get rather memory and processor
intensive when using -f (so much so that, ages ago, I wrote a tool to load
a wordlist into an AVL tree and do my own simple pattern matching against
that, since I didn't really need the regular expression capabilities of
grep beyond case insensitivity and the like).
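For the hypothetical newline-delimited cache, the -f variant might be sketched like this. It assumes a cache file with one message-id per line (which is not formail's format), uses that file as the pattern list, and feeds the ids on standard input. Adding -x requires whole-line matches, which also keeps a stray blank line in the cache from matching everything. The name refs_hit_cache_file is made up for illustration.

```shell
#!/bin/sh
# refs_hit_cache_file (hypothetical helper): assumes a cache with ONE
# message-id per line (not formail's -D format). The cache is the pattern
# list (-f), the ids from $1 are the input on stdin, -F disables regex
# interpretation, and -x demands whole-line matches, so a blank line in
# the cache can only match a blank input line instead of every line.
refs_hit_cache_file() {
    [ -n "$1" ] || return 1
    printf '%s\n' "$1" | grep -qxF -f "$2"
}

# Usage (ignore.list is a hypothetical newline-delimited cache):
#   refs_hit_cache_file "$REFSNL" ignore.list && echo "thread is ignored"
```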
:0
* ^In-Reply-To:[ ]*\/.*
{
# Assign the results to REFS
REFS=${MATCH}
}
:0
* ^References:[ ]*\/.*
{
# Append the results to REFS
# no consideration as to whether REFS was null or not.
REFS="${REFS} ${MATCH}"
}
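If you want to see what ends up in REFS without running procmail, a rough shell/awk analogue of the two recipes above is sketched below. It is not how procmail works internally: collect_refs is a hypothetical helper, it matches header names case-insensitively (as procmail does by default), and it also joins folded continuation lines, since long References: headers are commonly folded.

```shell
#!/bin/sh
# collect_refs (hypothetical helper): read a message on stdin and print the
# In-Reply-To: value, a space, then the References: value, roughly the way
# the two recipes build REFS. Header names match case-insensitively, and
# folded continuation lines are appended to the field they continue.
collect_refs() {
    awk '
        /^$/ { exit }                    # blank line: end of header, run END
        /^[ \t]/ {                       # continuation of the previous field
            if (cur != "") val[cur] = val[cur] " " $0
            next
        }
        {
            cur = ""
            h = tolower($0)
            if (h ~ /^in-reply-to:/) cur = "irt"
            else if (h ~ /^references:/) cur = "ref"
            if (cur != "") {
                sub(/^[^:]*:[ \t]*/, "")     # strip the field name
                val[cur] = $0
            }
        }
        END { print val["irt"] " " val["ref"] }
    '
}

# Usage (message.eml is a hypothetical file):
#   REFS=$(collect_refs < message.eml)
```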
Then pipe $REFS through tr and sed (the following is all one line; the
seemingly empty double-quoted string after -s holds a space and a tab):
REFSNL=`echo $REFS | tr -s " " "\n\n" | sed -e '/^\([^<].*\|.*[^>]\|\)$/ d'`
(I've always had trouble getting sed to accept embedded newlines when
invoked from procmail, and never bothered to sort out why, so I just avoid
it; otherwise, all this could be done without tr.)
The above replaces spaces and tabs with newlines, with tr's -s option also
squeezing each run of repeated newlines down to one. Then sed deletes any
line that doesn't start with an opening angle bracket, doesn't end with a
closing angle bracket, OR is empty. $REFSNL now has newlines separating
each of the ids, with no blank entries, and none of the bogus
"In-Reply-To: Your message of ...." entries, which, believe it or not, have
appeared on messages posted to this very list in the past month.
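Note that the \| alternation in that sed expression is a GNU sed extension to basic regular expressions, so other seds may reject it. A portable sketch of the same scrub, assuming POSIX tr and grep, swaps sed for grep -x '<.*>', which keeps exactly the non-empty lines that start with < and end with >, i.e. the lines the sed expression doesn't delete. The name scrub_refs is hypothetical.

```shell
#!/bin/sh
# scrub_refs (hypothetical helper): split $1 on spaces and tabs, one token
# per line, and keep only tokens that start with "<" and end with ">".
# tr -s squeezes runs of newlines; grep -x '<.*>' then drops empty lines
# and anything not bracketed, the same lines the GNU sed expression deletes.
scrub_refs() {
    printf '%s\n' "$1" | tr -s ' \t' '\n\n' | grep -x '<.*>'
}

# Usage:
#   REFSNL=$(scrub_refs "$REFS")
```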
This should address issues with multiple ids in the References: or In-Reply-To: headers.
So, the tweaked recipe looks like:
#============================================================================
# 20061230 SBS
# file away ignored threads, based on messageid references.
# Process:
# get ids from In-Reply-To and References, clean them up, then check to
# see if any of them are in the ignore cache or in the mua_ignore cache.
# If you set your MUA to invoke formail to cache ignored threads using the
# same ignore.cache file, then you can eliminate the additional grep
# invocation necessary to check the MUA specific one.
# if we have a match in the MUA id file or current cache, then ADD the
# messageid of THIS message to the cache so that replies to this message
# will also be ignored.
# ensure it's blank, not set to something you might have used it for
# previously
REFS=
:0
* ^In-Reply-To:[ ]*\/.*
{
# Assign the results to REFS
REFS=${MATCH}
}
:0
* ^References:[ ]*\/.*
{
# Append the results to REFS
# no consideration as to whether REFS was null or not.
REFS="${REFS} ${MATCH}"
}
# Scrub the references, placing each token on a separate line,
# eliminating blank lines and elements not matching the basic <bracketed>
# syntax of a messageid.
REFSNL=`echo $REFS | tr -s " " "\n\n" | sed -e '/^\([^<].*\|.*[^>]\|\)$/ d'`
:0 : ignore.cache.lock
* $REFSNL ?? .
* 9876543210^0 ? grep -qF "$REFSNL" ignore.cache
* 9876543210^0 ? grep -qF "$REFSNL" ignore.mua.file
{
# no lockfile needed here: the one on the recipe above already covers
# this (and covered the greps as well).
# 40KB is a lot of messages, but it's also a paltry size these days
# anyway. At roughly 50 bytes per typical messageid, that's about 800
# messages' worth.
:0Whc
| formail -D 40000 ignore.cache
# File this message away as irrelevant
# (I'm using mbx format)
:0:
irrelevant.threads
}
#============================================================================
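To exercise the recipe's decision logic outside procmail, here is a sketch of the non-empty guard plus the two OR-ed grep conditions. It assumes that formail's -D cache separates ids with NUL bytes rather than newlines (as far as I know, which is also why it isn't newline delimited) and converts them with tr first so grep sees plain text lines. is_ignored and the cache files in the test are hypothetical; a missing cache counts as a miss, much as a failing grep makes its condition fail.

```shell
#!/bin/sh
# is_ignored (hypothetical helper): succeed if any id in $1 (one per line)
# appears in any of the cache files given as the remaining arguments.
# Mirrors the recipe's "REFSNL ?? ." guard and the OR-ed grep conditions.
is_ignored() {
    refsnl=$1
    shift
    [ -n "$refsnl" ] || return 1        # empty pattern would match anything
    for cache in "$@"; do
        [ -r "$cache" ] || continue      # missing/unreadable cache: a miss
        # Assumed: formail -D separates ids with NUL bytes; turn them into
        # newlines so grep treats the cache as ordinary text lines.
        if tr '\0' '\n' < "$cache" | grep -qF -- "$refsnl"; then
            return 0
        fi
    done
    return 1
}

# Usage:
#   is_ignored "$REFSNL" ignore.cache ignore.mua.file && echo "ignored"
```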
Comments or improvements anyone?
Bugger, I've work to do, what am I doing here?
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.
____________________________________________________________
procmail mailing list Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail