Re: Simple recipe to move uninteresting threads in separate mailbox

2006-12-30 19:24:22
At 14:08 2006-12-30 -0800, Professional Software Engineering wrote:
> * 9876543210^0 ? grep -Z "$MATCH" ignore.cache

Bah, serves me right for looking at other recipes and trusting that they do
what they claim.  Get rid of that -Z -- it makes grep emit an ASCII NUL
after each filename in its output; it does not make grep treat NUL as the
line terminator of the input file.  (A terminator option isn't needed here
anyway, since we merely need grep to confirm that the string matched
something in the file, not what exactly.)  I realized this as soon as I
looked at the man page after posting, but I wanted to post the correction
as a followup to my own post on the list, and it took forever to come
through...


Looking back this seems like an appropriate approach:
         grep -qF "$REFS" ignore.cache
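
As a small sketch of why -F matters here (the temp file and the ids below
are made up for illustration): message ids are full of characters such as
"." that a regular expression treats specially, so a plain grep can report
a match for an id that is not actually in the cache.

```shell
# Hypothetical cache holding one id; a temp file stands in for ignore.cache.
cache=$(mktemp)
printf '%s\n' '<aXb@example.org>' > "$cache"

REFS='<a.b@example.org>'

# As a regex, "." matches any character, so this wrongly succeeds.
if grep -q "$REFS" "$cache"; then regex_hit=yes; else regex_hit=no; fi

# With -F the id is a literal string; -q suppresses output, and the
# exit status alone reports whether anything matched.
if grep -qF "$REFS" "$cache"; then fixed_hit=yes; else fixed_hit=no; fi

echo "regex: $regex_hit  fixed: $fixed_hit"
rm -f "$cache"
```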

Now, the trick is getting the list of ids into a single buffer, separated
by newlines.  If you're not averse to a shell and sed, you can manage this
easily enough, though it's overhead that every one of your messages will
incur.

As for matching multiple messageid elements (from References:, for
instance): if the cache file were newline delimited, you could hand it to
grep as a pattern file with the -f option, but then you couldn't use
formail to manage the cache.  Also, in my experience, grep can get rather
memory and processor intensive when using -f (so much so that, ages ago, I
wrote a tool to load a wordlist into an AVL tree and do my own simple
pattern matching against that, since I didn't really need the regular
expression capabilities of grep beyond case insensitivity and the like).
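
For illustration only, here is roughly what the -f approach could look like
if the cache were kept one id per line.  The cache here is a hypothetical
temp file, not the formail-managed ignore.cache.  The cache becomes the
pattern list and the ids from the message become the input.

```shell
# Hypothetical newline-delimited cache, one message id per line.
cache=$(mktemp)
printf '%s\n' '<thread1@example.org>' '<thread2@example.org>' > "$cache"

# Ids pulled from the incoming message, one per line.
REFSNL='<unrelated@example.net>
<thread2@example.org>'

# -f reads the patterns from the cache, -F keeps them literal, and -x
# requires a whole-line match so a partial id cannot match by accident.
# Without -x, an empty line in the pattern file would match every input line.
if printf '%s\n' "$REFSNL" | grep -qxF -f "$cache"; then
    ignored=yes
else
    ignored=no
fi
echo "ignored: $ignored"
rm -f "$cache"
```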


:0
* ^In-Reply-To:[         ]*\/.*
{
         # Assign the results to REFS
         REFS=${MATCH}
}

:0
* ^References:[         ]*\/.*
{
         # Append the results to REFS
         # no consideration as to whether REFS was null or not.
         REFS="${REFS} ${MATCH}"
}


Then pipe that through tr and sed (the following is all one line; the
seemingly empty double-quoted string holds a space and a tab):

REFSNL=`echo $REFS | tr -s "    " "\n\n" | sed -e '/^\([^<].*\|.*[^>]\|\)$/ d'`

(I've always had trouble getting sed to run with embedded newlines from
procmail and never bothered to sort out why, so I just avoid trying;
otherwise, all this could be done without tr.)

What the above does is replace spaces and tabs with newlines, with tr -s
also squeezing runs of the same output character into one.  Then sed
deletes any line that does not start with an open angle bracket, does not
end with a close angle bracket, or is empty.  $REFSNL now holds the ids one
per line, with no blank entries and none of the bogus "In-Reply-To: Your
message of ...." text, which, believe it or not, has appeared on messages
posted to this very list in the past month.
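
To see the scrub step in isolation, here is a sketch that can be run from a
shell outside procmail.  The function name scrub_refs and the sample header
text are invented for the example, and the sed expression is a rewording of
the one above (keep only lines of the form <...>) that avoids the \|
alternation, which is an extension that not every sed accepts.

```shell
# scrub_refs is a hypothetical helper: print each <bracketed> token of
# its argument on its own line and drop everything else.
# tr turns spaces and tabs into newlines and squeezes runs of them;
# sed then deletes every line that is not of the form <...>.
scrub_refs() {
    printf '%s\n' "$1" |
        tr -s ' \t' '\n\n' |
        sed -e '/^<.*>$/!d'
}

REFS='Your message of Sat, 30 Dec 2006 <one@example.org>   <two@example.org>'
REFSNL=$(scrub_refs "$REFS")
printf '%s\n' "$REFSNL"
```

With the sample text, only the two bracketed ids should survive, one per
line.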

This should address issues with multiple ids in the References: or
In-Reply-To: headers.


So, the tweaked recipe looks like:

#============================================================================
# 20061230 SBS

# file away ignored threads, based on messageid references.

# Process:
# get ids from In-Reply-To and References, clean them up, then check to
# see if any of them are in the ignore cache or in the mua_ignore cache.
# If you set your MUA to invoke formail to cache ignored threads using the
# same ignore.cache file, then you can eliminate the additional grep
# invocation necessary to check the MUA specific one.
# if we have a match in the MUA id file or current cache, then ADD the
# messageid of THIS message to the cache so that replies to this message
# will also be ignored.

# ensure it's blank, not set to something you might have used it for
# previously
REFS=

:0
* ^In-Reply-To:[         ]*\/.*
{
         # Assign the results to REFS
         REFS=${MATCH}
}

:0
* ^References:[         ]*\/.*
{
         # Append the results to REFS
         # no consideration as to whether REFS was null or not.
         REFS="${REFS} ${MATCH}"
}

# Scrub the references, placing each token on a separate line,
# eliminating blank lines and elements not matching the basic <bracketed>
# syntax of a messageid.
REFSNL=`echo $REFS | tr -s "    " "\n\n" | sed -e '/^\([^<].*\|.*[^>]\|\)$/ d'`

:0 : ignore.cache.lock
* REFSNL ?? .
* 9876543210^0 ? grep -qF "$REFSNL" ignore.cache
* 9876543210^0 ? grep -qF "$REFSNL" ignore.mua.file
{
         # lockfile above already (which locked for the greps as well)
         # 40KB is a lot of messages, but it's also a paltry size these days
         # anyway.  This might be about 800 messages worth of typical sized
         # messageids.
         :0Whc
         | formail -D 40000 ignore.cache

         # File this message away as irrelevant
         # (I'm using mbx format)
         :0:
         irrelevant.threads
}

#============================================================================
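
The decision the recipe makes can also be tried outside procmail.  This is
only a sketch: the helper name, the ids and the temp directory are invented,
and it assumes the formail cache separates ids with NUL bytes, as the -Z
discussion earlier in this thread suggests.  Because grep treats each line
of a multi-line pattern as a separate pattern, a hit on any one id is
enough, and the || stands in for the two very large scores, either of which
is sufficient on its own.

```shell
# Stand-ins for ignore.cache and ignore.mua.file, created in a temp dir.
dir=$(mktemp -d)
# Assumption: ids in the formail cache are separated by NUL bytes.
printf '%s\0' '<old1@example.org>' '<old2@example.org>' > "$dir/ignore.cache"
printf '%s\n' '<mua1@example.org>' > "$dir/ignore.mua.file"

# thread_is_ignored is a hypothetical helper mirroring the two scored
# conditions: succeed if any id in $1 appears in either file.
thread_is_ignored() {
    grep -qF "$1" "$dir/ignore.cache" 2>/dev/null ||
        grep -qF "$1" "$dir/ignore.mua.file" 2>/dev/null
}

# One id per line, as produced by the scrub step.
REFSNL='<new@example.org>
<old2@example.org>'
if thread_is_ignored "$REFSNL"; then verdict=ignored; else verdict=kept; fi

if thread_is_ignored '<stranger@example.org>'; then other=ignored; else other=kept; fi
echo "$verdict $other"
rm -rf "$dir"
```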

Comments or improvements anyone?

Bugger, I've work to do, what am I doing here?

---
  Sean B. Straw / Professional Software Engineering

  Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
  Please DO NOT carbon me on list replies.  I'll get my copy from the list.

