procmail

Re: Wanted Duplicate message delete

1997-06-06 01:39:00
On Thu, 5 Jun 1997 07:49:21 -0700 (PDT),
Timothy Luoma <luomat(_at_)peak(_dot_)org> wrote:
Hi, I have a script in my procmail to delete duplicate messages,
but I think that this uses the Message-ID (not the content).
Can anyone point me to a script that will delete duplicate messages
based on their contents?
I think this is highly un-doable.
To do so would require you to save the text of each message and then check
each incoming message against the previously stored messages.

No, it just depends a lot on what you mean by "duplicate". The real
problem is that duplicate messages are practically never +exact+
duplicates so you have to devise a way to canonicalize them to some
extent before you compare, and decide how to deal with accidental
matches. If you can do that, something like an MD5 digest of each
received message should be all you need to store.
  (A good approximative approach might be to keep only the very
essential headers and remove whitespace, quoting, and MIME stuff. If
you have a Resent-Date or Old-Date or something, try that as well as
the real Date header. If you get a duplicate and the date is older
than a few hours, assume it's real. This is off the top of my head.
Nothing is easy.)
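
To make the canonicalize-then-digest idea concrete, here is a rough
sketch in Python. It is only an illustration under assumptions: the
choice of "essential" headers (From and Subject), the rule that any line
starting with ">" is quoting, and the function name canonical_digest are
all made up for the example, and the Date heuristics are left out.

```python
import hashlib
import re
from email import message_from_string

# Assumption: these two headers are the "very essential" ones.
ESSENTIAL_HEADERS = ("From", "Subject")


def canonical_digest(raw_message: str) -> str:
    """Return an MD5 hex digest of a crudely canonicalized message."""
    msg = message_from_string(raw_message)
    pieces = []

    # Keep only the essential headers, with all whitespace removed.
    for name in ESSENTIAL_HEADERS:
        value = msg.get(name, "")
        pieces.append(name.lower() + ":" + re.sub(r"\s+", "", value))

    # Walk the MIME structure and keep only decoded text parts.
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "latin-1"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name
            text = payload.decode("latin-1")
        # Drop quoted lines, then remove all remaining whitespace.
        kept = [ln for ln in text.splitlines()
                if not ln.lstrip().startswith(">")]
        pieces.append(re.sub(r"\s+", "", "".join(kept)))

    canonical = "\n".join(pieces)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()
```

Two copies that differ only in Received lines, Message-ID, spacing, or
quoted material come out with the same digest, so the digest is all
that has to be stored. Accidental matches (short messages from the same
sender with the same subject) remain possible and still need a policy.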

least maybe there's an idea... oh wait.... I think someone (Alan?) from
the group sent me this (easy way to tell: if it works, Alan did it... if
it doesn't, I must have messed something up ;-)

This is a fairly good start, but you might want to tweak it. (If
anybody has been using this in real life, reports would be much
appreciated.) 

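# make sure this directory exists (no trailing slash; the paths below add one)
PROCDIR=${HOME}/.procmail

# digest of the message body only; change the path below to your md5sum
:0b
SUM=|/usr/local/gnu/bin/md5sum

# hold a lock while the checksum file is read and rewritten
LOCKFILE=$PROCDIR/checksums.lock

# digest already on file: fgrep succeeds, which counts as delivery, so
# the duplicate is dropped here. -s is meant to silence fgrep; with a
# POSIX or GNU grep, -q is the flag that suppresses output.
:0Wih
| fgrep -s "$SUM" $PROCDIR/checksums

# otherwise record the digest, keeping the last 1000 older entries
JUNK=`(tail -1000 $PROCDIR/checksums; \
       echo "$SUM")>$PROCDIR/checksums.new;\
       mv $PROCDIR/checksums.new $PROCDIR/checksums`
# release the lock
LOCKFILE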
# end of recipe
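
For anyone who would rather test the check-and-record step outside
procmail, the same logic can be sketched in Python. This is a loose
equivalent, not a drop-in replacement: the function name seen_before
and the keep parameter are invented for the example, it compares whole
digests rather than doing a substring match like fgrep, and it does no
locking, so something like the LOCKFILE above is still needed if
several deliveries can run at once.

```python
import os


def seen_before(digest: str, path: str, keep: int = 1000) -> bool:
    """Return True if digest is already on file, else record it."""
    try:
        with open(path) as fh:
            known = fh.read().split()
    except FileNotFoundError:
        known = []  # first run: no checksum file yet

    if digest in known:
        return True

    # Keep the most recent `keep` old entries plus the new one, and
    # swap the file into place the same way the recipe's mv does.
    known = known[-keep:] + [digest]
    tmp = path + ".new"
    with open(tmp, "w") as fh:
        fh.write("\n".join(known) + "\n")
    os.replace(tmp, path)
    return False
```

A wrapper script could exit 0 when this returns True and non-zero
otherwise, which is the convention the ":0Wih" recipe relies on.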

/* era */

-- 
Defin-i-t-e-ly. Sep-a-r-a-te. Gram-m-a-r.  <http://www.iki.fi/~era/>
 * Enjoy receiving spam? Register at <http://www.iki.fi/~era/spam.html>
