procmail

Re: Wanted Duplicate message delete

1997-06-05 08:46:00

Hi, I have a script in my procmail to delete duplicate messages,
but I think that it uses the Message-ID (not the content).
Can anyone point me to a script that will delete duplicate messages
based on their contents?

I think this is highly un-doable.

To do so would require you to save the text of each message and then check
each incoming message against the previously stored messages.

Not only would this require a huge amount of disk space, it would also
require a huge amount of processing power to do a body comparison of each
message.....

I 'suppose' that you might be able to take the body and do an 'md5sum' on
it and then do that again for incoming messages, but that would definitely
be a lot of CPU power and I'm not sure how you'd go about keeping the log
at a sensible size (I suppose you could use the 'tail -1000 file >
file.new && mv file.new file' method).
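
For what it's worth, the checksum-plus-tail idea can be sketched as a small shell function. This is only an illustration: the name is_duplicate, its arguments, and the default cap of 1000 entries are made up here, and it assumes a GNU-style md5sum on your PATH. It does no locking, so as written it is not safe when several messages arrive at once.

```shell
#!/bin/sh
# Hypothetical helper (not part of procmail): is_duplicate LOGFILE [MAX]
# Reads a message body on stdin. Returns 0 if the body's checksum is
# already in LOGFILE; otherwise records the checksum and returns 1.
is_duplicate() {
    log=$1
    max=${2:-1000}                      # assumed cap on remembered checksums
    sum=$(md5sum | cut -d' ' -f1)       # keep only the hex digest
    if [ -f "$log" ] && grep -qxF "$sum" "$log"; then
        return 0
    fi
    # Append the new checksum and keep only the newest $max entries.
    { [ -f "$log" ] && cat "$log"; echo "$sum"; } | tail -n "$max" > "$log.new" &&
        mv "$log.new" "$log"
    return 1
}

# Small demonstration with a throwaway log file.
demo_log=$(mktemp)
printf 'hello\n' | is_duplicate "$demo_log" || echo "first copy: new"
printf 'hello\n' | is_duplicate "$demo_log" && echo "second copy: duplicate"
rm -f "$demo_log"
```

Since only one short line per message is stored, the log stays small no matter how large the message bodies are.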

So maybe it is doable.... I'm not sure exactly what the steps are, but at
least maybe there's an idea... oh wait.... I think someone (Alan?) from
the group sent me this (easy way to tell: if it works, Alan did it... if
it doesn't, I must have messed something up ;-)

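# make sure this directory exists
PROCDIR=${HOME}/.procmail
 
# change the path below to your path for md5sum
# (the b flag feeds only the body, so SUM holds the body's checksum)
:0b
SUM=|/usr/local/gnu/bin/md5sum
# global lockfile: only one procmail at a time may touch the checksums file
LOCKFILE=$PROCDIR/checksums.lock
 
# if the checksum is already on file, fgrep succeeds and the message
# counts as delivered, i.e. the duplicate is dropped
:0Wih
| fgrep -s "$SUM" $PROCDIR/checksums
 
# otherwise remember the new checksum, keeping only the most recent ones
JUNK=`(tail -1000 $PROCDIR/checksums; \
       echo "$SUM")>$PROCDIR/checksums.new;\
       mv $PROCDIR/checksums.new $PROCDIR/checksums`
# unsetting LOCKFILE releases the lock
LOCKFILE
 
# end of recipe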
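
If you want to convince yourself that a body-only checksum ignores the Message-ID, something like the following can be tried outside procmail. It is a rough sketch: body_sum is a made-up name, the sample messages are invented, a GNU-style md5sum is assumed, and stripping everything up to the first blank line only approximates what the b flag hands to the program.

```shell
#!/bin/sh
# Hypothetical helper: print the MD5 digest of a message body read on stdin.
# The body is taken to be everything after the first blank line.
body_sum() {
    sed '1,/^$/d' | md5sum | cut -d' ' -f1
}

# Two invented messages: different Message-ID headers, identical body.
msg1='Message-ID: <1@example.com>
Subject: test

same body text'
msg2='Message-ID: <2@example.com>
Subject: test

same body text'

sum1=$(printf '%s\n' "$msg1" | body_sum)
sum2=$(printf '%s\n' "$msg2" | body_sum)
[ "$sum1" = "$sum2" ] && echo "bodies match despite different Message-IDs"
```

Keep in mind that any difference in the body, even an extra footer or trailing blank line added along the way, produces a different checksum.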

Hope this gets you started...

TjL

-- 
TjL <luomat(_at_)peak(_dot_)org>   / http://www.peak.org/~luomat/next/ 
"The best things in life are made into inferior 
 versions and bundled with the latest Microsoft systems"
Bookmarks: http://www.peak.org/~luomat/next/bookmarks.html

