procmail

Re: removing duplicates based upon an excerpt from the msg. body

2006-02-10 23:08:10
At 18:58 2006-02-10 -0800, Gary Funck wrote:

Comments? Suggested improvements?

Yeah: pipe the head of the message body to cksum, md5sum, or sum:

         cksum -

This will produce a much shorter signature for you.  You could 'stack'
cksum and sum signatures (they use different algorithms) so as to have a
longer unique string.  No need to translate characters in the original
message, though you might want to translate spaces in the cksum output:

         cksum - | tr ' ' '_'
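
For illustration, here is a minimal sketch of that pipeline applied to a
whole message on stdin.  The function name body_sig and the 10-line cutoff
are my own placeholders, not anything from the original recipe, and I call
cksum without an argument so that only the checksum and byte count are
printed:

```shell
#!/bin/sh
# Sketch only: body_sig and the 10-line cutoff are illustrative choices.
body_sig() {
    # Drop the header (everything through the first blank line),
    # keep the first 10 body lines, checksum them, and join the
    # "checksum bytecount" fields with an underscore.
    sed '1,/^$/d' | head -n 10 | cksum | tr ' ' '_'
}

# Two copies of one message with different headers but the same body
# should print the same signature.
printf 'Subject: a\n\nhello world\n' | body_sig
printf 'Subject: b\nX-Extra: 1\n\nhello world\n' | body_sig
```

Only the body feeds the checksum, so header differences between the two
copies do not change the result.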

After getting the cksum, you could append the email address of the author,
which would form a more typical-looking Message-ID while also clearly
indicating who posted the message.  Because each id is so much shorter, a
history of the same size will hold five to ten times as many entries with
typical addresses, and you can certainly match against a larger proportion
of the message (though with discussion lists, there's always the issue of
list-inserted footers, which will make otherwise identical copies look
unique).
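
A rough sketch of building such an id follows.  The helper names from_addr
and synth_id, the "." separator, the angle brackets, and the 10-line cutoff
are all assumptions made for the example; the address extraction is
deliberately naive and only handles a single-line From: header:

```shell
#!/bin/sh
# Sketch only: from_addr and synth_id are hypothetical helpers.
from_addr() {
    # Stop at the end of the header; on the first From: line, strip the
    # field name and reduce "Name <addr>" to the bare address.
    sed -n '
        /^$/q
        /^[Ff]rom:/{
            s/^[^:]*: *//
            s/.*<\([^>]*\)>.*/\1/
            p
            q
        }'
}

synth_id() {
    # $1 is a file holding one complete message.
    sig=$(sed '1,/^$/d' "$1" | head -n 10 | cksum | tr ' ' '_')
    addr=$(from_addr < "$1")
    printf '<%s.%s>\n' "$sig" "$addr"
}

# Demo on a throwaway message file.
tmp=$(mktemp)
printf 'From: Some One <someone@example.com>\nSubject: x\n\nhello world\n' > "$tmp"
synth_id "$tmp"
rm -f "$tmp"
```

The printed string has the shape <checksum_bytecount.address>, which is
short enough that far more of them fit in a fixed-size duplicate history.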

(btw, summing the sorted output of ls is sometimes useful for having a 
script tell if the contents of a directory changed (incl. deletions) since 
a prior run)
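
A minimal sketch of that directory trick, assuming the previous checksum is
kept in a state file outside the directory being watched.  The names dir_sig
and dir_changed are made up for the example, and because only file names
are summed, this notices additions, deletions, and renames but not edits to
existing files (summing ls -l output instead would pick up size and
timestamp changes too):

```shell
#!/bin/sh
# Sketch only: dir_sig and dir_changed are hypothetical helpers.
dir_sig() {
    # Fix the locale so the listing and sort order are stable between runs.
    LC_ALL=C ls "$1" | LC_ALL=C sort | cksum | tr ' ' '_'
}

dir_changed() {
    # $1 = directory, $2 = state file (must live outside $1).
    # Succeeds (exit 0) when the listing differs from the stored one.
    new=$(dir_sig "$1")
    old=$(cat "$2" 2>/dev/null)
    printf '%s\n' "$new" > "$2"
    [ "$new" != "$old" ]
}

# Demo on a scratch directory.
d=$(mktemp -d)
s=$(mktemp)
dir_changed "$d" "$s"            # first run: records the baseline
touch "$d/newfile"
dir_changed "$d" "$s" && echo "directory changed"
rm -rf "$d" "$s"
```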

---
  Sean B. Straw / Professional Software Engineering

  Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
  Please DO NOT carbon me on list replies.  I'll get my copy from the list.


____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail