procmail

Re: duplicates and delivery problems

1997-09-05 10:14:25
W. Wesley Groleau x4923 writes:
One thing I wish I knew how to do is detect messages where the _bodies_
have duplicate content, but which came through list servers that changed
the message ID and perhaps tacked on a trailer.

An ugly, maybe expensive idea: hash the body (after doing something to
delete leading & trailing "junk", such as added MIME wrappers or list
trailers), using SHA or MD5.  Save each hash with an associated date
(possibly in a db scheme, for extra credit).  Check each new hash against
the list of previously seen hashes: if it matches, count the message as a
duplicate; otherwise add the hash to the list.  Periodically flush the
hash list.
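
A rough sketch of the steps above in Python, just to make them concrete.
Everything named here is made up for illustration: body_fingerprint and
SeenBodies are hypothetical names, the trailer-stripping rule is only a
guess at what a list server might append, SHA-1 stands in for "SHA or
MD5", and an in-memory dict stands in for the db.

```python
import hashlib
import time


def body_fingerprint(raw_message: bytes) -> str:
    """Return a SHA-1 hex digest of the normalized message body."""
    text = raw_message.replace(b"\r\n", b"\n")
    # Headers end at the first blank line; the rest is the body.
    _, _, body = text.partition(b"\n\n")
    kept = []
    for line in body.split(b"\n"):
        line = line.rstrip()
        # Assumed heuristic: treat a "-- " signature delimiter or a rule
        # of underscores as the start of a trailer and ignore the rest.
        if line == b"--" or line.startswith(b"_____"):
            break
        kept.append(line)
    normalized = b"\n".join(kept).strip()
    return hashlib.sha1(normalized).hexdigest()


class SeenBodies:
    """Remember body hashes together with the time each was first seen."""

    def __init__(self):
        self.seen = {}  # digest -> timestamp of first sighting

    def is_duplicate(self, raw_message: bytes, now=None) -> bool:
        now = time.time() if now is None else now
        digest = body_fingerprint(raw_message)
        if digest in self.seen:
            return True
        self.seen[digest] = now
        return False

    def flush(self, max_age: float, now=None) -> None:
        """Forget hashes first seen more than max_age seconds ago."""
        now = time.time() if now is None else now
        self.seen = {d: t for d, t in self.seen.items()
                     if now - t <= max_age}
```

Storing the date with each hash is what lets the periodic flush drop only
the old entries instead of the whole list.  Anything that rewrites the
body itself (re-encoding, re-wrapping lines) would still defeat a plain
hash like this.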

Maybe formail could be modified to do this and to manage the hash list
the way it already manages Message-ID duplicates (formail -D).