procmail

Re: Alleviating Duplicates

1999-05-07 17:34:11
At 18:38 1999-05-07 -0500, David W. Tamkin wrote:

[snip]

There has been a little talk here in the past about using checksum programs
on the bodies of incoming email and keeping caches of the checksums.  The
shortcoming there is that a trivial change to the body can affect the
checksum and that potentially two very different messages can generate the
same checksum.

There is another mechanism:  CRCs.  They take a little more horsepower to
generate (not as trivial as a simple checksum, but hardly a CPU hog
either).  Store the CRC of the body in one field, the length of the body
in another, and the CRC of the subject in yet another.  CRCs of other
select header elements could be used as well (such as From, and of course,
the Message-ID).
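
To make the idea concrete, here is a rough sketch of such a signature in Python, using the standard zlib.crc32.  The names crc_signature and is_duplicate are made up for illustration, the cache is just an in-memory set rather than anything persistent, and the message is assumed to use LF line endings as delivered on a typical Unix box.

```python
import zlib
from email import message_from_bytes


def crc_signature(raw):
    """Return (body CRC, body length, subject CRC) for one raw message.

    Hypothetical helper: assumes LF line endings, so the headers end at
    the first blank line.
    """
    head, _, body = raw.partition(b"\n\n")
    msg = message_from_bytes(head + b"\n\n")
    subject = str(msg.get("Subject", ""))
    return (
        zlib.crc32(body),
        len(body),
        zlib.crc32(subject.encode("utf-8", "replace")),
    )


def is_duplicate(raw, cache):
    """Report whether this signature was seen before, recording it if not."""
    sig = crc_signature(raw)
    if sig in cache:
        return True
    cache.add(sig)
    return False
```

Comparing all three fields together means two messages have to collide on the body CRC, the body length, and the subject CRC at once before they are treated as the same.  A real setup would keep the cache on disk between deliveries.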

There is no real escaping the fact that a minor change to the body can
affect a signature.  One _possible_ mechanism to reduce the chances that a
simple reformat of the text would cause a mismatch would be to have the
signature generator IGNORE whitespace (tabs, spaces, newlines), and
probably quotation markers as well (although attribution headers would pose
a unique challenge).  Forwarded copies without additional commentary
_could_ then be classified as duplicate messages, since there is nothing in
them to truly differentiate the "beef" of the message from another copy
you've already received.
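
A minimal sketch of that kind of whitespace-blind signature, again in Python.  It assumes quotation markers are leading ">" characters only, and it makes no attempt at the attribution-header problem mentioned above; normalized_crc is a hypothetical name.

```python
import re
import zlib

# Leading quote markers such as "> " or ">> " at the start of a line.
# Assumption: only ">" is treated as a quotation marker.
QUOTE_PREFIX = re.compile(rb"^[ \t]*(?:>[ \t]*)+", re.MULTILINE)


def normalized_crc(body):
    """CRC of the body with quote markers and all whitespace removed."""
    unquoted = QUOTE_PREFIX.sub(b"", body)
    squeezed = re.sub(rb"\s+", b"", unquoted)
    return zlib.crc32(squeezed)
```

With this, a rewrapped and ">"-quoted copy of a paragraph yields the same value as the original, while any change to the actual words still produces a different one.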

---
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

 Sean B. Straw / Professional Software Engineering
 Post Box 2395 / San Rafael, CA  94912-2395
