procmail

Re: Alleviating Duplicates

1999-05-09 14:07:53
At 06:35 PM 5/8/99 +0000, Bennett Todd wrote:
An idea just hit me. Since the "formail -D" trick works fine for many people,
this idea could also be implemented as a separate standalone program, with
similar behavior.

Suppose we pursued the "fingerprint" trick to an extreme. I'm thinking: toss
the entire header, then scan through the body, ignoring whitespace and filler
words. For each substantive word, generate the soundex code, and accumulate,
say, the first 16 soundex codes. Pad with a null value (say 0000) for short
messages that don't produce 16 soundex codes' worth of body.
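
A rough sketch of the fingerprint described above, in Python, might look
like the following. It is only an illustration under assumptions: the
STOPWORDS list, the function names, and the choice of standard American
Soundex are mine, not part of the proposal, and "word" is taken to mean a
run of ASCII letters.

```python
import re

# Hypothetical filler-word list; the proposal does not say which words to skip.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "this", "for", "on", "with", "as", "at", "by", "be"}

# Standard American Soundex letter groups.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a non-empty alphabetic word."""
    word = word.lower()
    out = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            out += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (out + "000")[:4]


def body_of(message):
    """Toss the header: keep everything after the first blank line."""
    _, _, body = message.partition("\n\n")
    return body


def fingerprint(body, count=16):
    """Concatenate the Soundex codes of the first `count` substantive words."""
    codes = []
    for word in re.findall(r"[A-Za-z]+", body):
        if word.lower() in STOPWORDS:
            continue
        codes.append(soundex(word))
        if len(codes) == count:
            break
    codes += ["0000"] * (count - len(codes))  # pad short messages
    return "".join(codes)
```

Two bodies that differ only in whitespace, letter case, or filler words
produce the same 64-character string, which could then be kept in a cache
of seen fingerprints in the same spirit as the Message-ID cache that
"formail -D" maintains.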

The start of the message is probably a bad place to start for anything
useful; it's not uncommon to see a canned intro to changing data (such as
a list digest).

Also, note my quoting of you above... if several did this, your filter
would toss all but the first as "duplicates."

Almost any partial body hash would have to do something about quoting,
I think.
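
As a minimal sketch of "doing something about quoting", one could drop
quoted lines before hashing. This assumes quoted text is marked with a
leading ">" and that attribution lines end in "wrote:", which is only a
common convention, not a guarantee. The function names and the use of
SHA-256 are my own choices for illustration.

```python
import hashlib
import re

QUOTE_LINE = re.compile(r"^\s*>")          # assumed ">" quoting convention
ATTRIBUTION = re.compile(r"\bwrote:\s*$")  # assumed "... wrote:" lead-in


def strip_quoted(body):
    """Return the body without quoted lines and attribution lines."""
    kept = []
    for line in body.splitlines():
        if QUOTE_LINE.match(line) or ATTRIBUTION.search(line):
            continue
        kept.append(line)
    return "\n".join(kept)


def body_hash(body):
    """Hash the unquoted text, ignoring case and runs of whitespace."""
    text = " ".join(strip_quoted(body).split()).lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

With this, two replies that quote the same paragraph but add different
text of their own no longer hash alike. Replies that quote without any
marker would still slip through.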

Cheers,
Stan
