Burnt Norton <bnorton(_at_)mastaler(_dot_)com> wrote:
On Fri, Sep 05, 1997 at 12:57:02PM -0700, Alan K. Stebbens wrote:
This is the reason why I rewrote "dupcheck.rc": to provide for a
complete duplicate filtering solution, without worrying about mail
I don't think that an identical message body necessarily constitutes a
"duplicate" message. Semantically it does, sure, but not practically.
Then you filter that separately or you lose. It is that simple.
The vast majority of the time when I get duplicate messages they
are spam. I deal with duplicate messages by message ID first; then
everything not to some special addresses of mine, not from me, and
not from the Internet Oracle gets put through the following duplicate
tests: a repeat within a sliding window of 700 messages (about a week
when I wrote the filter, less now) of the same
word-count-in-body/From:/Subject:; and a repeat within a sliding
window of 10 messages of the same byte-count-in-body/Subject:. Then I
apply some even more dubious ("loose") checks on stuff for the group I
moderate. Those messages are checked against a sliding window of 50
messages for duplicates of sending-news-server/words-in-body, and
against a sliding window of 700 messages for duplicate
lines-in-body/From:/Subject:.
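
As a rough illustration of the first two sliding-window tests described
above, here is a minimal Python sketch. It is not the filter from this
message: the class and function names are hypothetical, counting words
by splitting on whitespace and counting bytes as UTF-8 are assumptions,
and the message-ID check and the looser newsgroup checks are left out.

```python
from collections import deque


class SlidingWindowDupCheck:
    """Remember the keys of the last `size` messages and flag repeats."""

    def __init__(self, size):
        self.size = size
        self.recent = deque()  # keys in arrival order, oldest first
        self.counts = {}       # key -> copies currently inside the window

    def seen_before(self, key):
        """Return True if `key` is already in the window, then record it."""
        duplicate = self.counts.get(key, 0) > 0
        self.recent.append(key)
        self.counts[key] = self.counts.get(key, 0) + 1
        if len(self.recent) > self.size:
            # Slide the window: forget the oldest message's key.
            oldest = self.recent.popleft()
            self.counts[oldest] -= 1
            if self.counts[oldest] == 0:
                del self.counts[oldest]
        return duplicate


def word_count_key(sender, subject, body):
    # word-count-in-body/From:/Subject: (whitespace splitting is an assumption)
    return (len(body.split()), sender, subject)


def byte_count_key(subject, body):
    # byte-count-in-body/Subject: (UTF-8 encoding is an assumption)
    return (len(body.encode("utf-8")), subject)


# Window sizes taken from the description: 700 and 10 messages.
wide_window = SlidingWindowDupCheck(700)
narrow_window = SlidingWindowDupCheck(10)


def is_probable_duplicate(sender, subject, body):
    """Run both tests; always record the message in both windows."""
    in_wide = wide_window.seen_before(word_count_key(sender, subject, body))
    in_narrow = narrow_window.seen_before(byte_count_key(subject, body))
    return in_wide or in_narrow
```

Both windows are updated for every message, even when the first test
already matched, so that each window always reflects the most recent
messages. A message flagged this way would be filed aside rather than
deleted, which keeps false positives easy to review.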
The only problems I have had with this in the last month (since I do
not delete any of the mail, just shunt it off to the side, it is easy
to check) have been a certain class of bounce message getting filed
as duplicates (each had the same body except for the email address
involved) and one case of someone making the same post twice, the
second time corrected not to have each paragraph as one long line.
whose normal remailer for posting to the list is momentarily broken