procmail

Re: Alleviating Duplicates

1999-05-07 16:57:30
Daniel Munden wrote,

| Below is a standard process that I use to eliminate duplicates. It
| works, but there are other messages filtering through duplicating
| other prior messages. The duplicate messages seem as if they are
| another attempt by the same person to send a second copy of the first
| message, but it happens often and I need a proven process to weed out
| those additional messages.
| 
| :0 Wh: msgid.lock
| | $FORMAIL -D 8192 $PMDIR/msgid.cache

That recipe maintains a cache of Message-Id: values and checks each incoming
message's ID against it.  If a broken mailer keeps reusing the same ID, every
message from it after the first looks like a duplicate, as we've discussed
before.  Daniel has the converse problem: true duplicate texts are reaching
him with different IDs, so formail -D sees them as new mail (which, as far as
that goes, they are) even though their texts say the same old thing.
Sometimes a person thinks a message wasn't dispatched and sends it again, and
sometimes a mail transport burps.  Or you're getting hit with copies of the
same spam from several different addresses, or to several seemingly distinct
addresses that all lead to your mailbox.

There has been a little talk here in the past about using checksum programs
on the bodies of incoming email and keeping caches of the checksums.  The
shortcomings there are that a trivial change to the body changes the
checksum, so a near-duplicate slips through as new, and that two very
different messages can, in principle, generate the same checksum.
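
For anyone who wants to experiment anyway, here is a minimal sketch of the
body-checksum idea in Python.  It is only an illustration, not something
from the earlier discussions: the function names, the SHA-256 choice, the
whitespace normalization, and the in-memory list used as the cache are all
my own assumptions, and how you would call it from a recipe is left open.

```python
import hashlib
from email import message_from_string


def body_digest(raw_message: str) -> str:
    """Return a SHA-256 hex digest of the body, ignoring all headers."""
    msg = message_from_string(raw_message)
    body = msg.get_payload()
    if not isinstance(body, str):
        # Multipart message: fall back to everything after the header block.
        body = raw_message.split("\n\n", 1)[-1]
    # Collapse runs of whitespace so a re-wrapped copy still matches.
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_duplicate(raw_message: str, cache: list, max_entries: int = 1000) -> bool:
    """Check the body digest against the cache, recording it if it is new."""
    digest = body_digest(raw_message)
    if digest in cache:
        return True
    cache.append(digest)
    if len(cache) > max_entries:
        del cache[0]  # hypothetical size limit: forget the oldest digest
    return False
```

A resend with a new Message-Id: and different line wrapping would be caught
by this, but a copy with even one changed character in the body would not,
which is exactly the first shortcoming described above.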

The only reliable way is to accept them all, glance at them, and use your own
"I've already seen this" recognition to decide what is a duplicate.  That
doesn't save any labor, I know.  Sorry.
