procmail

Re: Alleviating Duplicates

1999-05-08 00:59:47
On 7 May 1999, SoloCDM <deedsmis(_at_)ris(_dot_)net> wrote:
"David W. Tamkin" wrote:

Daniel Munden wrote,

| Below is a standard process that I use to eliminate duplicates. It
| works, but there are other messages filtering through duplicating
| other prior messages. The duplicate messages seem as if they are
| another attempt by the same person to send a second copy of the
| first message, but it happens often and I need a proven process to
| weed out those additional messages.
|
| :0 Wh: msgid.lock
| | $FORMAIL -D 8192 $PMDIR/msgid.cache

That recipe maintains a cache of Message-Id: values and checks for
duplicates in that field.  If a broken mailer keeps reusing the same
ID, all mail from it (after the first message from it) looks like
duplicates, as we've discussed before.  Daniel has the converse
problem: true duplicate texts are getting sent to him with different
IDs, so formail -D sees that they are new mail -- which, as far as
that goes, they are -- even though their texts say the same old
thing.
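To make the two failure modes concrete, here is a rough Python model of a Message-Id cache. It is only an illustration: `MessageIdCache` and `max_entries` are hypothetical names, and the real `formail -D` keeps a circular cache file of approximately the given size (8192 in the recipe above) rather than an in-memory dictionary.

```python
from collections import OrderedDict


class MessageIdCache:
    """Hypothetical in-memory model of a bounded Message-Id cache.

    This only mimics the idea behind `formail -D`; it is not how
    formail stores its cache.
    """

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self._seen = OrderedDict()

    def is_duplicate(self, message_id):
        """Return True if message_id was seen before; otherwise record it."""
        if message_id in self._seen:
            return True
        self._seen[message_id] = None
        if len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)  # FIFO eviction: forget the oldest ID
        return False


cache = MessageIdCache()
# A broken mailer that reuses one ID: its second, different message is flagged.
print(cache.is_duplicate("<fixed@broken.example>"))  # False
print(cache.is_duplicate("<fixed@broken.example>"))  # True
# The same text resent under a fresh ID slips through as "new" mail.
print(cache.is_duplicate("<1@sender.example>"))      # False
print(cache.is_duplicate("<2@sender.example>"))      # False
```

Only the Message-Id is ever compared, so the cache cannot tell a true resend from new mail when the IDs differ.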
[...]
What about processing every line in the body? To us that may seem
slow, and it does require more CPU time and a data file to hold the
information, but CPU time is cheap. Of course, high-volume mail
processing servers wouldn't use it.
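One hedged way to read the suggestion is to keep a single digest per body instead of storing every line, which keeps the data file small. The Python sketch below is an assumption, not a procmail recipe: `body_digest` and `BodyCache` are hypothetical names, SHA-256 is an arbitrary choice, and the normalization (ignoring trailing whitespace and trailing blank lines) is only a guess at what a resend might change. The whole body still has to be read once per message.

```python
import hashlib


def body_digest(body: str) -> str:
    """Hypothetical helper: hash a message body after light normalization.

    Trailing whitespace on each line and trailing blank lines are ignored,
    so small cosmetic differences in a resend do not defeat the check.
    """
    lines = [line.rstrip() for line in body.splitlines()]
    while lines and not lines[-1]:
        lines.pop()
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


class BodyCache:
    """Remember digests of bodies already delivered (unbounded, in memory)."""

    def __init__(self):
        self._seen = set()

    def is_duplicate(self, body: str) -> bool:
        digest = body_digest(body)
        if digest in self._seen:
            return True
        self._seen.add(digest)
        return False


cache = BodyCache()
first = "Hello,\nsee you Friday.\n"
resend = "Hello,   \nsee you Friday.\n\n"  # same text, different whitespace
print(cache.is_duplicate(first))   # False
print(cache.is_duplicate(resend))  # True
```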

    Simply put, that would make delivery about 10000 times slower,
regardless of whether you're running a small server or a big one.  Would this be
acceptable for any practical purpose?

    If you decide it is, I'd suggest storing your messages in an SQL
database, headers separate from bodies, and delegating duplicate removal
to the SQL server.  Chances are it will do a much better job than any
hack in procmail.  Using an SQL database might actually prove useful for
other purposes too; the only tricky part is convincing your MUA to read
messages from the database instead of mailboxes.
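A minimal sketch of that idea, assuming SQLite through Python's standard `sqlite3` module as a stand-in (no particular database is named here). The schema and the `store` function are hypothetical: headers and bodies live in separate tables, and the bodies table is keyed on a digest of the text, so the database itself refuses a second copy of the same body.

```python
import hashlib
import sqlite3

# Hypothetical schema: bodies keyed by digest, headers stored per message.
SCHEMA = """
CREATE TABLE IF NOT EXISTS bodies (
    digest TEXT PRIMARY KEY,
    body   TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS messages (
    id          INTEGER PRIMARY KEY,
    message_id  TEXT,
    headers     TEXT NOT NULL,
    body_digest TEXT NOT NULL REFERENCES bodies(digest)
);
"""


def store(conn, message_id, headers, body):
    """Insert a message; return False if an identical body is already stored."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    try:
        conn.execute(
            "INSERT INTO bodies (digest, body) VALUES (?, ?)", (digest, body)
        )
    except sqlite3.IntegrityError:
        return False  # primary-key clash: the database rejected the duplicate
    conn.execute(
        "INSERT INTO messages (message_id, headers, body_digest) VALUES (?, ?, ?)",
        (message_id, headers, digest),
    )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(store(conn, "<1@example>", "Subject: hi", "same text\n"))  # True
print(store(conn, "<2@example>", "Subject: hi", "same text\n"))  # False
```

Putting the uniqueness rule in the schema is what "delegating duplicate removal" amounts to here; the MUA side of the problem is not addressed by this sketch.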

Signed,
Daniel D. Munden

P.'S. Detailed Documentation(s) and Sample(s) are more than welcome.
  ^^^^^

    Ok, people have posted short poems to this list in the (relatively
recent) past, so I'll indulge myself in commenting on my own pet peeve:
AFAIK, "P.S." stands for the Latin "post scriptum", which means "after
signature".  If you have another meaning in mind, you should document it
in detail. :-)

    Regards,

    Liviu Daia

-- 
Dr. Liviu Daia               e-mail:   Liviu(_dot_)Daia(_at_)imar(_dot_)ro
Institute of Mathematics     web page: http://www.imar.ro/~daia
of the Romanian Academy      PGP key:  http://www.imar.ro/~daia/daia.asc
