procmail

Re: Duplicates?

2004-05-18 10:49:40
mrpierce wrote:

<> Q1. Trying to avoid email duplication and I've seen several recipes showing
<> what to do, but I'm uncertain as to which is correct and could use some
<> advice (my version of procmail is 3.22-9):
<> 
<> I'm using this:
<> #DUPLICATE MAILS
<> :0 Wh msgid.lock
<> | formail -D 8192 msgid.cache
<> :0 a
<> {
<>   LOG="ACCEPTED MAIL (duplicates rule) - "
<>   :0
<>     duplicates/
<> }
<> 
<> but my log says:
<>      procmail: Skipped "msgid.lock"

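Because your first recipe never tells procmail to use the lock.  Without a
second colon, the word after the flags is not taken as a lockfile name,
so procmail skips it.  It should read: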

  :0 Wh: msgid.lock
  | formail -D 8192 msgid.cache

(Note the second colon)
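
Annotated, assuming the flag meanings given in the procmailrc man page
(the comments are explanatory only and are not needed in the rcfile):

  # W  wait for formail to finish and act on its exit code, without
  #    logging a "Program failure" message when that code is nonzero
  # h  feed only the message header to the pipe
  # :  the second colon asks for a local lockfile; the word after it
  #    is the lockfile name
  :0 Wh: msgid.lock
  | formail -D 8192 msgid.cache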

<> I've seen these:
<> # Remove duplicate mails (older formail)
<> :0 Wh: msgid.lock 
<> | (/bin/sed -e 's/^Message-ID:  /Message-Id: /') | 
<> /usr/local/bin/formail \
<> -D 8192 $PMDIR/msgid.cache
<> # traps duplicate message ids after stabilizing
<> # case and leading white spaces. :(

Formail is part of procmail; your version is +well+ new enough to not
need the extra sed process to manipulate the Message-Id line.

(I've been using procmail for a decade and I don't remember ever having
to work around Message-Id ... how old is that recipe ...?)

<> # Remove duplicate mails (newer formail)
<> :0 Wh: duplicates.lock
<> * ?formail -D 65536 msgid.cache
<> duplicates
<> # traps duplicate message ids in a 64kb cache regardless 
<> # of case or leading white spaces. :-)

This combines the two recipes into one and uses a larger cache.  I
think this one is "riskier" as it doesn't protect the cache file from
multiple writers, just the mailbox.

The point of the lock is to prevent two formail processes from writing
a message id entry to the cache simultaneously, thus intermingling the
characters.  In order to do that, the calls to formail must be serialized
with the lockfile, not just the writes to the mbox file.
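
One way to keep the condition form and still serialize formail would be
a global lockfile instead of a local one.  This is only a sketch: it
assumes the LOCKFILE variable behaves as the procmailrc man page
describes (created on assignment, removed when the variable is emptied
or when procmail exits), and it reuses the file names from the quoted
recipe.  The same man page discourages global lockfiles where a local
one will do, so the two-recipe form is probably still the simpler choice.

  # Take a global lock so only one procmail at a time reaches formail
  LOCKFILE=msgid.lock

  # formail -D exits 0 when the Message-ID is already in the cache,
  # which makes the condition match and files the message as a duplicate
  :0
  * ? formail -D 65536 msgid.cache
  duplicates

  # Not a duplicate: release the lock before the rest of the rcfile runs
  LOCKFILE=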

<> # --- handle duplicate email ---
<> # 
<> MESG_ID=mesg_id.cache
<> :0 Whc: $MESG_ID.lock
<> | formail -D 196608 $MESG_ID
<>      :0 a:
<>      IN.duplicates

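Same as yours except that it uses a variable to hold the lockfile name, a
huge cache for Message-Ids, and the c flag.  The c flag matters: without
it, a duplicate counts as delivered to formail and procmail stops there,
so the ":0 a" recipe never gets to file it.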
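
Applied to the recipe you are using now, that might look like the sketch
below.  It assumes you want duplicates filed in the duplicates/ maildir
rather than dropped, and it relies on the c flag to keep the message
available to the ":0 a" block once formail reports a duplicate.  The LOG
text is changed here only for illustration.

  # DUPLICATE MAILS
  :0 Whc: msgid.lock
  | formail -D 8192 msgid.cache

  # Runs only when formail exited 0, i.e. the Message-ID was already cached
  :0 a
  {
    LOG="DUPLICATE MAIL (duplicates rule) - "

    :0
    duplicates/
  }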

Reto
-- 
R A Lichtensteiger      rali (at) tifosi.com

boss, n:
        According to the Oxford English Dictionary, in the Middle Ages
        the words "boss" and "botch" were largely synonymous, except
        that boss, in addition to meaning "a supervisor of workers" also
        meant "an ornamental stud."

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
