procmail

Re: duplicates and delivery problems

1997-09-05 00:55:38
On Thu, 4 Sep 1997 22:19:14 -0600, Burnt Norton
<bnorton(_at_)mastaler(_dot_)com> wrote:
     Beware if you have delivery problems in recipes below this
     one and procmail tries to requeue the mail, then on the next
     queue run, this mail will be considered a duplicate and will
     be thrown away.  For those not quite so confident in their
     own scripting capabilities, you can use the following recipe
     instead.  It puts duplicates in a separate folder instead of
     throwing them away.  It is up to you to periodically empty
     the folder of course.
I don't want to put duplicates into a seperate folder, I'd rather
delete them.  At the same time however, I'm worried that a mail
might be incorrectly deleted because of the delivery problem /
requeue situation described above.
Is there a way to safeguard against this so I can throw away
dupes with confidence that only dupes are being deleted?

Uh ... not very easily. I'm not going to implement it, but you could
play around with doing something in TRAP to tell coming generations
that you did a successful delivery. Or you could have a cron job look
for Message-Id:s that are nowhere among your saved messages and fish
them out of the duplicates folder. Or you could periodically
regenerate msgid.cache from the messages you have already in your
spool files, and perhaps rerun the backup file of tentative duplicates
through Procmail immediately after. Or you could write something to
the log every time you deliver successfully, and periodically extract
information about successful deliveries and delete those from the
"possible dupes" queue. Or you could defer checking of duplicates
until you are just about to deliver a message, and then prior to
delivery look at what's already in the file, and if you see a
duplicate Message-Id at this point, ditch it.
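
The cron-job idea above could be sketched roughly as follows, in Python rather than anything Procmail itself provides. It assumes both the saved folders and the duplicates folder are mbox files; the function names message_ids and undelivered_duplicates are made up for this illustration and are not part of Procmail or formail.

```python
import mailbox


def message_ids(mbox_path):
    """Return the set of Message-Id values found in one mbox file."""
    ids = set()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            msgid = msg.get("Message-Id")
            if msgid:
                ids.add(msgid.strip())
    finally:
        box.close()
    return ids


def undelivered_duplicates(dupes_path, saved_paths):
    """Return messages from the duplicates folder whose Message-Id
    appears in none of the saved folders.

    A message without any Message-Id is returned as well, since
    nothing proves that it was ever delivered.
    """
    saved = set()
    for path in saved_paths:
        saved |= message_ids(path)

    rescued = []
    box = mailbox.mbox(dupes_path, create=False)
    try:
        for msg in box:
            msgid = (msg.get("Message-Id") or "").strip()
            if msgid not in saved:
                rescued.append(msg)
    finally:
        box.close()
    return rescued
```

Whatever this returns would then have to be fed back through delivery by some other means; the sketch only finds the candidates and does no locking against a concurrently running Procmail.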

No matter which way you turn, fault-tolerant duplicate detection is a
bit outside the scope of what Procmail was actually made for. If you
get really serious about this, you might consider writing a patched
Procmail which runs some extra hooks immediately prior to delivering a
message. (Come to think of it, this could be a useful extension in
its own right.)

Also, how high can I safely set the "maxlen" argument to
formail -D?  8192 seems too low to me, and we have plenty
of disk space here.  Thanks in advance.

I haven't really looked at the formail code, but it seems to save
these Message-Id:s in a fairly compact format, i.e. with very little
overhead; the cache is basically little more than the Message-Id:s
themselves. In a quickie test on a few thousand messages, I found the
average Message-Id length to be about XXX characters, so you could
squeeze in quite a number of them. But I suspect this relatively low
number is really more a time optimization than a space optimization.
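
To get a feel for how many Message-Id:s a given maxlen might hold, here is a back-of-the-envelope sketch in Python. It assumes each cached entry costs its own length plus one separator byte, which is a guess about the cache layout and not something checked against the formail source. The function name and the figure of 40 characters are purely illustrative; 40 is not the measured average mentioned above.

```python
def estimated_cache_entries(maxlen, avg_id_len, separator_bytes=1):
    """Rough number of Message-Ids that fit in a cache of maxlen bytes.

    Assumes (unverified) that every entry occupies avg_id_len bytes
    plus separator_bytes of overhead.
    """
    if avg_id_len <= 0:
        raise ValueError("avg_id_len must be positive")
    return maxlen // (avg_id_len + separator_bytes)


# Hypothetical average of 40 characters per Message-Id:
print(estimated_cache_entries(8192, 40))   # 199
print(estimated_cache_entries(65536, 40))  # 1598
```

Under these assumptions the default of 8192 remembers a couple of hundred messages, and the count grows linearly with maxlen.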

/* era */

Count the a:s in "separate".

-- 
 Paparazzi of the Net: No matter what you do to protect your privacy,
  they'll hunt you down and spam you. <http://www.iki.fi/~era/spam/>