David, thanks for your excellent suggestions. I've already incorporated
most of them. Had a few more questions/comments though:
On Dec 11, 2:03pm, David W. Tamkin wrote:
Subject: Re: formail -D generates empty msg. when duplicate found?
Gary Funck wrote,
| What I see happening (v3.14, also 3.10) is that formail does detect
| duplicate messages. However, when it finds a duplicate, it ends
| up treating it as an empty message, to which it prepends a
| 'From foo@bar' line, and then passes this message on to procmail
| and the script.rc script. I think it should just ignore the
| duplicate e-mail and move on. Is this a bug?
No, that's pretty much the way it works. The -f option (to formail, not to
procmail) would prevent that, but then you'd get procmail invoked on null
input.
Hmmm, ok, but I don't see the benefit of sending through the bogus
message, and the manpage seems to indicate otherwise:
-D maxlen idcache
Formail will detect if the Message-ID of the current
message has already been seen using an idcache file of
approximately maxlen size. If not splitting, it will
return success if a duplicate has been found. If
splitting, it will not output duplicate messages. If
used in conjunction with -r, formail will look at the
mail address of the envelope sender instead at the
Message-ID.
I'd read "not output duplicate messages" as meaning "nothing
will be sent to the output". Also, does a mail message with
only a "From " line qualify as a valid mail message?
The problem with sending something through is that it will
confuse simple scripts like this:
formail -D 10000 id.cache -s echo . < mbox | wc -l
which attempts to count the number of unique messages, but
will actually count all messages, because the dummy 'From '
line is passed through for each duplicate.
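For comparison, a count that does not depend on how formail handles
duplicates can be sketched in plain sh and awk. This is only a rough
sketch under some assumptions: the function name count_unique_ids is
made up for illustration, every message is assumed to start with a
"From " line, and each Message-ID header is assumed to fit on one
line (folded headers are not handled). Messages with no Message-ID
at all are not counted.

```shell
#!/bin/sh
# count_unique_ids MBOX   (hypothetical helper, not part of procmail/formail)
# Prints the number of distinct Message-ID values found in message headers.
# Assumes mbox format with "From " separators and unfolded Message-ID lines.
count_unique_ids() {
    awk '
        /^From /          { in_header = 1; next }   # separator: a new header starts
        in_header && /^$/ { in_header = 0; next }   # first blank line ends the header
        in_header && tolower($0) ~ /^message-id:/ {
            id = $0
            sub(/^[^:]*:[ \t]*/, "", id)            # drop the field name, keep the id
            if (!(id in seen)) { seen[id] = 1; n++ }
        }
        END { print n + 0 }                         # prints 0 for an empty mailbox
    ' "$1"
}
```

It would be called as "count_unique_ids mbox". Since it only reads
the mailbox, it neither consults nor updates id.cache.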
| The workaround is simple: in script.rc, check for, and discard,
| the bogus message:
|
| #
| # if the message id is duplicated, formail passes in an
| # empty email body, with a 1 line 'From foo@bar'. This
| # is a bug, I think - the workaround is to check, and
| # just dump the message. However, this won't work well
| # for malformed mail boxes.
| #
| :0 D
| * ^From foo@bar
| /dev/null
As mentioned in a subsequent e-mail, the simple pattern match
above didn't appear to work when applied to a message containing
only a 'From foo@bar' line; I don't know why.