procmail

Re: formail -D generates empty msg. when duplicate found?

1999-12-11 14:28:20
David, thanks for your excellent suggestions.  I've already incorporated
most of them.  Had a few more questions/comments though:

On Dec 11,  2:03pm, David W. Tamkin wrote:
Subject: Re: formail -D generates empty msg. when duplicate found?
Gary Funck wrote,

| What I see happening (v3.14, also 3.10) is that formail does detect
| duplicate messages.  However, when it finds a duplicate, it ends
| up treating it as an empty message, to which it prepends a
| 'From foo@bar' line, and then passes this message on to procmail
| and the script.rc script.  I think it should just ignore the
| duplicate e-mail and move on.  Is this a bug?

No, that's pretty much the way it works.  The -f option (to formail, not to
procmail) would prevent that, but then you'd get procmail invoked on null
input.

Hmmm, ok, but I don't see the benefit of sending through the bogus
message, and the manpage seems to indicate otherwise:

      -D maxlen idcache
           Formail will detect if the Message-ID  of  the  current
           message  has already been seen using an idcache file of
           approximately maxlen size.  If not splitting,  it  will
           return  success  if  a  duplicate  has  been found.  If
           splitting, it will not output duplicate  messages.   If
           used  in  conjunction with -r, formail will look at the
           mail address of the  envelope  sender  instead  at  the
           Message-ID.

I'd read "it will not output duplicate messages" as meaning "nothing
will be sent to the output".  Also, does a mail message with
only a "From " line qualify as a valid mail message?

The problem with sending something through is that it will
confuse simple scripts like this:

   formail -D 10000 id.cache -s echo . < mbox | wc -l

which attempts to count the number of unique messages, but
will actually count all messages, because the dummy 'From '
line is passed through for each duplicate.
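
One possible way around this, sketched here under the assumption that
the dummy message really consists of nothing but a "From " line (as
described above), is to replace "echo ." with a small filter that only
prints a dot for messages carrying a Message-ID header.  The function
names has_message_id and echo_if_real are made up for illustration.

```shell
#!/bin/sh
# Succeed only if stdin looks like a real message, i.e. its header
# (everything before the first blank line) contains a Message-ID field.
# Assumption: the dummy that formail passes through for a duplicate
# has only a "From " line, as described in this thread.
has_message_id() {
    awk '
        /^$/ { in_body = 1 }                      # header ends at first blank line
        !in_body && tolower($0) ~ /^message-id:/ { found = 1 }
        END { exit (found ? 0 : 1) }              # all input is read before exiting
    '
}

# Print one "." per real message; print nothing for a From-only dummy.
echo_if_real() {
    if has_message_id; then
        echo .
    fi
}
```

Saved as an executable script that ends with a call to echo_if_real
(hypothetically ./realmsg.sh), it could stand in for "echo ." like so:

   formail -D 10000 id.cache -s ./realmsg.sh < mbox | wc -l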


| The workaround is simple: in script.rc, check for and discard the
| bogus message:
| 
| #
| # If the message id is duplicated, formail passes in an
| # empty email body, with a one-line 'From foo@bar'.  This
| # is a bug, I think - the workaround is to check for it and
| # just dump the message.  However, this won't work well
| # for malformed mailboxes.
| #
| :0 D
| * ^From foo@bar
| /dev/null

As mentioned in a subsequent e-mail, the simple pattern match
above didn't appear to work when applied to a message containing
only a 'From foo@bar' line - I don't know why.
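
One way to narrow this down might be to look at exactly what formail
hands over for a duplicate, byte for byte, and compare it with the
pattern.  A minimal sketch, assuming a POSIX sed; show_raw is just a
hypothetical helper name:

```shell
#!/bin/sh
# Print each input line in an unambiguous form: the sed "l" command
# marks the end of every line with "$" and writes non-printing
# characters as escapes, so stray tabs, carriage returns or doubled
# spaces in the "From " line become visible.
show_raw() {
    sed -n l
}
```

The same sed command can be given to formail directly as the program
to run for each split message:

   formail -D 10000 id.cache -s sed -n l < mbox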