Hello, I'm trying to run a script that takes a mailbox and
filters out the duplicate messages that appear in it:
formail -D 100000 id.cache -s procmail -m script.rc < mbox > filtered_mbox
where id.cache is the cache used to detect duplicate message ids, and script.rc
is a procmail script that does some further filtering on the output.
What I see happening (v3.14, also 3.10) is that formail does detect
duplicate messages. However, when it finds a duplicate, it ends
up treating it as an empty message, to which it prepends a
'From foo@bar' line, and then passes this message on to procmail
and the script.rc script. I think it should just ignore the
duplicate e-mail and move on. Is this a bug?
The workaround is simple: in script.rc, check for and discard
the bogus message:
#
# If the message id is duplicated, formail passes in an
# empty email body with a one-line 'From foo@bar'. This
# is a bug, I think; the workaround is to check for it and
# just dump the message. However, this won't work well
# for malformed mailboxes.
#
:0 D
* ^From foo@bar
/dev/null
But this way of handling things runs into problems if, for example,
formail adds the dummy 'From foo@bar' line
for other reasons.
PS: are there any guidelines on sizing the message id cache?
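For a rough starting figure, here is a small shell sketch that totals the
bytes of every Message-ID value in the mbox. It assumes the cache needs about
that much room to remember all the ids, which I have not confirmed. The name
msgid_bytes is just one I made up. Ids folded onto a continuation line are
missed, and body lines that happen to start with "Message-ID:" are counted too,
so treat the result as an estimate only.

```shell
#!/bin/sh
# Hypothetical helper: estimate the bytes taken by all Message-ID values
# in an mbox file. Assumption (unverified): the formail -D cache needs
# roughly this much space to hold every id from the mailbox.
msgid_bytes() {
    # Select Message-ID header lines (case-insensitive), strip the field
    # name and leading whitespace, then count the remaining bytes.
    # tr removes the padding that some wc implementations print.
    grep -i '^Message-ID:' "$1" \
        | sed 's/^[^:]*:[[:space:]]*//' \
        | wc -c \
        | tr -d ' '
}
```

Running msgid_bytes mbox prints a single number that could be compared with
the 100000 I passed to -D above.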