No matter which way you turn, fault-tolerant duplicate detection is a
bit outside the scope of what Procmail was actually made for.
One thing I wish I knew how to do is detect messages whose _bodies_
are duplicates, but which came through list servers that changed the
Message-ID and perhaps tacked on a trailer.
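
For what it's worth, here is a rough sketch of one way the body
comparison could work outside of Procmail: hash the body after cutting
off anything that looks like a list trailer, and compare the hashes.
The function name body_fingerprint and the TRAILER_START pattern are
made up for illustration; the trailer markers in particular are a guess
and would need adjusting per list. It only looks at the raw text of the
parts, so it is not a tested solution for encoded or rewritten bodies.

```python
import email
import hashlib
import re

# Assumed trailer markers (hypothetical): a "-- " signature delimiter or a
# line made up of five or more underscores or dashes.
TRAILER_START = re.compile(r"^(-- |_{5,}|-{5,})\s*$")


def body_fingerprint(raw_message: str) -> str:
    """Return a SHA-256 hex digest of the message body, minus any trailer."""
    msg = email.message_from_string(raw_message)
    # Gather the raw text of every non-multipart part; headers are ignored,
    # so a rewritten Message-ID or Subject does not affect the result.
    parts = [p.get_payload() for p in msg.walk() if not p.is_multipart()]
    kept = []
    for line in "\n".join(parts).splitlines():
        if TRAILER_START.match(line):
            break  # drop the trailer line and everything after it
        kept.append(line)
    # Collapse all whitespace so re-wrapped text hashes the same.
    normalized = " ".join(" ".join(kept).split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Two copies of a message would then count as duplicates when their
fingerprints are equal, whatever their Message-IDs say. Keeping the
fingerprints in a cache of their own is left out here.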
I haven't really looked at the formail code, but it seems to save
these Message-Id:s in a fairly compact format (i.e. very little
overhead; it's basically little more than the Message-Id:s themselves).
In fact, you can just look at the msgid.cache file and see that it contains
only message IDs in plain text without line breaks. (If it had line breaks,
'wc -l msgid.cache' would tell you how many messages were there.)
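
Since 'wc -l' is no help, a small sketch like the following could count
the entries instead. It assumes the IDs in the cache are separated by
NUL bytes, which is my understanding of what formail -D writes but is
not something I have checked against the source. The function name
count_cached_ids is made up for the example.

```python
def count_cached_ids(cache_bytes: bytes) -> int:
    """Count non-empty entries in a cache assumed to be NUL-separated."""
    # Empty chunks (e.g. from consecutive NULs or a trailing NUL) are skipped.
    return sum(1 for entry in cache_bytes.split(b"\0") if entry)
```

You would call it with the file's contents, e.g.
count_cached_ids(open("msgid.cache", "rb").read()). Because the cache
has a fixed maximum size, the number is at best "how many IDs are
currently remembered", not how many messages have ever passed through.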