2006-02-10
If the message has been determined to be from (or refers to) a
"pundit", then PUNDIT=YES.  In that event, we take roughly the
first 400 characters of the body of the message and deposit that
into the variable $T400.

You may get a false positive if the first 400 bytes are quoted text.

Once we have a string that is representative of the message, we
prefix it with Message-ID: and feed that into formail -D to see if
we've seen this message prefix before.  If this is the first
occurrence, we deposit the message into pundit-mail, otherwise it
is ditched into /dev/null.

But that destroys the original Message-ID:, potentially breaking
threading if you *just* happen to reply to one of the pundits.
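
To make the false-positive worry concrete, here is a rough Python
sketch of the prefix idea.  It is not the procmail recipe itself: the
names prefix_key and is_duplicate are made up for illustration, and an
in-memory set stands in for the cache file that formail -D would keep.

```python
def prefix_key(body, limit=400):
    """Return the first `limit` characters of the body as a dedup key."""
    return body[:limit]


def is_duplicate(body, seen, limit=400):
    """Report whether this body's prefix was seen before; record it if not.

    `seen` is a plain set, a stand-in (assumption) for formail's cache file.
    """
    key = prefix_key(body, limit)
    if key in seen:
        return True
    seen.add(key)
    return False


if __name__ == "__main__":
    seen = set()
    quote = "> " + "x" * 400          # 400+ characters of quoted text
    first = quote + "\nreply A"
    second = quote + "\nreply B"      # different reply, same leading quote
    print(is_duplicate(first, seen))
    print(is_duplicate(second, seen))
```

Both bodies share the same first 400 characters, so the second one is
treated as a duplicate even though the replies differ.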

Note: we limit the string length to 400 to step around potential
problems with LINEBUF, shell environment variable size limits and
so on.  It could likely be set to a somewhat larger value without

I recently had to remove duplicate messages based on body, as I
stupidly reprocessed the same mbox more than once along with an option
to add or update a Message-ID: header.  I did that in Perl by comparing
the MD5 checksums of the message bodies ...
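
A minimal sketch of that body-checksum approach, in Python rather than
the original Perl.  The function names are hypothetical, messages are
assumed to arrive as raw bytes with headers and body separated by the
first blank line, and normalizing CRLF to LF before hashing is my own
assumption, not something the Perl necessarily did.

```python
import hashlib


def body_md5(raw):
    """Return the MD5 hex digest of a raw message's body.

    The body is everything after the first blank line; a message with
    no blank line is treated as having an empty body.
    """
    raw = raw.replace(b"\r\n", b"\n")   # assumption: ignore line-ending differences
    _headers, sep, body = raw.partition(b"\n\n")
    if not sep:
        body = b""
    return hashlib.md5(body).hexdigest()


def unique_by_body(messages):
    """Keep the first message for each distinct body digest, in order."""
    seen = set()
    kept = []
    for raw in messages:
        digest = body_md5(raw)
        if digest not in seen:
            seen.add(digest)
            kept.append(raw)
    return kept
```

Because only the body is hashed, two copies that differ solely in
their Message-ID: header collapse into one.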



So, all I can say is: where were you when I needed your work?  (:
Thanks for showing the way.

  - Parv


