in message <JCEPIPKHCJGDMPOHDOIGAEOGDCAA(_dot_)gary(_at_)intrepid(_dot_)com>,
wrote Gary Funck thusly...
:0:
* PUNDIT ?? ^^YES^^
{
T400=`formail -I '' | tr -c '[:alpha:][:digit:]' '_' | tr -s '_' | head -c 400`
:0
* !? echo "Message-ID: $T400" | formail -D 40101 $HOME/.pundit.cache
pundit-mail
:0E
/dev/null
}
If the message has been determined to be from (or refers to) a
"pundit", then PUNDIT=YES. In that event, we take roughly the
first 400 characters of the body of the message and deposit that
into the variable $T400.
You may get false positives if two different messages both start with
the same 400 bytes of quoted text.
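To make the fingerprint step concrete, here is a minimal shell sketch of roughly the same pipeline that can be run outside procmail. It assumes `sed` stands in for formail to drop the header block, and `body_fingerprint` is a hypothetical helper name, not anything procmail provides.

```shell
#!/bin/sh
# Sketch only: body_fingerprint is a hypothetical helper, not part of procmail.
# Reads one message on stdin and prints up to 400 bytes of "normalized" body.
body_fingerprint() {
    # Assumption: sed replaces formail here; it deletes everything up to and
    # including the first blank line, i.e. the header block.
    sed '1,/^$/d' |
        # Turn every byte that is not a letter or digit into '_' ...
        LC_ALL=C tr -c '[:alpha:][:digit:]' '_' |
        # ... squeeze runs of '_' into a single one ...
        tr -s '_' |
        # ... and keep only the first 400 bytes.
        head -c 400
}
```

Two messages whose bodies open with the same quoted passage of 400 bytes or more produce the same string here, which is exactly the false-positive case mentioned above.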
Once we have a string that is representative of the message, we
prefix it with Message-ID: and feed that into formail -D to see if
we've seen this message prefix before. If this is the first
occurrence, we deposit the message into pundit-mail, otherwise it
is ditched into /dev/null.
But that destroys the original Message-ID:, potentially breaking
threading if you *just* happen to reply to one of the pundits.
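The seen-before test that the recipe delegates to formail -D can be sketched in plain shell, mostly to show the exit-status convention the `!?` condition relies on (success means "duplicate"). This is an illustration under assumptions: `seen_before` is a hypothetical helper, and unlike formail's cache this plain file is never trimmed to a maximum size.

```shell
#!/bin/sh
# Sketch only: seen_before is a hypothetical stand-in for "formail -D".
# Usage: seen_before KEY CACHEFILE
# Exit status 0: KEY was already in CACHEFILE (duplicate).
# Exit status 1: KEY was new; it has now been appended to CACHEFILE.
seen_before() {
    key=$1
    cache=$2
    touch "$cache"
    # -F fixed string, -x whole-line match, -q quiet
    if grep -Fqx -e "$key" "$cache"; then
        return 0
    fi
    printf '%s\n' "$key" >> "$cache"
    return 1
}
```

In the recipe, the negated condition means "deliver to pundit-mail only when this call would fail", so the first copy is kept and later copies fall through to the `:0E` branch.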
Note: we limit the string length to 400 to step around potential
problems with LINEBUF, shell environment variable size limits and
so on. It could likely be set to a somewhat larger value without
problems.
I recently had to remove duplicate messages based on their bodies, as I
stupidly reprocessed the same mbox more than once with an option that
adds or updates the Message-ID: header. I did that in Perl by comparing
the MD5 checksums of the message bodies ...
Program:
http://www103.pair.com/parv/comp/src/perl/undupe-mail-body
Documentation:
http://www103.pair.com/parv/comp/src/perl/pod/undupe-mail-body.pod
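For comparison with the recipe above, the body-checksum idea can be sketched in shell as well. This is not the Perl program linked above, only an illustration of the technique; it assumes `md5sum` from GNU coreutils is available, and `body_md5` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: body_md5 is a hypothetical helper, not the linked Perl program.
# Reads one message on stdin and prints the MD5 hex digest of its body.
body_md5() {
    # Drop the header block (everything through the first blank line),
    # hash what is left, and keep only the digest column.
    # Assumption: md5sum (GNU coreutils) is installed.
    sed '1,/^$/d' | md5sum | cut -d ' ' -f 1
}
```

Because the headers are discarded before hashing, two copies of a message that differ only in Message-ID: get the same digest, while any change in the body yields a different one.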
So, all I can say is: where were you when I needed your work? (:
Thanks for showing the way.
- Parv