Dear All,
A problem and a solution: how to remove frequently appearing junk from
emails, such as signatures and standard adverts from your ISP. I run
one copy of my email through a program called mainIt, and it is that
copy that I forward to other places where I want to check my mail
quickly.
What mainIt (Latvian for "to change") does is simple: it removes
paragraphs that have been seen before, and keeps a cache of the
checksums of the paragraphs it has seen. If you put "-n" on the
command line, where n is a natural number, it passes every paragraph
of n characters or fewer without checking it. I often use "mainIt -100"
for email, as this keeps "Dear Kalvis," and "Best wishes," etc., but a
repeated paragraph longer than 100 characters is unlikely to be
anything other than junk.
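To make the rule concrete, here is a minimal sketch of the same logic
applied to paragraphs held in memory. The helper name
filter_paragraphs, the hash used as the cache and the sample advert
text are assumptions made for illustration; they are not part of
mainIt itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the paragraphs that would be passed on.
#   $min  - paragraphs of this length or shorter always pass
#   $seen - hash ref standing in for the checksum cache
sub filter_paragraphs {
    my ($min, $seen, @paragraphs) = @_;
    my @kept;
    for my $para (@paragraphs) {
        if (length($para) > $min) {
            my $cksum = unpack("%64C*", $para);   # same checksum as mainIt
            next if $seen->{$cksum}++;            # seen before: drop it
        }
        push @kept, $para;
    }
    return @kept;
}

my %seen;
# Made-up junk paragraph, longer than 100 characters.
my $advert = "This message was delivered by ExampleNet, " x 3;

# The same two paragraphs arrive in two successive messages.
my @first  = filter_paragraphs(100, \%seen, "Dear Kalvis,\n\n", "$advert\n\n");
my @second = filter_paragraphs(100, \%seen, "Dear Kalvis,\n\n", "$advert\n\n");

print scalar(@first), " then ", scalar(@second), "\n";   # prints "2 then 1"
```

The short greeting passes both times because it is under the
threshold; the advert passes only the first time it is seen.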
The program mainIt:
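#!/usr/bin/perl
# mainIt: pass each paragraph of the input through only the first time
# it is seen, remembering checksums in "$0.cache" between runs.

# Threshold: a leading "-n" argument, otherwise any digits at the end
# of the name the program was run as.
if (@ARGV && $ARGV[0] =~ /^-/)
{
    $min = shift;
}
else
{
    $min = $0;
}
$min =~ s/.*\D//;       # strip everything up to the last non-digit
$min ||= 0;             # no digits found: check every paragraph

$\ = "";                # add nothing after each print
$/ = "";                # paragraph mode: read up to the next blank line

# Load the checksums saved by earlier runs. The cache has no blank
# lines, so one paragraph-mode read returns the whole file.
if (open(CKSUMS, "$0.cache"))
{
    $cksums = <CKSUMS>;
    close(CKSUMS);
}
$cksums = "" unless defined $cksums;

# Reopen the cache so that new checksums are appended to it.
open(CKSUMS, "+>>$0.cache");

while (<>)
{
    if (length($_) > $min)
    {
        # Checksum of the paragraph: the sum of its byte values.
        $cksum = unpack("%64C*", $_);
        # Match whole lines only, so that 123 is not found inside 4123.
        unless ($cksums =~ /^$cksum$/m)
        {
            print "$_";
            print CKSUMS "$cksum\n";
            $cksums .= "$cksum\n";
        }
    }
    else
    {
        print "$_";
    }
}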
----------------------------------------------
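A note on the checksum: the "%64C*" template simply adds up the byte
values of the paragraph, so two different paragraphs built from the
same bytes in a different order get the same checksum, and the second
one would be dropped. If that ever matters, a stronger digest could be
swapped in. The sketch below compares the two; using Digest::MD5 (a
core Perl module) is an assumption made for illustration, not what
mainIt does, and the sample strings are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Two different paragraphs made of the same bytes in a different order.
my $one = "stop spam now\n";
my $two = "spam stop now\n";

# The "%64C*" template adds up the byte values, so order is ignored.
my $sum_one = unpack("%64C*", $one);
my $sum_two = unpack("%64C*", $two);
print "byte sums ", ($sum_one == $sum_two ? "collide" : "differ"), "\n";

# An MD5 digest depends on byte order, so these two stay distinct.
my $md5_one = md5_hex($one);
my $md5_two = md5_hex($two);
print "MD5 digests ", ($md5_one eq $md5_two ? "collide" : "differ"), "\n";
```

With a digest like this, the cache would hold one hex string per line
instead of one number per line; the rest of the logic stays the same.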
The procmail script I use to call mainIt (assuming that it is in your
path):
#============
# mainIt zone
#============
:0 fbW
|mainIt -100
----------------------------------------------
Remember that, as information is being removed from your email, it is
a good idea to keep a copy of each message in its full form somewhere.
Also, I remove the cache about once a week, so from time to time I get
a copy of signatures etc. for reference, but I do not have to see them
in every message I read.
Generally, I like programs that learn what is what, like this one and
the X-Frequent-Sender script for removing spam that I mentioned
before.
Best wishes,
Kalvis
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail