procmail

A script that learns what junk is and removes it from following emails --- mainIt

2001-08-01 03:45:02
Dear All,

A problem and a solution: how to remove frequently appearing junk in
emails, like signatures and standard adverts from your ISP etc.  I run
one copy of my emails through a program called mainIt, and it is that
copy that I forward to other places where I want to check my mail
quickly.

What mainIt (Latvian for "to change") does is simple: it removes
paragraphs that have been seen before and keeps a checksum cache of
these.  If you put "-n" on the command line, where n is a natural
number, it passes all paragraphs of n characters or fewer without
checking them.  I often use mainIt -100, say, for email, as this will
keep "Dear Kalvis," and "Best wishes," etc., and a repeated paragraph
longer than 100 characters is unlikely to be anything but junk.

The program mainIt:

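#!/usr/bin/perl
# mainIt: print the input, dropping paragraphs seen in earlier input.
# Usage: mainIt [-n] [file ...]
# Paragraphs of n characters or fewer are always passed through.

use strict;
use warnings;

# Take the threshold from a leading "-n" argument, otherwise from any
# digits at the end of the script's own name (e.g. a link named mainIt100).
my $min = (@ARGV && $ARGV[0] =~ /^-\d+$/) ? shift : $0;
$min =~ s/.*\D//;
$min = 0 unless length $min;

$\ = "";    # no output record separator
$/ = "";    # paragraph mode: records are separated by blank lines

# Load the checksums recorded by earlier runs, one per line.
my %seen;
if (open(my $old, '<', "$0.cache"))
{
    local $/ = "\n";
    while (my $line = <$old>)
    {
        chomp $line;
        $seen{$line} = 1;
    }
    close($old);
}

open(my $cache, '>>', "$0.cache")
    or die "$0: cannot append to $0.cache: $!\n";

while (<>)
{
    if (length($_) > $min)
    {
        # Sum of the byte values; an exact lookup avoids matching one
        # checksum inside a longer one (e.g. 123 inside 4123).
        my $cksum = unpack("%64C*", $_);
        unless ($seen{$cksum})
        {
            print $_;
            print $cache "$cksum\n";
            $seen{$cksum} = 1;
        }
    }
    else
    {
        print $_;
    }
}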

----------------------------------------------
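
One caveat about the checksum: unpack("%64C*", ...) simply adds up the
byte values of the paragraph, so two different paragraphs made of the
same characters in a different order get the same checksum, and the
second would be dropped as if it had been seen before.  Below is a
minimal sketch of a stricter fingerprint, assuming the Digest::MD5
module is available.  The names fingerprint and bytesum are made up
for illustration and are not part of mainIt.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper, not part of the original mainIt: fingerprint a
# paragraph with MD5 so that reordered text no longer collides.
sub fingerprint
{
    my ($paragraph) = @_;
    return md5_hex($paragraph);
}

# The byte-sum checksum that mainIt uses, for comparison.
sub bytesum
{
    my ($paragraph) = @_;
    return unpack("%64C*", $paragraph);
}

# Same characters, different order.
my ($first, $second) = ("buy cheap pills\n\n", "cheap pills buy\n\n");

# prints "byte sums collide"
print "byte sums ",
    (bytesum($first) == bytesum($second) ? "collide" : "differ"), "\n";

# prints "MD5 digests differ"
print "MD5 digests ",
    (fingerprint($first) eq fingerprint($second) ? "collide" : "differ"), "\n";
```

Trying this in mainIt would mean calling md5_hex($_) where the unpack
call is now; checksums already in the cache would then no longer match
anything.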

The procmail script I use to call mainIt (assuming that it is in your
path):


 #============
 # mainIt zone
 #============
 :0 fbW
        |mainIt -100

----------------------------------------------

Remember, as information is being removed from your email, it is a
good idea to keep a copy of the full message somewhere.  Also, I
remove the cache about once per week, so from time to time I get a
copy of signatures etc. for reference, but I do not have to see them
in every message I read.

Generally I like programs that learn what is what, like this one and
the X-Frequent-Sender script for removing spam that I mentioned
before.

Best wishes,

Kalvis


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
