
Re: subject-based killfile

1996-12-08 19:57:56
    > I would like to setup a _sorting_ (not bouncing) recipe for certain  
    > Subjects.
    > 
    > Well, actually, it would be based on _parts_ of subjects.  I would  
    > like to be able to do this based on keywords found in the subject,  
    > such as:
    > 
    > subscribe
    > unsubscribe
    > virus
    > try this
    > credit card
    > money
    > 1-900
    > cmsg cancel
    > !!!
    > ***
    > +++
    > 
    > I don't want to bounce these, just put them in a folder so I can  
    > make sure they are messages to delete.
    > 
    > I was going to setup a file to do this, but since I'm not matching  
    > full subjects, but only parts, I wondered if I had to do this within  
    > .procmailrc with something like this:
    > 
    > :0:
    > * ^Subject:.(*ubscribe|\
    >           *virus|\
    >           *try this|\
    >           *cmsg cancel|\
    >           *credit cards|\
    >           *money|\
    >           *1-900)
    > Trash

You can do this, but you are using the '*' incorrectly.  I would write
the recipe like this:

    :0:
    * ^Subject:.*(subscribe|\
                  virus|\
                  try this|\
                  cmsg cancel|\
                  credit cards|\
                  money|\
                  1-900)
    Trash

Recall that '*' doesn't match anything by itself; it just repeats the
previous pattern zero or more times.  So '.*' means "any run of
characters", while '(*virus' gives the '*' nothing to repeat.
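
One way to try the corrected pattern before it goes into .procmailrc is
to run it through "grep -E" from the shell.  This is only a sketch:
grep stands in for procmail's own matcher, which is similar but not
identical, and "-i" is added because procmail ignores case unless told
otherwise.  The sample headers and the "matches" helper are made up for
the illustration.

```shell
#!/bin/sh
# The corrected condition, written as a single extended regexp.
pattern='^Subject:.*(subscribe|virus|try this|cmsg cancel|credit cards|money|1-900)'

# matches HEADER: exit status 0 if the header line matches the pattern.
matches() {
    printf '%s\n' "$1" | grep -E -i -q -e "$pattern"
}

# Hypothetical sample headers, for illustration only.
for header in \
    'Subject: Please UNSUBSCRIBE me' \
    'Subject: make money fast' \
    'Subject: lunch on Friday?'
do
    if matches "$header"; then
        echo "Trash: $header"
    else
        echo "keep:  $header"
    fi
done
```

The first two headers should be reported as "Trash" and the third as
"keep".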

Please read the procmailrc man page (3.11pre4), especially the section
titled: "Extended regular expressions".


If you are worried about the total length of the line, just make LINEBUF larger
before you build a large regexp line:

    LINEBUF=2048                # make a larger buffer for lines

    > of course this could get very long, which is why I wanted to do it  
    > in a file, but can I do that without having to match the entire  
    > subject??

If you would like to use a lookup file, here's how I might do it.  (I
like using files for keyword lookups because when you add or change
keywords, you end up editing the data for the search, not the search
recipe itself.  Mistakes then affect only the lookup, not your entire
recipe file, so they are not as costly.)

First, before preparing a keyword file, you must decide on whether or
not this file is a list of search *strings* or search *patterns*
(regexps).  The former are easier, while the latter are more flexible
and general.  I'll assume that you'll just be using search strings.
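
To see the difference between the two kinds of file, here is a small
shell sketch that reads one keyword both ways.  The keyword "1.900" and
the two helper functions are made up for the illustration; "grep -F" is
the same thing as fgrep, and "grep -E" the same as egrep.

```shell
#!/bin/sh
# One keyword, read two ways.  "1.900" is a hypothetical entry.
keyword='1.900'

# As a search string: the "." must appear literally.
as_string() {
    printf '%s\n' "$1" | grep -F -q -e "$keyword"
}

# As a search pattern: the "." matches any single character.
as_pattern() {
    printf '%s\n' "$1" | grep -E -q -e "$keyword"
}

for subject in 'call 1.900 today' 'call 1-900 today'; do
    if as_string "$subject"; then
        echo "string  matches: $subject"
    fi
    if as_pattern "$subject"; then
        echo "pattern matches: $subject"
    fi
done
```

The string form only matches the subject containing "1.900", while the
pattern form matches both, which is the sort of surprise that makes
plain strings the easier choice.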

Now prepare a keyword string file; I'll call it "trash-subjects".  

    % cat <<-EOF >trash-subjects
        subscribe
        virus
        try this
        cmsg cancel
        credit cards
        money
        1-900
    EOF

Here's the recipe which can use it:

    # If the subject of the current mail matches "trash-subjects",
    # then file the mail into "Trash".
    :0:
    * ? formail -zxSubject: | fgrep -s -f trash-subjects 
    Trash
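
The lookup half of this recipe can be tried from the shell before you
trust it with real mail.  The sketch below makes a few assumptions that
are not in the recipe above: it uses a scratch keyword file, it spells
fgrep as "grep -F", it uses "-q" (on POSIX greps "-q" is the quiet
flag, and "-s" only suppresses error messages about unreadable files),
and it adds "-i" because fgrep, unlike procmail's own regexp
conditions, distinguishes case by default.  The "is_trash_subject"
helper is made up for the illustration.

```shell
#!/bin/sh
# Scratch copy of a keyword file, so the real one is left alone.
keywords=$(mktemp) || exit 1
trap 'rm -f "$keywords"' EXIT
cat >"$keywords" <<'EOF'
subscribe
virus
money
1-900
EOF

# is_trash_subject SUBJECT: exit status 0 means "trashy subject".
# -F fixed strings, -i ignore case, -q no output, -f keywords from a file.
is_trash_subject() {
    printf '%s\n' "$1" | grep -F -i -q -f "$keywords"
}

if is_trash_subject 'Make MONEY fast'; then
    echo 'would go to Trash'
fi
if ! is_trash_subject 'Meeting notes'; then
    echo 'would be delivered normally'
fi
```

Note that an empty line in the keyword file matches every subject, so
it is worth checking the file for blank lines after editing it.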

You should be aware that this filter is not very discriminating.  If
your best friend sends you an email with a subject about a really cool
new 1-900 phone number, it will get trashed right along with mail from
a stranger.

Making the filter a little more careful is easy, if you know how to
decide what's "trash" mail and what is not.  One way is to say "any mail
from a friend is ok".  So, now you need a file in which you keep the names
and/or addresses of your friends.  Keep the entries as regular
expressions, so they can handle mail from a variety of hosts within a
domain and allow some flexibility in matching names.

Here's how I might do it:

    % cat <<-EOF >friend-regexps
        (Alan (K\. )?Stebbens|aks@.*sgi\.com)
        (Timothy.*Luoma|luomat@nerc\.com)
        (David (W\. )?Tamkin|dattier@.*wwa\.com)
        (Soren Dayton|csdayton@cs\.uchicago\.edu)
    EOF

Then, here is the recipe that uses both "friend-regexps" and "trash-subjects":

    # If the mail is NOT from a friend
    # and has a trashy subject, then file it into "Trash".
    :0:
    * !? formail -rtzxTo:   | egrep -s -f friend-regexps
    * ? formail -zxSubject: | fgrep -s -f trash-subjects
    Trash
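
The logic of the two conditions (NOT a friend, AND a trashy subject)
can be checked outside procmail with a short shell sketch.  Everything
here is an assumption made for the illustration: the scratch files, the
placeholder name and example.com addresses, and the "classify" helper.
It also skips formail and takes the sender and subject as plain
arguments.

```shell
#!/bin/sh
# Scratch data; the name and addresses below are placeholders.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cat >"$dir/friend-regexps" <<'EOF'
(Pat (Q\. )?Example|pat@.*example\.com)
EOF
cat >"$dir/trash-subjects" <<'EOF'
money
1-900
EOF

# classify SENDER SUBJECT: print "Trash" only when the sender matches
# no friend regexp AND the subject contains a trash keyword.
classify() {
    if ! printf '%s\n' "$1" | grep -E -q -f "$dir/friend-regexps" &&
         printf '%s\n' "$2" | grep -F -q -f "$dir/trash-subjects"
    then
        echo Trash
    else
        echo inbox
    fi
}

classify 'stranger@example.net' 'easy money'
classify 'Pat Example <pat@mail.example.com>' 'a cool 1-900 number'
```

The stranger's mail should come out as "Trash" and the friend's as
"inbox", even though both subjects contain a keyword.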

Of course, you must keep your files up-to-date, but this is easy to do
with separate files, and if you make a mistake in them, you won't be
causing procmail to get confused on a broken recipe.

One mistake you don't want to make in the "trash-subjects" file is to
add a word or phrase so common that "good" mail gets trashed.

One more addition might be useful: a file of addresses of known spammers; I'll
call it "spammers" and it is simply a file of "bad" addresses.

Here is a recipe to use it, and you can place this either before or after the
other recipe. 

    # If the mail is from a spammer, then trash it
    :0:
    * ? formail -rtzxTo: | fgrep -s -f spammers
    Trash

If you care to optimize these recipes, there are ways you can eliminate
the "formail" calls, using procmail recipes to extract those fields,
and feed the equivalent strings to "egrep" or "fgrep" using an "echo".
This exercise is left to the reader.

G'luck.

___________________________________________________________
Alan Stebbens <aks(_at_)sgi(_dot_)com>      http://reality.sgi.com/aks
