
Re: Filtering Email for Mailing Lists

1999-01-12 06:40:29
|Mon 1999-01-11 "Joey Smith" <joey(_at_)samaritan(_dot_)com> list.procmail
|
| grab my Digest mailing lists, pull out the individual messages,
| and sort them into folders according to which list they came from.

There is a ready-made procmail module for detecting your mailing lists. Check
it out in pm-code.shar (see the file server mentioned in X-info). Docs below.

jari

----------------------------------------------------------------------
Pm-jalist.rc -- Subroutine to detect mailing LIST from message.

    File id
 
        .Copyright (C)  1998 Jari Aalto <jari(_dot_)aalto(_at_)poboxes(_dot_)com>
        .Contactid:     <jari(_dot_)aalto(_at_)poboxes(_dot_)com>
        .Created:       1998-06
        .Keywords:      procmail subroutine list detect
 
        This code is free software under the terms of the GNU General Public
        License v2 or later. You can get the newest version by sending email
        to the maintainer with the subject "send <FILENAME>".
 
    Description
 
        This subroutine tries to detect and derive the mailing list name as
        it appears in some of the known methods that ezmlm, SmartList,
        listserv, majordomo etc. normally use. After this subroutine has
        been applied to a message, the variable `LIST' contains the mailing
        list name. The subroutine adaptively finds new mailing lists
        from the messages.
 
    Quick start
 
        If you just want to jump in and use this module and you
        see that some list isn't trapped, please set
 
        o   JA_LIST_HEADER_REGEXP to a regexp that matches the site in
            the From: field.
 
        If you want to make a list name more unique, for example when the
        name "Alert" was detected as the list name, please set
 
        o   JA_LIST_MAKE_UNIQUE to match the list name, like "Alert",
            and the list name will be converted to HOST-LIST format.
 
    Sendmail plus method for list subscription
 
        If you can use sendmail PLUS addressing capabilities, you may not
        be interested in this module, because you have an alternative way
        to handle mailing list messages. Let's suppose you want to
        subscribe to the procmail mailing list and want to save all messages
        to folder list.procmail; then you'd subscribe with the address:
 
            login+list(_dot_)procmail(_at_)site(_dot_)com
 
        The extra information after "+" is available to your procmail
        scripts via the $ARG pseudo variable when procmail is the LDA. If
        you are not fortunate enough to have a new sendmail, you usually
        subscribe to mailing lists with your regular email address:
 
            login(_at_)site(_dot_)com
 
        How do you detect the arriving mailing list messages?
        Traditionally, you would add a recipe to .procmailrc to catch
        each list, but that's manual work every time. When you use
        this subroutine, you no longer need to write a separate mailing
        list recipe in your .procmailrc every time you subscribe to a new
        mailing list. The detection of a new list happens in this
        subroutine for you.
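 
        For comparison, a minimal sketch of such a traditional per-list
        recipe. The list address and folder name are only illustrative,
        and the `^TO_' macro assumes a reasonably recent procmail:
 
            #   One hand-written recipe per subscribed list
 
            :0 :
            * ^TO_procmail@
            list.procmail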
 
    What you need to know before using this module
 
        There are a lot of heuristics in this module, and there is one
        thing you must take care of if you're a member of tech support or
        if you get cron messages from your server. The rule is:
 
            If TO domain is same as FROM/SENDER/REPLY-TO domain
            then it is considered a mailing list message.
 
        This causes certain messages to land in category LIST automatically.
        This module can't possibly know that the following is not from a
        mailing list, because it doesn't know "what a mailing list is", only
        "what it probably looks like". This is definitely categorized as a
        mailing list message, because `From' and even `Reply-to' have the
        same domain `foo.bar.net' as `To'.
 
            To: support(_at_)foo(_dot_)bar(_dot_)net
            From: messagepad(_at_)foo(_dot_)bar(_dot_)net
            Reply-to: support(_at_)foo(_dot_)bar(_dot_)net
            Subject: Vmail See message to Eric
 
        You must prevent messages like this from being checked by wrapping
        the INCLUDERC in a condition:
 
            #   Do not check these messages
 
            noList = "From.*(foo.bar.net|support.my.com)"
 
            :0
            *$ ! $noList
            {
                INCLUDERC = $RC_LIST
            }
 
    Ask for help
 
        If you find mailing lists that this subroutine does not detect, but
        which could have been detected by looking at the headers in a
        standard way, please send an email to the maintainer. There may be
        cases where it is impossible to detect the mailing list, and in those
        cases you just have to carve a new entry into your procmailrc.
 
        When you keep your procmail log running, you may see the message
 
             *** potential list ***
 
        which is an indication that some new recipe could be added to
        this subroutine to detect that mailing list. If the message
        you received WAS from a mailing list, please send all the headers
        to the maintainer so that support can be added.
 
        You can search for mailing lists that interest you at:
 
            http://www.lsoft.com/lists/listref.html
            http://www.netmeg.net/faq/internet/mail/mailing-lists/
 
    Code notes
 
        Bill Houle sent me interesting headers which caused me to add
        a more heuristic approach than I would have originally wanted.
        From these headers it really is impossible to derive the
        original list name. So I made up my own and derived the name
        by combining the Reply-To LOGIN with the first server name of
        the Errors-To field:
 
            Reply-To: news(_at_)doodle(_dot_)foo(_dot_)net
            Errors-To: bounced(_at_)doodle(_dot_)foo(_dot_)net
 
        The list name formed was "news-doodle". So, if you happen to see
        an odd name like this which doesn't resemble your original list
        name, it may be due to poor headers that have no clue about
        the real name. No problem: see the description of the variable
        JA_LIST_CONVERSION for how to convert this name to a better
        mailbox name.
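 
        For instance, assuming the grabbed name really was "news-doodle",
        a JA_LIST_CONVERSION entry along these lines would map it to a
        friendlier mailbox name. The target name doodle.news is only an
        example:
 
            JA_LIST_CONVERSION = "\
            news-doodle  doodle.news,\
            "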
 
    Required settings
 
        PMSRC must point to the source directory of the procmail code. This
        subroutine will include pm-javar.rc from there.
 
        o   pm-javar.rc is needed and must reside in $PMSRC
 
    Variable JA_LIST_KILL_POSTFIX
 
        If the grabbed `LIST' matches this regexp at the end of the list
        name, then the matched postfix will be removed. Many lists
        traditionally name themselves list1-info, list2-beta, list3-l, and
        you would prefer simpler names (for the mbox): list1, list2 and
        list3. The default value will ditch "-(info|beta|l)".
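 
        A sketch of overriding the default, assuming the variable is set
        in your .procmailrc before the subroutine is included. The extra
        "digest" and "announce" alternatives are illustrative only:
 
            JA_LIST_KILL_POSTFIX = "-(info|beta|l|digest|announce)"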
 
    Variable JA_LIST_KILL_PREFIX
 
        Just like the postfix variable. If this string is matched at the
        beginning of the LIST, it is removed.
 
    Variable JA_LIST_HEADER_REGEXP
 
        This is an *optional* variable which you can set to a regexp that
        matches the mailing list's domain address if the list slipped
        through the tests in this module. Some lists send messages that
        don't carry enough information in the headers to determine their
        list status. If you narrow the group by setting
        JA_LIST_HEADER_REGEXP, then lists like the following, which
        identify themselves only through two headers, can be found:
 
            Reply-To: dispatch-faq(_at_)cnet(_dot_)com
            From: CNET Digital Dispatch <dispatch(_at_)cnet(_dot_)com>
 
        For that list you would set
 
            JA_LIST_HEADER_REGEXP = "(@cnet\.com)"
 
        Don't worry: all the other list detection recipes have already
        been tried, so this is the last test that is carried out, and
        narrowing the match with JA_LIST_HEADER_REGEXP helps eliminate
        possible mishits.
 
         You don't need to set this variable to include all mailing list
         domains, only those that were not trapped. The default
         value for this is:
 
            "(amazon\.com|bookpool\.com)"
 
    Variable JA_LIST_MAKE_UNIQUE
 
        If you're subscribed to many mailing lists that simply say that
        they are *news* or *newsletter*, it will be impossible to
        differentiate A *news* from B *news*. This variable holds a regular
        expression that, if matched, prepends the first hostname to the
        beginning of the list name, thus making the list unique:
 
            news(_at_)some(_dot_)com       --> some-news
            news(_at_)here(_dot_)com       --> here-news
 
        The default value matches lists that contain the word *news*, but
        you may need to set this to match more names.
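 
        As a sketch, a setting like the following would also make lists
        named "alert" or "announce" unique. The extra words are examples,
        not the module's defaults, and the variable is assumed to be set
        before the subroutine is included:
 
            JA_LIST_MAKE_UNIQUE = "(news|alert|announce)"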
 
    Variable JA_LIST_CONVERSION
 
        Many times the grabbed `LIST' name is not what you would like to
        use for your mailbox name. You may want to make the name shorter
        or more descriptive, or categorize the messages according to a
        hierarchy. Let's say that you have subscribed to the following
        mailing lists:
 
            LIST            LIST name   Description of mailing list
            (as grabbed)    you want
 
            jde             java.jde    Java Development Env
            java            java.prog   Java programming
            FLAMENCO        flamenco    Flamenco music
            tango-l         tango       Argentine Tango dancing
            tm-en-help      tm-en       Emacs TM mime package mailing list
            w3-beta         w3          Emacs WWW mailing list
 
        First, remember that the variable `JA_LIST_KILL_POSTFIX' is applied,
        so the actual `LIST' values appear as follows:
 
            jde, java, FLAMENCO, tango, tm-en, w3
 
        OK, now we apply the conversion table by defining it as follows:
        the grabbed LIST comes first, then space(s), then the new name
        _and_ a terminating comma. Repeat this for each list you want to
        convert.
 
            LIST CONVERSION,LIST CONVERSION,
 
        This gives us the table below. Notice that the entries tango-l and
        w3-beta were not included, because `JA_LIST_KILL_POSTFIX' already
        got rid of the postfixes. Also note how the uppercase match FLAMENCO
        is converted to a more suitable lowercase mailbox name. After you
        have set up this variable you can start saving messages to folders.
 
            JA_LIST_CONVERSION = "\
            jde       java.jde,\
            java      java.prog,\
            FLAMENCO  flamenco,\
            "
 
        The list conversion is done with pure procmail means, so it is very
        fast. It also means that the conversion is limited to FROM-STRING
        TO-STRING syntax. No wildcards or regular expressions are allowed.
 
          If you consider using an external process, like `sed' or `perl',
          to convert the grabbed list name to something else (when the
          `JA_LIST_CONVERSION' method was not enough), think again. For each
          incoming mailing list message you launch an external process. I
          get 700 messages from various mailing lists a day, so you can
          imagine how much load any external process would cause. Just use
          the grabbed mailing list name and the `JA_LIST_CONVERSION' table
          if you care about system load.
 
        If you have many mailing lists that use uppercase names, it may be
        tedious to add each mailing list name to `JA_LIST_CONVERSION'.
        A possible alternative is to add a conversion recipe; `tr' is the
        most efficient way here to convert the name to lowercase. Again,
        think twice: the extra process could be avoided if you use
        `JA_LIST_CONVERSION'.
 
            :0
            * ! LIST ?? ^^^^
            {
                :0 D            # still uppercase list name?
                * LIST ?? [A-Z]
                {
                    LIST = `echo $LIST | tr A-Z a-z`
                }
 
                :0 :
                list.$LIST
            }
 
    Example: basic installation
 
        Here is a recipe to save all your mailing lists to separate folders.
        If you subscribe to new lists or unsubscribe from lists, you don't
        need to change anything.
 
            RC_LIST = $PMSRC/pm-jalist.rc   # name the subroutine
 
            ...
 
            #   Handle all mailing lists with one subroutine and recipe
            #   following it
 
            INCLUDERC = $RC_LIST
 
            :0                          # if list name was grabbed
            * LIST ?? [a-z]
            {
                dummy = "Saving mailing list: $LIST"
 
                :0 :
                list.$LIST
            }
 
