procmail

Question about mailing lists

2007-08-14 15:01:58
First, apologies for the long lines.

I haven't used procmail in around five years, and even then I only
used it for three or four weeks before moving on to maildrop.  Now
that I've returned to procmail for my mail processing, the first thing I
needed was to make sure my mailing lists were processed correctly.

Here is the output of procmail -v (I compiled it myself to change ORGMAIL [1]):
#v+
procmail v3.22 2001/09/10
    Copyright (c) 1990-2001, Stephen R. van den Berg    <srb(_at_)cuci(_dot_)nl>
    Copyright (c) 1997-2001, Philip A. Guenther         <guenther(_at_)sendmail(_dot_)com>

Submit questions/answers to the procmail-related mailinglist by sending to:
        <procmail-users(_at_)procmail(_dot_)org>

And of course, subscription and information requests for this list to:
        <procmail-users-request(_at_)procmail(_dot_)org>

Locking strategies:     dotlocking, fcntl(), lockf(), flock()
Default rcfile:         $HOME/.procmailrc
Your system mailbox:    /home/dave/Maildir/
#v-

After a bit of searching, I came across the following recipe[2] to
extract mailing list headers:
#v+
LISTNAME
:0
* 9876543210^0 ^(List-Post:[    ]*(<mailto:)?|List-Owner:[      ]*(<mailto:)?owner-)\/[-A-Z0-9_+]+
* 9876543210^0 ^(List-Id:.*<|X-Mailing-List:[   ]*)\/[-A-Z0-9_+]+
* 9876543210^0 ^(Sender:[       ]*owner-|X-BeenThere:[  ]*|Delivered-To:[       ]*mailing list )\/[-A-Za-z0-9_+]+
* 9876543210^0 ^Sender:.* List"? <(mailto:)?\/[-A-Z0-9_+]+
{ LISTNAME=$MATCH }

# OK, that didn't work, let's try List-Subscribe
:0E
* ^List-Subscribe:.*<mailto:\/[-A-Z0-9_+]+-(digest|on|subscribe)@
* MATCH ?? ^^\/.+-
* MATCH ?? ^^\/.+[^-]
{ LISTNAME = $MATCH }
#v-
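To trace what happens to MATCH in the second recipe outside procmail, here is a rough Python approximation. It assumes procmail's `\/` can be modelled as a capture group and that procmail conditions are case-insensitive by default; `listname_from_subscribe`, `_match_step`, and the sample header are hypothetical names chosen for illustration, and Python's `re` is not procmail's regex engine, so this is only a sketch.

```python
import re

# Rough Python approximation of the :0E List-Subscribe recipe (not procmail
# itself). Assumptions: procmail's \/ is modelled as a capture group, and
# matching is case-insensitive, as procmail conditions are by default.


def _match_step(pattern, text):
    """Mimic '* MATCH ?? ^^\\/pattern': return the new MATCH, or None."""
    m = re.match(pattern, text)
    return m.group(0) if m else None


def listname_from_subscribe(header):
    """Return what LISTNAME would be set to, or None if a condition fails."""
    # * ^List-Subscribe:.*<mailto:\/[-A-Z0-9_+]+-(digest|on|subscribe)@
    # Everything right of \/, including the trailing '@', becomes MATCH.
    m = re.search(
        r"^List-Subscribe:.*?<mailto:([-A-Z0-9_+]+-(?:digest|on|subscribe)@)",
        header,
        re.IGNORECASE | re.MULTILINE,
    )
    if m is None:
        return None
    match = m.group(1)                      # e.g. "foo-bar-subscribe@"

    # * MATCH ?? ^^\/.+-     greedy: everything up to the last dash
    match = _match_step(r".+-", match)      # "foo-bar-"
    if match is None:
        return None

    # * MATCH ?? ^^\/.+[^-]  greedy, but must end on a non-dash character
    return _match_step(r".+[^-]", match)    # "foo-bar"


print(listname_from_subscribe(
    "List-Subscribe: <mailto:foo-bar-subscribe@example.com>"))  # foo-bar
```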

uname -a:
#v+
Linux tigger 2.6.18-ovz028stab031.1-enterprise #1 SMP Thu Apr 26 21:10:46 MSD 2007 i686 GNU/Linux
#v-

Before I get into my questions, I do have one about the second recipe.
Why does the List-Subscribe recipe include the following conditions?

    Do the contents of MATCH match any character one or more times with
    one dash at the end? [  * MATCH ?? ^^\/.+-     ]
        AND
    Do the contents of MATCH match any character one or more times with
    no dash at the end?  [  * MATCH ?? ^^\/.+[^-]  ]

I would think they would cancel out, and make that recipe useless...
what am I missing?

As the comment before the second recipe says, that didn't work, so I
put together the following test headers to make sure my own recipe set
picked up every variation on the rules:

#v+
Delivered-To: mailing list test1(_at_)example(_dot_)com
List-owner: <mailto:owner-test2a(_at_)example(_dot_)com>
List-owner: owner-test2b(_at_)example(_dot_)com
List-post: <mailto:test3a(_at_)example(_dot_)com>
List-post: test3b(_at_)example(_dot_)com
Sender: "froob forum" <mailto:test4a(_at_)example(_dot_)com>
Sender: "froob forum <mailto:test4b(_at_)example(_dot_)com>
Sender: "froob mailing list" <mailto:test4c(_at_)example(_dot_)com>
Sender: "froob mailing list <mailto:test4d(_at_)example(_dot_)com>
Sender: "froob forum" <test5a(_at_)example(_dot_)com>
Sender: "froob forum <test5b(_at_)example(_dot_)com>
Sender: "froob mailing list" <test5c(_at_)example(_dot_)com>
Sender: "froob mailing list <test5d(_at_)example(_dot_)com>
Sender: test6a-owner(_at_)example(_dot_)com
Sender: test6b-bounce(_at_)example(_dot_)com
Sender: owner-test7(_at_)example(_dot_)com
X-Beenthere: test8a(_at_)example(_dot_)com
X-Loop: test8b(_at_)example(_dot_)com
#v-

After a bit of testing, I found that the following rules extract the
list name from all of my mailing lists (including an oddball that has
only the Sender field):

#v+
WS='    '
LCHARS='-A-Z0-9_+'

:0
* $ ^(\
delivered-to:[$WS]*mailing list |\
list-owner:[$WS]*(<mailto:)?owner-|\
list-post:[$WS]*(<mailto:)?|\
sender:[$WS]*.*(forum|list)\"? <mailto:|\
sender:[$WS]*owner-|\
x-(beenthere|loop):[$WS]*\
)\/[$LCHARS]+
{ LISTNAME=$MATCH }

:0E
* $ ^sender:[$WS]
{
    :0  # 5a-d
    * $ ^sender:[$WS]*.*(forum|list)\"? <\/[$LCHARS]+
    { LISTNAME=$MATCH }

    :0E # 6a-b
    * $ ^sender:[$WS]*\/[$LCHARS]+-(bounce|owner)
    { LISTNAME=`echo $MATCH | sed -e 's/-\(owner\|bounce\)$//i'` }
}
#v-
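For checking rules like these against the test headers without piping mail through procmail, here is a rough Python approximation. It assumes `WS` holds a space and a tab, models procmail's `\/` as a capture group with lazy quantifiers on its left side, treats matching as case-insensitive, and tests one header line at a time with the archive's `(_at_)`/`(_dot_)` written as `@` and `.`. The names `listname`, `RULE1`, `RULE_5`, `RULE_6`, and `TESTS` are hypothetical, and Python's backtracking engine can differ from procmail's in edge cases, so running real messages through procmail remains the actual test.

```python
import re

# Rough Python approximation of the two recipes above (not procmail itself).
# Assumptions: WS holds a space and a tab; matching is case-insensitive, as
# procmail conditions are by default; the part left of procmail's \/ is made
# lazy (.*?, ??) to mimic procmail's minimal left-side matching.
WS = r"[ \t]*"            # [$WS]*
LCHARS = r"[-A-Z0-9_+]+"  # [$LCHARS]+
FLAGS = re.IGNORECASE | re.MULTILINE

# :0  (first recipe: one alternation; the captured group plays MATCH)
RULE1 = re.compile(
    r"^(?:"
    rf"delivered-to:{WS}mailing list "
    rf"|list-owner:{WS}(?:<mailto:)??owner-"
    rf"|list-post:{WS}(?:<mailto:)??"
    rf'|sender:{WS}.*?(?:forum|list)"? <mailto:'
    rf"|sender:{WS}owner-"
    rf"|x-(?:beenthere|loop):{WS}"
    rf")({LCHARS})",
    FLAGS,
)
SENDER = re.compile(r"^sender:[ \t]", FLAGS)                                  # :0E
RULE_5 = re.compile(rf'^sender:{WS}.*?(?:forum|list)"? <({LCHARS})', FLAGS)  # 5a-d
RULE_6 = re.compile(rf"^sender:{WS}({LCHARS}-(?:bounce|owner))", FLAGS)      # 6a-b


def listname(headers):
    """Return what LISTNAME would be set to, or None if no recipe fires."""
    m = RULE1.search(headers)
    if m:
        return m.group(1)
    if not SENDER.search(headers):
        return None
    m = RULE_5.search(headers)
    if m:
        return m.group(1)
    m = RULE_6.search(headers)
    if m:
        # Stands in for: sed -e 's/-\(owner\|bounce\)$//i'
        return re.sub(r"-(?:owner|bounce)$", "", m.group(1), flags=re.IGNORECASE)
    return None


TESTS = [
    ("Delivered-To: mailing list test1@example.com", "test1"),
    ("List-owner: <mailto:owner-test2a@example.com>", "test2a"),
    ("List-owner: owner-test2b@example.com", "test2b"),
    ("List-post: <mailto:test3a@example.com>", "test3a"),
    ("List-post: test3b@example.com", "test3b"),
    ('Sender: "froob forum" <mailto:test4a@example.com>', "test4a"),
    ('Sender: "froob forum <mailto:test4b@example.com>', "test4b"),
    ('Sender: "froob mailing list" <mailto:test4c@example.com>', "test4c"),
    ('Sender: "froob mailing list <mailto:test4d@example.com>', "test4d"),
    ('Sender: "froob forum" <test5a@example.com>', "test5a"),
    ('Sender: "froob forum <test5b@example.com>', "test5b"),
    ('Sender: "froob mailing list" <test5c@example.com>', "test5c"),
    ('Sender: "froob mailing list <test5d@example.com>', "test5d"),
    ("Sender: test6a-owner@example.com", "test6a"),
    ("Sender: test6b-bounce@example.com", "test6b"),
    ("Sender: owner-test7@example.com", "test7"),
    ("X-Beenthere: test8a@example.com", "test8a"),
    ("X-Loop: test8b@example.com", "test8b"),
]

for line, want in TESTS:
    got = listname(line)
    print("ok  " if got == want else "FAIL", want, got)
```

Keeping each recipe as its own compiled pattern mirrors the `:0`/`:0E` structure, so a failing test line points at the procmail condition to look at.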

So, on to my questions:

If I combine the pattern
sender:[$WS]*.*(forum|list)\"? <mailto:
and
sender:[$WS]*.*(forum|list)\"? <
into
sender:[$WS]*.*(forum|list)\"? <(mailto:)?

Then I end up with LISTNAME=mailto for tests 4a-4d above.  I believe
that is because procmail matches the part to the left of \/ minimally
(non-greedily).  Is that correct?  If so, is there any way to merge those
two patterns and still extract the mailing list name from tests 4a-4d
and 5a-5d above?
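The effect described above can be reproduced with Python's `re`, under the assumption that procmail's minimal matching to the left of `\/` behaves like a lazy quantifier: making the optional `(mailto:)` group lazy (`??`) captures `mailto`, while the greedy `?` captures the real list name. The variable names are hypothetical, and since Python's backtracking engine is not procmail's, this only illustrates the hypothesis; it does not show a procmail-side fix.

```python
import re

# Python stand-in for procmail's minimal ("stingy") matching left of \/:
# making the optional (mailto:) group lazy with "??" means it is skipped
# whenever the captured part can still match without it.
line = 'Sender: "froob forum" <mailto:test4a@example.com>'

stingy = re.search(
    r'^sender:[ \t]*.*?(?:forum|list)"? <(?:mailto:)??([-A-Z0-9_+]+)',
    line, re.IGNORECASE)
greedy = re.search(
    r'^sender:[ \t]*.*?(?:forum|list)"? <(?:mailto:)?([-A-Z0-9_+]+)',
    line, re.IGNORECASE)

print(stingy.group(1))  # mailto
print(greedy.group(1))  # test4a
```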

Finally, please let me know if you see any problems with my implementation.
I'm thinking about using it for all my mailing lists, and wanted other
eyes on the solution first.

[1]: http://www.ii.com/internet/robots/procmail/qs/#DEFAULT
[2]: http://marc.info/?l=procmail&m=118158714819169&w=2

Regards,
-- 
dave [ please don't CC me ]
____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
