At 08:08 2003-01-12 -0800, Zack Brown did say:
MONTHFOLDER=`date +%Y-%m`
:0:
* ^(Sender:[ ]*owner-|X-BeenThere:[ ]*|Delivered-To:[ ]*mailing list |X-Loop:[ ]*)\/[-A-Za-z0-9_+]+
$MATCH/$MATCH.$MONTHFOLDER
This works for all but about 8 lists I'm on, which is pretty good. For
those 8 stragglers though, I can't seem to figure out anything robust.
Well, checking Sender: would work for 6 of the 8, but the syntax isn't
consistently owner-listname.
I'm also a bit hesitant, because I don't have deep knowledge of the
culture of email headers, that might indicate that a given header will
behave the same for many lists.
So that you don't have to fret over breaking the recipe which works for the
majority of your lists, you could implement a _second_ recipe, following
this one, which implements the added logic. Only those messages not
handled by the first recipe will be left around to be handled. This of
course assumes that the lists which aren't getting "properly" handled are
in fact NOT MATCHING the first recipe (which, say, could be extracting the
wrong names for the lists).
:0:
* ^(List-Post:[ ]*(<mailto:)?|List-Owner:[ ]*(<mailto:)?owner-)\/[-A-Z0-9_+]+
$MATCH/$MATCH.$MONTHFOLDER
This works for framers, docbook-apps, docbook, techwr-l.
Note that the character class omits "a-z". Procmail regexps are CASE
INSENSITIVE unless otherwise specified with a flag on the flags line.
Origami, list-managers, and oandp-l require a bit more logic, since the
owner identifier trails the listname. Because two of these last three
don't actually contain any true "list-type" headers (and lack owner-
designations on the sender addresses), it's a bit more difficult to peg them
down as lists.
:0:
* ^Sender:.* List <(mailto:)?\/[-A-Z0-9_+]+
$MATCH/$MATCH.$MONTHFOLDER
This picks up the Origami and oandp-l lists, which have "List" text
preceding the (non-owner) list address in the Sender header, so we can be
reasonably sure that the Sender header is actually identifying a
list. Since this comes _after_ other filters which should hopefully have
found owner-listname type identifiers, we should expect that the Sender
address is the address of the list, not the listowner. As you subscribe to
more and more lists, this may need to be revised, though I really think the
listadmins should FIX their lists instead.
That leaves us with just the list-managers list. How ironic that the one
list that doesn't get matched at this point is for list managers...
For efficiency purposes, it may make sense to handle owner suffixed lists
in a separate recipe, because you need to run the name through sed. Note
that we specifically include the -owner suffix rather than catching it with
the regexp character class -- this is so that we KNOW this match
actually contained -owner; otherwise, we wouldn't differentiate between a
regular (non-list) sender and a list message (though Sender: is really only
present on list messages, AFAIK):
:0E
* ^Sender:[ ]*\/[-A-Z0-9_+]+-owner
{
MATCH=`echo $MATCH | sed -e s/-owner//i`
:0:
$MATCH/$MATCH.$MONTHFOLDER
}
I specify the :0E here in case you ever add a 'c' flag to your list recipe
for some reason. The reassignment of an internal procmail variable (MATCH)
might not seem kosher, but it's valid. If it turns you off, assign it to a
different variable name and be sure to use that in your mailbox spec.
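The "i" flag on sed's s command is not specified by POSIX, so the substitution above may not work with every sed. Below is a minimal sketch of a more portable variant, assuming the suffix always sits at the very end of $MATCH (which the condition's regexp should guarantee). The name strip_owner is a hypothetical helper for illustration, not anything procmail provides.

```shell
#!/bin/sh
# Strip a trailing "-owner" (in any letter case) from a list name.
# Bracket expressions stand in for the non-POSIX "i" flag, and the "$"
# anchor keeps an "-owner" in the middle of a name from being removed.
strip_owner() {
    printf '%s\n' "$1" | sed -e 's/-[Oo][Ww][Nn][Ee][Rr]$//'
}

strip_owner "list-managers-owner"   # prints: list-managers
strip_owner "OANDP-L"               # prints: OANDP-L (no suffix, unchanged)
```

Inside the recipe, the same sed expression can replace the original one in the backticks; quoting "$MATCH" in the echo is a reasonable precaution as well.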
You can get all of these conditions (except the suffixed owner) into a long
one-line condition, or, to better separate the conditions, use maximal
scoring, which makes it a bit easier to extend the conditions without
necessarily breaking your initial regexp:
:0:
* 9876543210^0 ^(Sender:[ ]*owner-|X-BeenThere:[ ]*|Delivered-To:[ ]*mailing list |X-Loop:[ ]*)\/[-A-Z0-9_+]+
* 9876543210^0 ^(List-Post:[ ]*(<mailto:)?|List-Owner:[ ]*(<mailto:)?owner-)\/[-A-Z0-9_+]+
* 9876543210^0 ^Sender:.* List <(mailto:)?\/[-A-Z0-9_+]+
$MATCH/$MATCH.$MONTHFOLDER
:0E
* ^Sender:[ ]*\/[-A-Z0-9_+]+-owner
{
MATCH=`echo $MATCH | sed -e s/-owner//i`
:0:
$MATCH/$MATCH.$MONTHFOLDER
}
The maximal scoring ensures that once any one condition matches, the
recipe proceeds directly to the delivery portion, without needing to
evaluate the conditions on subsequent lines.
In the end, you don't have a "single recipe" to do it, but you are
presented with a fairly generic way of identifying the listname.
I've put up archives of the problem mailing lists at
http://tumblerings.org/~zbrown/procmail/
Suggestion - if you really want people to take their time to evaluate your
problem for you, you should take just a handful of messages from each of
those lists, *AND* strip the BODY from each of them, then make them
available as a single file to download. 60KB or so of headers would be one
thing - 4MB of junk is quite another. We're not doing anything with the
bodies, which represent the bulk of the messages, so they're a complete
waste of everybody's bandwidth to download when evaluating your problem.
You doing that work once, from your end, greatly reduces the wasted time
and bandwidth for everyone who might otherwise be willing to assist
you. If we've got a single test mailbox which we can download and pipe
into a filter running in a sandbox (you do know about sandboxes, right? If
not, check my .sig), then we can actually be spending our time helping you
instead of downloading your email.
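Stripping the bodies can be done with a few lines of awk. This is only a sketch, and it assumes the archive is in mbox format, where each message starts with a "From " separator line and body lines beginning with "From " have been escaped as ">From ". The function name headers_only is hypothetical.

```shell
#!/bin/sh
# Read an mbox on stdin and write only the header block of each message,
# followed by one blank line, to stdout. Message bodies are dropped.
headers_only() {
    awk '
        /^From /      { inhdr = 1 }                      # separator line opens a header block
        inhdr && /^$/ { inhdr = 0; print ""; next }      # first blank line ends the headers
        inhdr         { print }                          # emit header lines only
    '
}
```

Usage would be along the lines of `headers_only < lists.mbox > lists-headers.mbox`, where both file names are placeholders. The result is still a valid mbox, so it can be split and fed to a test filter in a sandbox.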
Return-Path: can sometimes be useful, except that you've got to deal with
things like mailman bounce encoding, and virtually always need to remove an
owner-(listname) or (listname)-owner designation.
Return-path: <list-managers-owner+M1007(_at_)greatcircle(_dot_)com>
ListManagers contained the following header:
Sender: list-managers-owner(_at_)greatcircle(_dot_)com
This is the OPPOSITE of the order you check for owner addresses in your
regexp (though your regexp does it in the fashion normally
employed). Dealing with trailing text that you don't want included within
a match is a PITA.
Some of the messages also contained:
X-MDaemon-Deliver-To: list-managers(_at_)greatcircle(_dot_)com
but that appears to be specific to some of the MTA or MUAs of certain
message authors, not the list itself.
Return-Path has MailMan style bounce encoding.
----
OANDP-L:
Return-Path would be directly useable.
You should contact the listowner though and have them fix the Sender header
-- this is inviting bounces from braindead MTAs (and believe me, there are
many) TO THE LIST. The sender should be an owner- alias on the server, so
as to not direct certain types of messages to the list, but rather to a
person (even if they probably ignore the owner messages).
Sender: Orthotics and Prosthetics List <OANDP-L(_at_)LISTS(_dot_)UFL(_dot_)EDU>
----
Origami:
Return-Path would be useable.
The above comment about the Sender: header applies here as well.
Sender: Origami Mailing List <Origami(_at_)MIT(_dot_)Edu>
----
TECHWR-L:
Return-Path is encoded. (bounce-)techwr-l(-number)@lists.raycomm.com
The following list-specific headers appear:
List-Unsubscribe:
<mailto:leave-techwr-l-71444C(_at_)lists(_dot_)raycomm(_dot_)com>
List-Subscribe: <mailto:subscribe-techwr-l(_at_)lists(_dot_)raycomm(_dot_)com>
List-Owner: <mailto:owner-techwr-l(_at_)lists(_dot_)raycomm(_dot_)com>
Sender: bounce-techwr-l-71444(_at_)lists(_dot_)raycomm(_dot_)com
The supplementary filter matches this list via List-Owner:
----
docbook-apps (same applies to docbook, though it should be noted that
several of the messages in your docbook archive are actually docbook-apps
messages, and are clearly identified as such):
Return-path isn't a simple "owner" type, but is basically the same,
substituting "errors" for "owner": <docbook-apps-errors(_at_)lists(_dot_)oasis-open(_dot_)org>
Lots of list-specific headers, though these boneheads don't provide a
Sender: header. Tsk. You might have a word with the listadmin and ask why
they fail to include this significant header.
List-Owner: <mailto:docbook-apps-help(_at_)lists(_dot_)oasis-open(_dot_)org>
List-Post: <mailto:docbook-apps(_at_)lists(_dot_)oasis-open(_dot_)org>
List-Subscribe: <http://lists.oasis-open.org/ob/adm.pl>,
<mailto:docbook-apps-request(_at_)lists(_dot_)oasis-open(_dot_)org?body=subscribe>
List-Unsubscribe: <http://lists.oasis-open.org/ob/adm.pl>,
<mailto:docbook-apps-request(_at_)lists(_dot_)oasis-open(_dot_)org?body=unsubscribe>
List-Archive: <http://lists.oasis-open.org/archives/docbook-apps/>
List-Help: <http://lists.oasis-open.org/elists/admin.shtml>,
<mailto:docbook-apps-request(_at_)lists(_dot_)oasis-open(_dot_)org?body=help>
List-Id: <docbook-apps.lists.oasis-open.org>
These are matched with the List-Post header against the supplementary
recipe I provide above (List-Owner isn't a good header to use here because
they don't use an owner- style header and the added text is trailing the
listname, which is harder to contend with using $MATCH, not to mention
"-help" isn't uncommon for part of a listname).
----
framers:
Return-path: <bounce-framers-71493(_at_)lists(_dot_)FrameUsers(_dot_)com>
Plus the following:
List-Unsubscribe:
<mailto:leave-framers-71493R(_at_)lists(_dot_)FrameUsers(_dot_)com>
List-Subscribe: <mailto:subscribe-framers(_at_)lists(_dot_)FrameUsers(_dot_)com>
List-Owner: <mailto:owner-framers(_at_)lists(_dot_)FrameUsers(_dot_)com>
Sender: bounce-framers-71493(_at_)lists(_dot_)FrameUsers(_dot_)com
This is matched with the List-Owner header against the supplementary recipe
I provide above.
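For anyone who would rather key off Return-Path, the variants noted in the per-list sections above (owner- prefix, -owner or -errors suffix, a +tag, and bounce-NAME-NUMBER encoding) could be normalized roughly as follows. This is a sketch covering only those forms, not a general solution. The function name listname_from_return_path is hypothetical, the example.invalid addresses are placeholders, and the result is lowercased, which changes the folder names compared with the recipes above.

```shell
#!/bin/sh
# Reduce a Return-Path address to a bare list name, handling only the
# encodings seen in the sample lists discussed in this message.
listname_from_return_path() {
    printf '%s\n' "$1" |
        tr '[:upper:]' '[:lower:]' |
        sed -e 's/^<//' -e 's/>$//' \
            -e 's/@.*$//' \
            -e 's/+.*$//' \
            -e 's/^bounce-\(.*\)-[0-9][0-9]*$/\1/' \
            -e 's/^owner-//' \
            -e 's/-owner$//' \
            -e 's/-errors$//'
}

listname_from_return_path '<list-managers-owner+M1007@example.invalid>'  # prints: list-managers
listname_from_return_path 'bounce-techwr-l-71444@example.invalid'        # prints: techwr-l
```

The order of the sed expressions matters: the domain and +tag are removed first so that the suffix patterns can anchor on the end of the local part.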
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.