At 12:55 2009-01-25 +0100, Xavier Maillard wrote:
In fact, I thought I would take the Return-Path but after having
analyzed different "target" messages, it won't work. So I need to
find the most useful header for that.
For generic identification of lists, there are _several_ headers which
should be examined. The listname_id recipes go through a series of headers
looking for an appropriate match, then parse it down to a token which
should just be the listname, without needing to work from an array of known
lists.
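As a rough illustration of that approach (a simplified sketch, not the actual contents of listname_id.rc), recipes along these lines try one header, fall back to another, and reduce the match to a bare token. The choice of List-Id: and a majordomo-style Sender: owner-listname@host header is an assumption made for the example, and [ ]* should really allow tabs as well as spaces:

```procmail
# Simplified sketch only; the real listname_id.rc examines more headers.
LISTNAME=

# Try List-Id: first, keeping the text between "<" and the first dot.
:0
* ^List-Id:.*<\/[^.>]+
{ LISTNAME=$MATCH }

# Fall back to a majordomo-style Sender: owner-listname@host,
# but only if nothing has been found yet.
:0
* LISTNAME ?? ^^^^
* ^Sender:[ ]*owner-\/[^@]+
{ LISTNAME=$MATCH }
```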
The idea behind my request is to write a rule for domains where I am
both subscribed to their numerous mailing lists and also a
moderator.
I use my generic list identification routines, but really only to set a
variable that identifies the listname. Elsewhere in my procmail setup, I
have recipes that check for a specific listname and then take other actions
(say for lists where I'm a moderator and have to wade through WAY too much
bogus stuff submitted to the list - who has the time? So I identify
listadmin messages and scan the bodies for tokens that would indicate that
it isn't a foreign submission but rather an errant reply from an alternate
email account of a user, which is pretty common, then flag those to be
displayed in my client).
Currently, this is the closest rule I have found:
[snip]
You have way too much stuff dedicated to identifying the one list (or
series of lists on one host).
INCLUDERC=listname_id.rc
:0
* LISTNAME ?? ^^gnu-tools^^
{
# do something specific for this list, or just file away
}
If I weren't so inundated with other stuff right now, I'd consider
extending the listname_id.rc recipes to include a section for identifying
probable listadmin/moderator messages. I've only had to deal with mailman
and majordomo myself though. The plethora of webforums out there would
probably complicate this.
With mailman for instance, the listname_id stuff already identifies the
moderator messages as belonging to the related list - all one needs to do
is get a match on Sender:.*mailman-bounces@ or
X-List-Administrivia:[ ]*yes, and you have a reasonable expectation
that it is a list administration message, so you _set_ another variable
indicating it is a LIST_ADMINISTRATIVE message or whatever. You do this
generically one time for all messages, and then check it when you need to.
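A minimal sketch of that, assuming listname_id.rc has already run and set LISTNAME. The variable name LIST_ADMINISTRATIVE and its yes/no values are just the illustrative choice made above, and the two header tests are the ones mentioned for mailman:

```procmail
# Sketch: run once for every message, after listname_id.rc.
LIST_ADMINISTRATIVE=no

:0
* LISTNAME ?? .
* ^(Sender:.*mailman-bounces@|X-List-Administrivia:[ ]*yes)
{ LIST_ADMINISTRATIVE=yes }

# Later, wherever it is needed:
:0
* LISTNAME ?? ^^gnu-tools^^
* LIST_ADMINISTRATIVE ?? ^^yes^^
{
  # handle moderator traffic for this list
}
```

The first recipe only sets the flag; what to do with flagged messages stays in the per-list recipes.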
For instance, some lists I'm on are set up to circumvent my spam filters or
to have an elevated allowance (say, because there's a lot of spammy type
stuff discussed on them). Having that LISTNAME variable at the ready makes
this easy.
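For example, something along these lines could raise the allowance for particular lists. Every name here is hypothetical: SPAM_THRESHOLD stands in for whatever value your own spam recipes consult, and the two list names are made up:

```procmail
# Hypothetical names throughout; adapt to your own spam recipes.
SPAM_THRESHOLD=5

# Lists that legitimately carry spammy-looking content get more slack.
:0
* LISTNAME ?? ^^(spam-discuss|abuse-reports)^^
{ SPAM_THRESHOLD=15 }
```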
I want this rule to apply for gnu.org, lolica.org and several
other domains. TLD, DOMAIN and LIST would then be used to sort
mails in a TLD/DOMAIN/LIST hierarchy.
Honestly, that seems more trouble than it is worth - a token-by-token
hierarchy makes sense if you have gobs of items to deal with (and if the
tokens help categorize and find stuff).
> # first, match the domain down to JUST the rightmost two domain tokens
> # (i.e. remove the optional hostname levels). As parsed here, I'm allowing
> # for the FROMDOMAIN to actually be an email address - this will still work.
Pretty impressive !
Not really, it just makes sense to examine a regexp and see how you can
refine it so that it can happily digest a variety of potential inputs and
still give the desired result.
> BTW, you do realize that outside of the country-generic TLDs such as
> .com, .org, .net, .biz, etc, that some country specific TLDs often
> have their own secondary hierarchy. For example:
>
> host.demon.co.uk
Ooops, I did not think about this case :/
Some simple changes to my previously posted recipe would handle the
two-level TLD (so long as it has the form domain.xx.yy, with exactly two
characters in each of the last two tokens, as in co.uk) alongside a regular TLD.
Note that DOMAIN and TLD orders are swapped (previously, it didn't matter
what their order was, but in the revised approach, we use the domain to
anchor the leading text before the match):
# first, match the domain down to JUST the rightmost two tokens
:0
* FROMDOMAIN ?? [@.]?\/[^@.]+\.([^.]+|[^.][^.]\.[^.][^.])$
{
TOPDOMAIN=$MATCH
# next, get the domain portion - this is everything up to,
# but not including the first dot.
:0
* MATCH ?? ^\/[^.]+
{
DOMAIN=$MATCH
}
# we need to fall back to the saved TOPDOMAIN and get the
# TLD portion - this is everything AFTER the domain and a dot.
# this implementation allows for two-part TLDs (co.uk for example)
# because the RHS of this condition includes a variable which
# needs to be expanded, we use the $ flag on the condition.
:0
* $ TOPDOMAIN ?? ^$DOMAIN\.\/.*$
{
TLD=$MATCH
}
}
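Working the recipe by hand for a FROMDOMAIN of host.demon.co.uk, it should yield TOPDOMAIN=demon.co.uk, DOMAIN=demon and TLD=co.uk. If you do go with the hierarchy, a filing recipe could then look something like the sketch below. It assumes LISTNAME was set by listname_id.rc, that the path is relative to MAILDIR, and that the TLD/DOMAIN directories already exist:

```procmail
# Sketch: file into TLD/DOMAIN/LIST only when all three parts were found.
# The trailing colon on :0: requests a local lockfile for the mbox.
:0:
* TLD ?? .
* DOMAIN ?? .
* LISTNAME ?? .
$TLD/$DOMAIN/$LISTNAME
```

Messages for which any of the three variables is empty fall through to whatever recipes follow.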
I would really be thankful if somebody would explain the
listname_id.rc line by line :)
My suggestion to you would be to make a test harness - I call it a
"sandbox" (sandboxes are intended to keep the sand in, and kids play in
them). Then, get the listname_id.rc file and includerc it into your
sandbox. DEFAULT can be to /dev/null, and you set verbose logging (which
is sort of the object of a sandbox). Take a pile of old email messages
then use formail to split the mailbox (assuming it's in mbox format):
formail -s procmail -m sandbox.rc < saved_mail.mbx
Note that it is really important that the mailbox you're feeding into this
is NOT a delivery target of the rules invoked by the sandbox.
I have a standard sandbox and listname_id posted on my procmail pages
(which are a bit long in the tooth for visual appearance, but the scripts
are still valid). The rules I posted previously would have simply been put
into their own file, such as filter.rc, and included into the sandbox
(thus, the sandbox framework remains constant). This is what I use to do
quick tests of things I go to post here (and of course, for my own stuff as
well).
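For orientation, a sandbox rcfile might be sketched roughly as follows. This is not the posted sandbox script; the directory /tmp/sandbox and the log file name are hypothetical, and filter.rc is assumed to hold the recipes under test:

```procmail
# sandbox.rc - sketch of a test harness; all paths are hypothetical.
SHELL=/bin/sh
# The sandbox directory must already exist.
MAILDIR=/tmp/sandbox
# Nothing is actually delivered.
DEFAULT=/dev/null
LOGFILE=$MAILDIR/sandbox.log
VERBOSE=yes

INCLUDERC=$MAILDIR/listname_id.rc
INCLUDERC=$MAILDIR/filter.rc

# Record what the recipes came up with for this message.
LOG="LISTNAME=$LISTNAME DOMAIN=$DOMAIN TLD=$TLD
"
```

After a formail run, the log shows each condition as it was evaluated, followed by the summary line for every message.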
As fun as it would be to re-explain the logic behind each line of an
rcfile, ultimately, if it has proven to do its job, and has been subjected
to peer review, if it provides you with a string you can use to identify
one list from another, is there really a need to comprehend each line of
it? The procmail list archives will contain several threads discussing the
development and use of the ruleset.
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.
____________________________________________________________
procmail mailing list Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail