procmail

Re: generic matching for mailing lists...

2003-01-12 12:42:07
Zack Brown <zbrown(_at_)tumblerings(_dot_)org> wrote:

Thanks to folks on this list and elsewhere, I'm using what I think
is a fairly typical generic match for mailing lists, but there are
still a few lists it won't cover, and I don't see a good way to handle
them. The recipe I use is (note that [ ] encloses a space and tab):


MONTHFOLDER=`date +%Y-%m`
:0:
* ^(Sender:[  ]*owner-|X-BeenThere:[  ]*|Delivered-To:[  ]*mailing list |X-Loop:[  ]*)\/[-A-Za-z0-9_+]+
$MATCH/$MATCH.$MONTHFOLDER

This works for all but about 8 lists I'm on, which is pretty good. For
those 8 stragglers though, I can't seem to figure out anything robust.
I'm also a bit hesitant, because I don't have the deep knowledge of
the culture of email headers that would tell me whether a given
header will behave the same for many lists.

One possible Achilles' heel is the order in which you'd prefer the
headers to be checked.  You could get a mismatch that would be hard
to trace.
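
One plain way to pin the order down is to split the alternation into
one recipe per header, since recipes are tried from top to bottom and
a delivering recipe ends processing.  The fragment below is only a
sketch of that idea, not the approach described next; the order of
the headers in it is an arbitrary assumption, and the patterns are
simply lifted from the recipe quoted above.

```procmail
# Sketch only: one recipe per header, tried top to bottom, so the
# first recipe that matches delivers the message.  The ordering is an
# arbitrary example.  Each [ 	] holds a space and a literal tab.
MONTHFOLDER=`date +%Y-%m`

:0:
* ^X-BeenThere:[ 	]*\/[-A-Za-z0-9_+]+
$MATCH/$MATCH.$MONTHFOLDER

:0:
* ^Sender:[ 	]*owner-\/[-A-Za-z0-9_+]+
$MATCH/$MATCH.$MONTHFOLDER

:0:
* ^Delivered-To:[ 	]*mailing list \/[-A-Za-z0-9_+]+
$MATCH/$MATCH.$MONTHFOLDER

:0:
* ^X-Loop:[ 	]*\/[-A-Za-z0-9_+]+
$MATCH/$MATCH.$MONTHFOLDER
```

The cost is repetition; the gain is that the header that decided the
folder is always the first one in the file that matched.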

I store my lists via a similar algorithm, except that I order the
headers by my (subjective) certainty that each one actually
identifies a list.  I do that in a preceding recipe that operates by
the reverse- or inverse-DeMorgan principle that our own David Tamkin
conceived long ago.  The logic of the reverse-DeMorgan is not
immediately obvious to most, but reading up on it in the list
archives will help those who want pointers.  The main premise is
that MATCH is still set when the regexp in a negated condition
matches, even though that condition then fails, and the last good
value of MATCH can be used in the subsequent recipe.  Since I list
the candidate headers in my order of preference, an earlier match
means, for me, a higher likelihood that it's one of "my" lists and
not something weird slipping through.

Anyway, here's how I handle some lists:

 :0  # 021214 () trusted lists; target headers in order of match preference
  * $  ! ^X-BeenThere:.*\/[^$WS].*
  * $  ! ^Sender:.*\/[^$WS].*
  * $  ! ^X-Sender:.*\/[^$WS].*
  { }
 
 :0 E  # 030106 () (Reverse-De Morgan logic) match against candidate lists
  * $  MATCH  ??  ^^[^@]*\/$LISTS
  * $  MATCH  ??  ^^\/[^$SPACE]+
  {
      # (here went the selfsame tired "list-noise" recipe that I've posted
      #  too many times already recently)


      :0:  # 030106 () save our list locally
       $MATCH
  }
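
The recipes above rely on $WS, $SPACE and $LISTS being set earlier in
the rc file.  $LISTS is shown further down; the other two are not
shown anywhere in this message, so the definitions below are an
assumption about what they hold (whitespace characters), not a copy
of the actual settings.  TAB is a helper name introduced here only
for illustration.

```procmail
# Assumed definitions, not taken from the post.
SPACE=" "
# The quotes on the next line enclose one literal tab character.
TAB="	"
# Space plus tab, for use inside a character class such as [^$WS].
WS="$SPACE$TAB"
```

These need to come before the recipes, because the "$" at the start
of each condition expands the variables when the condition is read.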


The trick for getting rid of text past a medial space might be helpful to you.
What's in $LISTS?  Currently, it's this:

[~/.procmail/vars] 211[0]> grep LISTS mydata
  LISTS        =         bernief9|edupage|procmail|Spam(tools)?(\ Prevention)?
  LISTS        = ($LISTS|volition-tech|xdesk)


Note that my algorithm can handle both the Spamtools list and the Spam
Prevention list.  It will save the first to "spamtools" and the second
(once my medial-space chopper is done) to just "spam".
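
To see the medial-space chopper on its own, here is a minimal sketch
that applies the same condition to a fixed string instead of a real
header.  LISTNAME is a hypothetical variable used only for this
demonstration, and the SPACE definition is assumed as before.

```procmail
# Sketch: LISTNAME is a hypothetical variable used only for this demo.
SPACE=" "
LISTNAME="Spam Prevention"

# ^^ anchors at the start of LISTNAME; everything right of \/ goes to
# MATCH, and [^$SPACE]+ stops at the first space.
:0
* $ LISTNAME ?? ^^\/[^$SPACE]+
{ LISTNAME=$MATCH }

# LISTNAME should now hold "Spam"; write it to the log to check.
LOG="chopped list name: $LISTNAME
"
```

A name without a space, such as "Spamtools", passes through the same
condition unchanged, since the character class never hits a space.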

Hope that helps.

-- 
dman


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
