dummy(_at_)cyberpass(_dot_)net (Robert) writes:
Here's the worst, so far:
[.. an example with many lines and many similarities ..]
Looking at this example I have a few remarks:
- can you use a "user+arg(_at_)host" kind of address and recover the "+arg"
part from the mailer rather than from the headers? This might already let
you simplify your filtering, though maybe not by much.
- maybe extracting the subject once and then matching against the variable,
instead of matching the header every time, would be faster:

    :0
    * ^Subject:\/.*$
    { SUBJECT=$MATCH }
    ...
    :0
    * blablabla
    * SUBJECT ?? (toto|tata|foo)
    ...

This avoids regexp-matching (and hence scanning) the header repeatedly.
Note that you can also take advantage of this to improve your "subject
extractor" so that it handles a subject that spans several lines.
- I'd try to rewrite your recipes into something more like:

    GOODWORDS="(left|writ|hand\>|wrist|tend[io]nitis|blablabla)"
    :0w:rsi.lock
    * 3^0
    * $ -1^1 SUBJECT ?? $GOODWORDS
    * $ -1^1 B ?? $GOODWORDS
    | procmail_print

It's not doing exactly the same thing, though, so you may want to tweak it
some more.
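
To see how the weights in the scored recipe above add up, here is a small Python sketch of the arithmetic as described in procmailsc(5): for a `w^x` condition the first match adds w, the second w*x, the third w*x*x, and so on, and the action runs only if the final score is positive. The function names are made up for illustration, and the sketch assumes the bare `3^0` line counts exactly once.

```python
def condition_score(weight, exponent, matches):
    """Score contributed by one `w^x` condition that matched `matches` times.

    The first match adds w, the second w*x, the third w*x*x, and so on.
    """
    return sum(weight * exponent ** k for k in range(matches))


def recipe_score(subject_matches, body_matches):
    """Total score and outcome for the recipe with 3^0 and two -1^1 lines."""
    total = (
        condition_score(3, 0, 1)                    # `* 3^0`, assumed to count once
        + condition_score(-1, 1, subject_matches)   # `-1^1 SUBJECT ?? $GOODWORDS`
        + condition_score(-1, 1, body_matches)      # `-1^1 B ?? $GOODWORDS`
    )
    return total, total > 0   # the action runs only on a positive score


if __name__ == "__main__":
    for subj, body in [(0, 0), (1, 1), (1, 2)]:
        print(subj, body, recipe_score(subj, body))
```

With these weights, two matches in total leave a score of 1 (the action runs) and three leave 0 (it does not), so the signs and the base value are the first things to check when tweaking.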
- you seem to be trying to figure out what to keep. Wouldn't it be easier to
figure out what to trash instead? (I'm not saying it is, just that it might
be worth considering.)
- since you seem to be matching on pairs or triplets of words, you might want
to write a tiny perl (or sed) script that reads something like this on stdin:

    left hand\> writ
    wrist tend[io]nitis
    worker|\<wc\> hearing
    \<x\> windows reminder|break|typ
    ...

and builds the rules out of it: hugely more maintainable (but no speedup,
admittedly). I do this kind of processing for other cases and it proves
very handy.
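
A rough sketch of such a generator, written in Python rather than perl or sed. The recipe layout it emits is only an assumption: one recipe per input line, one `SUBJECT ??` condition per whitespace-separated column (so every column has to match), with the `rsi.lock` lockfile and the `procmail_print` action borrowed from the recipe above. `build_recipes` and its parameters are made-up names.

```python
import sys

# Sample input taken from the word list above (the "..." line is left out).
SAMPLE = r"""
left hand\> writ
wrist tend[io]nitis
worker|\<wc\> hearing
\<x\> windows reminder|break|typ
"""


def build_recipes(lines, target="SUBJECT", flags="w", lockfile="rsi.lock",
                  action="| procmail_print"):
    """Turn lines of whitespace-separated regexps into procmail recipes.

    Each non-blank line becomes one recipe and each column on that line
    becomes one condition, so all columns must match for the action to run.
    A regexp containing spaces cannot be expressed in this input format.
    """
    recipes = []
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        out = [f":0{flags}:{lockfile}"]
        for word in words:
            out.append(f"* {target} ?? ({word})")
        out.append(action)
        recipes.append("\n".join(out))
    return "\n\n".join(recipes) + "\n"


if __name__ == "__main__":
    # Demo on the sample; pass sys.stdin instead to read the list from stdin.
    sys.stdout.write(build_recipes(SAMPLE.splitlines()))
```

The generated text can be written to a file and pulled into the rc file with INCLUDERC, so only the word list needs editing by hand.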
Stefan