procmail

Re: procmail slow on big files?

2001-04-27 01:44:03
Looking for efficiencies, Jay Collins writes:

 I posted my procmailrc (well, my boss wrote it... ugh!) at
http://users.exis.net/~jcollin/procmailrc . Any tips anyone has would be
nice. It's pretty huge, so I won't drop it on the list and annoy anyone.

Here are some suggestions:

1. The first monstrous "or" grouping looks like you just want to
match the subject.  As it stands, they get tested against every
header line, including Date:, Received:, etc.  So,
        :0H:
        * (5000 Diet Challenge|\
        JustSayWow|\
[337 lines snipped]
        Under-Valued Stock Alert|\
        Alco-Zyme)
would be better written as:
        :0H:
        * ^Subject:.*(5000 Diet Challenge|\
        JustSayWow|\
[339 lines the same as before]
since it can then go to the next line without processing the
big "or" as soon as it sees a header line starting with something
other than "Subject".
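To see what the anchor changes, here's a minimal Python sketch. It uses Python's `re` module as a stand-in for procmail's own matcher (the engines differ in detail), only a few of the phrases from the recipe, and made-up header text. The unanchored alternation is free to match on any header line, while the anchored one can only succeed on a line that starts with "Subject:".

```python
import re

# A few phrases from the recipe; the full list is much longer.
PHRASES = ["5000 Diet Challenge", "JustSayWow", "Under-Valued Stock Alert", "Alco-Zyme"]
ALT = "(" + "|".join(re.escape(p) for p in PHRASES) + ")"

# Unanchored: the alternation is tried against every header line.
unanchored = re.compile(ALT, re.IGNORECASE)
# Anchored: only a line beginning with "Subject:" can match, and "."
# does not cross a newline, so the match stays on that one line.
anchored = re.compile(r"^Subject:.*" + ALT, re.IGNORECASE | re.MULTILINE)

# Invented headers for illustration.
headers = (
    "Received: from mx.example.org\n"
    "Date: Fri, 27 Apr 2001 01:44:03 -0400\n"
    "X-Note: forwarded JustSayWow sample\n"
    "Subject: Meeting notes\n"
)

print(bool(unanchored.search(headers)))  # True: hit on the X-Note line
print(bool(anchored.search(headers)))    # False: the Subject line has no phrase
```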

2. Matching is case-insensitive unless you use the "D" flag, so:
        STOCK JUNGLE NEWS|\
        Stock Jungle News|\
checks twice for the same thing.  Get rid of duplicates.
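With a long list, weeding out case-only duplicates by hand gets tedious. A small Python sketch (`dedupe_alternatives` is a made-up helper, not anything procmail provides) that keeps the first spelling of each phrase, compared case-insensitively:

```python
import re

def dedupe_alternatives(alternatives):
    """Drop alternatives that differ only in letter case, keeping the first spelling."""
    seen = set()
    kept = []
    for alt in alternatives:
        key = alt.casefold()
        if key not in seen:
            seen.add(key)
            kept.append(alt)
    return kept

alts = ["STOCK JUNGLE NEWS", "Stock Jungle News", "JustSayWow"]
unique = dedupe_alternatives(alts)
print(unique)  # ['STOCK JUNGLE NEWS', 'JustSayWow']

# With case-insensitive matching (procmail's default without "D"),
# the shorter list still catches either spelling.
pattern = re.compile("|".join(re.escape(a) for a in unique), re.IGNORECASE)
print(bool(pattern.search("Subject: Stock Jungle News")))  # True
```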

3. A zero-or-more pattern at the start of an unanchored regex won't
help you: it can always match the empty string, so it never changes
whether a line matches and only adds work.  In your phone numbers, change:
        * ([-(]*800[ .)-]+288[ .-]+3199|\
        [-(]*208[ .)-]+955[ .-]+4105|\
...
to
        * (800[ .)-]+288[ .-]+3199|\
        208[ .)-]+955[ .-]+4105|\
...
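A quick Python check of that, with `re.search` standing in for procmail's unanchored match and invented sample lines: for each line, the pattern with the `[-(]*` prefix matches exactly when the one without it does.

```python
import re

# Recipe-style pattern with the optional leading "-" / "(" run.
with_prefix = re.compile(r"[-(]*800[ .)-]+288[ .-]+3199")
# The same pattern with the prefix dropped.
without_prefix = re.compile(r"800[ .)-]+288[ .-]+3199")

samples = [
    "Call (800) 288-3199 today",
    "Call 800.288.3199",
    "Call -800-288-3199",
    "Call 800 288 3200",
]

for s in samples:
    # Any match of with_prefix contains a match of without_prefix, and any
    # match of without_prefix is a match of with_prefix with an empty prefix.
    print(s, "->", bool(with_prefix.search(s)), bool(without_prefix.search(s)))
```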

4. Consider whether matching less is as good.  For example:
        Frequently Asked Questions--CABLE TV DESCRAMBLER|\
Would this be as good?
        CABLE TV DESCRAMBLER|\
On the other hand, if it's always at the start of a line, it's even
better if you can left-anchor the whole thing:
        ^Frequently Asked Questions--CABLE TV DESCRAMBLER|\
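Here's a Python sketch of the trade-off, again with `re` standing in for procmail and made-up sample lines. The shorter pattern matches everything the long one does, plus variants, which is the "as good?" question: more hits may mean more false positives. The anchored form uses `re.MULTILINE` so `^` matches at the start of any line, and only hits when the phrase begins a line.

```python
import re

# Invented sample text for illustration.
body = "Frequently Asked Questions--CABLE TV DESCRAMBLER\nOrder now\n"
mid_line = "Read our Frequently Asked Questions--CABLE TV DESCRAMBLER\n"
variant = "New CABLE TV DESCRAMBLER kits\n"

long_form = re.compile(r"Frequently Asked Questions--CABLE TV DESCRAMBLER", re.IGNORECASE)
short_form = re.compile(r"CABLE TV DESCRAMBLER", re.IGNORECASE)
# re.MULTILINE makes "^" match at the start of every line, not just the string.
anchored = re.compile(r"^Frequently Asked Questions--CABLE TV DESCRAMBLER",
                      re.IGNORECASE | re.MULTILINE)

for name, text in [("body", body), ("mid_line", mid_line), ("variant", variant)]:
    print(name,
          bool(long_form.search(text)),
          bool(short_form.search(text)),
          bool(anchored.search(text)))
```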

5. Sort, and then factor common start sequences.  For example:
        Attracts FAST!|\
        Achieve Your Financial Dreams|\
        Anyone or Anything|\
        ANIMAL SEX|\
can be combined:
        A(ttracts FAST!|chieve Your Financial Dreams|\
         n(yone or Anything|IMAL SEX))|\
and also:
        Home business|\
        HOME BASE BUSINESS|\
can be:
        home (base )?business|\
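With hundreds of phrases, the factoring can be scripted. Below is a rough Python sketch (`factor_prefixes` is a hypothetical helper, not part of procmail) that sorts the phrases, groups them by first character, and pulls out each group's common prefix recursively. It lowercases everything on the assumption that you rely on procmail's default case-insensitive matching, and it only factors leading text, so it won't find the "(base )?" rewrite by itself. It escapes with Python's `re.escape`, so the escaping may need adjusting for procmail; check the output before pasting it into a recipe.

```python
import os
import re
from itertools import groupby

def factor_prefixes(phrases):
    """Build an alternation with shared leading text factored out.

    Phrases are lowercased, so compile the result case-insensitively.
    """
    words = sorted({p.lower() for p in phrases})
    parts = []
    # Sorting puts words with the same first character next to each other.
    for _, group in groupby(words, key=lambda w: w[:1]):
        group = list(group)
        if len(group) == 1:
            parts.append(re.escape(group[0]))
            continue
        # At least one shared character, so the tails are strictly shorter
        # and the recursion terminates.
        prefix = os.path.commonprefix(group)
        tails = [w[len(prefix):] for w in group]
        parts.append(re.escape(prefix) + "(" + factor_prefixes(tails) + ")")
    return "|".join(parts)

phrases = ["Attracts FAST!", "Achieve Your Financial Dreams",
           "Anyone or Anything", "ANIMAL SEX"]
print(factor_prefixes(phrases))
```

The recursion mirrors the hand-made "A(...n(...))" grouping above: one alternative per distinct first character, with shared prefixes pulled out at each level.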

That should help some.  There are probably more things that could
be done for speed; that's what I noticed at a glance.

Cheers,
Stan
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail