
regexp efficiency [Was: Bill Moseley's ${$MATCH} problem]

1999-01-14 09:02:05
"David W. Tamkin" <dattier(_at_)Mcs(_dot_)Net> writes:
> Right ... I was confused.  Additionally, I meant (.|$) and not [.$], which
> Philip has also fixed; and as he said, it's the final value we're matching to
> here, not the name of the variable, which had to be matched earlier, so it
> could be null.  Then I believe this would work also:
>
>       * $ $MATCH ?? ^^\/(.|$)*


While the regexps "^^\/(.|$)*" and "^^\/(.*$)*" are equivalent in what
they match (everything!), the former requires that the regexp engine
evaluate an alternation for every character.  I had heard in various
places that alternation is an 'expensive' operation, and this seemed
like a perfect case to test it.

So, I found the largest message in my mailbox, and created two rcfiles,
the first of which had 128 repetitions of the recipe
        :0 HB
        * ^^\/(.|$)*
        { }
while the second had 128 repetitions of the recipe
        :0 HB
        * ^^\/(.*$)*
        { }
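
For anyone wanting to repeat the test, the two rcfiles could be generated
rather than typed out.  This is only a sketch: the helper name make_rc is
made up for illustration, the files are written to the current directory
under the names rc1 and rc2 used in the timing runs, and the recipes are
written without the indentation shown above.

```shell
#!/bin/sh
# make_rc is a hypothetical helper, not part of procmail.
# $1 = rcfile to create, $2 = regexp to use as the recipe's condition.
make_rc() {
    i=0
    : > "$1"                      # start with an empty file
    while [ "$i" -lt 128 ]; do    # 128 repetitions, as in the test
        # %s keeps the backslash in the regexp literal
        printf ':0 HB\n* %s\n{ }\n' "$2" >> "$1"
        i=$((i + 1))
    done
}

make_rc rc1 '^^\/(.|$)*'          # alternation evaluated per character
make_rc rc2 '^^\/(.*$)*'          # .* consumes each line in one step
```

Single quotes keep the shell from touching the `$` and `\` in the regexps.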

Here's the output:

        lunen% wc message
           11688   24813 1231712 message
        lunen% /bin/time procmail DEFAULT=/dev/null rc1 < message

        real     1:38.1
        user     1:37.2
        sys         0.0
        97.21u 0.05s 1:38.14 99.1%
        lunen% /bin/time procmail DEFAULT=/dev/null rc2 < message

        real     1:06.2
        user     1:05.1
        sys         0.0
        65.15u 0.05s 1:06.24 98.4%
        lunen%

So the alternation version takes about half again as long in user CPU
time (97.21s versus 65.15s).
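
The ratio behind that figure can be checked from the two user times in the
output above.  A small sketch, using only those two numbers; the function
name ratio is made up for illustration.

```shell
#!/bin/sh
# ratio is a hypothetical helper: prints $1 / $2 to two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# User CPU seconds: rc1 (alternation) over rc2 (.*$)
ratio 97.21 65.15    # prints 1.49
```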


Philip Guenther
