Re: piping to perl

Joel Bremson <staffjb(_at_)hooked(_dot_)net> writes:

I'd like to use the following perl script to read a message body and
return true or false depending on the program outcome. I have a few
questions on how to do this and would appreciate any advice on how to
optimize the script. 

The idea is to define spam by multiple occurrences of $, while ignoring
any message that is perl or procmail related. (Would it be more efficient
to check for (perl|procmail) before going to the perl script?)


Actually, it would be most efficient to do it totally in procmail using
scoring recipes.  However, to comment on your perl:

...

       $count += s/\$/\$/g;
       $brackets += s/\{/\{/g;


Using the tr or y operator would be more efficient than the s operator, as
it doesn't require the regexp engine, but only a table lookup.

        $count += tr/$/$/;
        $brackets += tr/{/{/;
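
To see the counting behaviour on its own, here is a minimal sketch. The sub name count_chars and the sample string are made up for illustration and are not part of the original script. It relies on tr returning the number of characters it matched, and on an empty replacement list leaving the string unchanged:

```perl
use strict;
use warnings;

# Hypothetical helper: count '$' and '{' characters in a string.
# tr/// returns the number of characters matched; with an empty
# replacement list (and no /d) the string itself is left unchanged.
sub count_chars {
    my ($text) = @_;
    my $dollars = ($text =~ tr/$//);
    my $braces  = ($text =~ tr/{//);
    return ($dollars, $braces);
}

my ($n_dollars, $n_braces) = count_chars('Make $$$ fast! {really} $1000');
print "dollars=$n_dollars braces=$n_braces\n";   # prints dollars=4 braces=1
```

Since tr doesn't interpolate variables, the bare $ in the search list needs no backslash.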

       if ($_=~/(perl|procmail)/i){
                exit(0)
       }


The more idiomatic perl would probably be:

        exit(0) if /\bperl5?\b/i || /\bprocmail\b/i;

Note that you probably want the word-boundary assertion "\b" in those to
avoid matching words like "superlative", as in "this superlative money
making deal".  Believe it or not, the separate searches were also faster
than a single alternation, according to the perl FAQ last time I checked
(though that _has_ been a while...).
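
A quick way to convince yourself about the word boundaries is something like the following sketch. The sub name mentions_perl_or_procmail and the sample lines are invented for the demonstration; only the two regexps come from the line above:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the text mentions perl (or perl5) or
# procmail as a whole word, ignoring case.
sub mentions_perl_or_procmail {
    my ($text) = @_;
    return ($text =~ /\bperl5?\b/i || $text =~ /\bprocmail\b/i) ? 1 : 0;
}

for my $line ('this superlative money making deal',
              'my Perl script misbehaves',
              'a procmail recipe question') {
    my $verdict = mentions_perl_or_procmail($line) ? 'exempt' : 'not exempt';
    printf "%-36s %s\n", $line, $verdict;
}
```

Only the first line should come out as "not exempt", since the "perl" inside "superlative" has letters on both sides and so never sits at a word boundary.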

As for the recipe to test the exitcode of the perl script:

:0Bf:
? | perlscript
spam


There shouldn't be a '|' in there, and you need a leading '*' for procmail
to know that it's a condition, not an action.  You also don't want the 'f'
flag (this isn't a recipe that modifies the message being processed).

        :0 B:
        * ? perlscript
        spam


Anyway, here's a stab at the pure procmail version:

        :0 B:
        * ! ()\<(perl[0-9]*?|procmail)\>
        * ! {(.*$)*{
        * -5^0
        * 1^1 [$]
        spam

The first condition guarantees that the message doesn't contain the
words perl, perl5, or procmail; the second makes sure it doesn't contain
two opening braces; the third starts the score at -5; and the last adds
one to the score for each literal dollar sign, so that the score will be
positive only if there are more than five dollar signs (that is, at
least six).  The leading parens on the first condition are to avoid
procmail's non-intuitive handling of leading backslashes on conditions.
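
If it helps to check the arithmetic, the same decision can be roughly mirrored in perl. This is only an approximation for reasoning about the logic, not a replacement for the recipe: the sub name looks_like_spam is invented, and perl's \b is not exactly the same thing as procmail's \< and \>.

```perl
use strict;
use warnings;

# Hypothetical helper approximating the recipe's logic for a message body.
# Returns 1 if the body would be filed as spam, 0 otherwise.
sub looks_like_spam {
    my ($body) = @_;

    # Condition 1: no mention of perl, perl5, ... or procmail
    # (procmail matches case-insensitively by default, hence /i).
    return 0 if $body =~ /\bperl[0-9]*\b/i || $body =~ /\bprocmail\b/i;

    # Condition 2: fewer than two opening braces anywhere in the body.
    return 0 if ($body =~ tr/{//) >= 2;

    # Conditions 3 and 4: start at -5, add 1 per literal dollar sign.
    my $score = -5 + ($body =~ tr/$//);
    return $score > 0 ? 1 : 0;
}

print looks_like_spam('$$$ earn $$$ now') ? "spam\n" : "not spam\n";
```

The sample body has six dollar signs, so the score ends up at 1 and it counts as spam; with five the score would be 0 and the message would pass.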

Philip Guenther