procmail

Re: filtering recipe and local mailbox

2001-09-24 11:00:33
...
Additionally, the ^TO_ regexp is quite processor hungry, especially
when filtering list mail, it's probably better not to use it if you
can find a unique single header for that list.
...

While searching for a single header is certainly more efficient than
using ^TO or ^TO_, I wouldn't call it "quite processor hungry" unless
you were running procmail on your Apple II.  It may be relatively more
expensive in CPU cycles, but in absolute terms it's a drop in the bucket.


I base this on having run procmail on systems ranging from a 25MHz 386SX
and a SPARCStation II to a 2 CPU Sun Ultra60, the latter being a 50K
messages/day mail and file server where recipes are used to catch macro
viruses via conditions like:

        * B ?? \
            A$?F$?(\
                    Z$?p$?c$?n$?V$?z$?U$?H$?J$?v$?d$?G$?V$?\
                            j$?d$?G$?l$?v$?b$?[g-v]|\
                    N$?l$?Y$?3$?V$?y$?a$?X$?R$?5\
            )|\
            [AQgw]$?B$?(\
                    W$?a$?X$?J$?1$?c$?1$?B$?y$?b$?3$?R$?\
                            l$?Y$?3$?R$?p$?b$?2$?[4-7]|\
                    T$?Z$?W$?N$?1$?c$?m$?l$?0$?e$?[Q-Za-f]\
            )|\
            [AEIMQUYcgkosw048]$?A$?(\
                    V$?m$?l$?y$?d$?X$?N$?Q$?c$?m$?9$?0$?\
                            Z$?W$?N$?0$?a$?W$?9$?u|\
                    U$?2$?V$?j$?d$?X$?J$?p$?d$?H$?[k-n]\
            )

with no noticeable performance loss, and that's not even an optimized
regexp!**

Unless someone can produce hard numbers that show otherwise, I feel that
users should not hesitate to use ^TO or ^TO_ when it's the Right Thing.


That brings us to my second comment: people should be aware of the
semantic difference between matching list mail using ^TO_ vs ^Sender:
or some other list-added header field.  The former catches messages
addressed _to_ the list while the latter catches messages _from_ the list.
The difference shows up in a couple of ways:
1) if someone sends a message both to you and the list, ^TO_ will match
   both copies, while ^Sender: will only catch the copy from the list.
   This occurs most often when someone replies both to the sender and
   the list.
2) messages that don't include the list address in the header (like
   some spam) won't be caught by ^TO_.

Some people prefer the ^TO_ behavior in (1) because they want
both copies of replies to their messages to go into the list folder.
Those people will need to use ^TO_.
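
To make the difference concrete, here is a minimal sketch of the two
approaches.  The list address, the Sender: value, and the folder name
are made-up placeholders; check the headers your list actually adds
before adapting either recipe.

        # Matches mail addressed _to_ the list, including the direct
        # copy of a reply sent to both you and the list.
        # (hypothetical address and folder)
        :0:
        * ^TO_somelist@lists\.example\.org
        lists/somelist

        # Matches only mail that came _from_ the list, using a header
        # the list software adds.  (hypothetical Sender: value)
        :0:
        * ^Sender:.*owner-somelist@lists\.example\.org
        lists/somelist

Keeping both recipes should cover both cases, since procmail stops at
the first delivering recipe that matches.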


Philip Guenther


** For those who wonder what that condition does, it matches the base64
encodings of the strings "^@VirusProtection" and "^@Security" (with
^@'s representing the NUL character).  At least one of those strings has
occurred in every macro virus I've ever seen.  Though there are ways they
could be hidden, it's a fairly practical test for sites that can't ban
the emailing of files in the various Microsoft formats.

For those who wonder how such a regexp would be optimized, the
trick is to factor out the left side of all possibilities, handling all
the patterns that start with 'A' in one subregexp, all the cases that
start with 'Q', 'g', or 'w' in another, and the rest, [EIMUYckos048],
in a third, like this:

        * B ?? \
            A$?(\
                F$?(\
                    Z$?p$?c$?n$?V$?z$?U$?H$?J$?v$?d$?G$?V$?\
                            j$?d$?G$?l$?v$?b$?[g-v]|\
                    N$?l$?Y$?3$?V$?y$?a$?X$?R$?5\
                )|\
                B$?(\
                    W$?a$?X$?J$?1$?c$?1$?B$?y$?b$?3$?R$?\
                            l$?Y$?3$?R$?p$?b$?2$?[4-7]|\
                    T$?Z$?W$?N$?1$?c$?m$?l$?0$?e$?[Q-Za-f]\
                )|\
                A$?(\
                    V$?m$?l$?y$?d$?X$?N$?Q$?c$?m$?9$?0$?\
                            Z$?W$?N$?0$?a$?W$?9$?u|\
                    U$?2$?V$?j$?d$?X$?J$?p$?d$?H$?[k-n]\
                )\
            )|\
            [Qgw]$?(\
                B$?(\
                    W$?a$?X$?J$?1$?c$?1$?B$?y$?b$?3$?R$?\
                            l$?Y$?3$?R$?p$?b$?2$?[4-7]|\
                    T$?Z$?W$?N$?1$?c$?m$?l$?0$?e$?[Q-Za-f]\
                )|\
                A$?(\
                    V$?m$?l$?y$?d$?X$?N$?Q$?c$?m$?9$?0$?\
                            Z$?W$?N$?0$?a$?W$?9$?u|\
                    U$?2$?V$?j$?d$?X$?J$?p$?d$?H$?[k-n]\
                )\
            )|\
            [EIMUYckos048]$?A$?(\
                    V$?m$?l$?y$?d$?X$?N$?Q$?c$?m$?9$?0$?\
                            Z$?W$?N$?0$?a$?W$?9$?u|\
                    U$?2$?V$?j$?d$?X$?J$?p$?d$?H$?[k-n]\
            )
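
The same factoring on a toy pattern may be easier to follow.  The
letters here are arbitrary placeholders, not part of the virus test.
Unfactored, all three alternatives can start with 'A':

        * B ?? AF(x|y)|[AQ]B(z)|[AEQ]A(w)

Factored so that each top-level alternative starts with a different
character, while still matching the same seven strings:

        * B ?? A(F(x|y)|B(z)|A(w))|Q(B(z)|A(w))|EA(w)
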
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail