Re: filtering caps-only spam

2001-01-10 11:47:37
Martin Maciaszek wrote:
...
Martin Maciaszek <mmaciaszek(_at_)gmx(_dot_)net> writes:
I recently received some spam whose body was written
almost entirely in caps. ...
...
Let's say it's about 90%. If you want an example I could attach
the spam. (Although I don't think this would be a good idea)

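I'm going to assume this means "more than 90% of the alphabetic characters
in the body are UPPER CASE."  Here's a starting point: count the capitals,
count all the letters, and let a shell test compare the two.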

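    # Log to the terminal while testing by hand.
    LOGFILE=/dev/tty
    VERBOSE=yes
    SHELL=/bin/sh

    # Count the upper-case letters in the body.
    :0b
    Caps=|tr -cd 'A-Z' | wc -c

    # Count all the letters in the body.
    :0b
    Alphas=|tr -cd 'A-Za-z' | wc -c

    # Swallow the message if more than 90% of its letters are capitals.
    # 'w' makes procmail check the exit status; 0 counts as delivered.
    :0 hwi
    | ( set -x; MaxAllowedCaps=`expr $Alphas '*' 90 / 100`; \
        test $Caps -gt $MaxAllowedCaps )

    # If we get here, the test failed and the message was not swallowed.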
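
If you want to try the arithmetic outside procmail first, here is a
minimal sketch of the same test as a plain shell function.  The name
mostly_caps is made up for illustration, and it assumes a POSIX sh with
$(( )) arithmetic; it reads the text on stdin and exits 0 when more than
90% of the letters are capitals.

```shell
#!/bin/sh
# Hypothetical helper, not part of procmail: exit 0 when more than 90%
# of the ASCII letters on stdin are upper case, exit 1 otherwise.
mostly_caps() {
    body=$(cat)    # read stdin once so it can be counted twice
    # Wrapping wc in $(( )) strips the padding some wc versions print.
    caps=$(( $(printf '%s' "$body" | LC_ALL=C tr -cd 'A-Z' | wc -c) ))
    alphas=$(( $(printf '%s' "$body" | LC_ALL=C tr -cd 'A-Za-z' | wc -c) ))
    max_allowed=$(( alphas * 90 / 100 ))    # integer division, rounds down
    [ "$caps" -gt "$max_allowed" ]
}
```

For example, "printf 'BUY NOW!!! CHEAP' | mostly_caps" succeeds (11 of 11
letters are capitals), while "printf 'HELLO world' | mostly_caps" fails
(5 of 10).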


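I'm sure there's a more elegant way to do it using scoring, though I'm
not sure what it is.  Maybe weight each lowercase letter at 10 and each
uppercase letter at 1, in opposite directions?  ... h'rm, the following
is an approximation: it fires once capitals outnumber lowercase letters
by more than 10 to 1, which is about 91% caps rather than 90%.  A weight
of -9 instead of -10 would put the cutoff at exactly 90%.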

    LOGFILE=/dev/tty
    VERBOSE=yes
    SHELL=/bin/sh

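    # B: score the body.  D: match case-sensitively.
    # Each lowercase letter subtracts 10, each capital adds 1;
    # the action runs only if the final score is positive.
    :0 BDhi
    * -10^1 [a-z]
    * 1^1 [A-Z]
    | : tossed because over 90% caps;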

Well, that's a starting point if you want to go the "scoring" way.
The scoring method should have lower overhead, since procmail does the
counting itself instead of piping the body through tr and wc twice,
although it is a bit harder to understand for those of us more
accustomed to shell scripts.
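
For the shell-minded, the scoring recipe's arithmetic can be sketched as
a plain function.  This is only my illustration of the weights used
above (+1 per capital, -10 per lowercase letter), not anything procmail
provides; the name caps_score is made up, and it assumes a POSIX sh.
The recipe matches when the number printed would be positive.

```shell
#!/bin/sh
# Hypothetical helper, not part of procmail: print the score the scoring
# recipe would reach for the text on stdin.
caps_score() {
    body=$(cat)    # read stdin once so it can be counted twice
    # Wrapping wc in $(( )) strips the padding some wc versions print.
    caps=$(( $(printf '%s' "$body" | LC_ALL=C tr -cd 'A-Z' | wc -c) ))
    lower=$(( $(printf '%s' "$body" | LC_ALL=C tr -cd 'a-z' | wc -c) ))
    echo $(( caps - 10 * lower ))    # +1 per capital, -10 per lowercase
}
```

For example, "printf 'AAAAAAAAAAa' | caps_score" prints 0, so ten
capitals against one lowercase letter (about 91% caps) is still not
enough to match; one more capital tips the score to 1.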

I gave both of the above recipes a quick sanity check, but there are
no guarantees, your mileage may vary, etc.

hth
-- 
Neither I nor my employer will accept any liability for any problems
or consequential loss caused by relying on this information.  Sorry.
Collin Park                         Not a statement of my employer.
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
