Martin Maciaszek <mmaciaszek(_at_)gmx(_dot_)net> writes:
I recently received some spam whose body was written almost
entirely in caps. ...
Let's say it's about 90%. If you want an example I could attach
the spam. (Although I don't think this would be a good idea.)
I'm going to assume this means "90% of alphabetic characters are UPPER
CASE." Here's a starting point.
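# The flags on the ":0" lines are assumptions: b feeds only the body,
# W checks the exit code quietly, i ignores write errors.
:0 b
Caps=|tr -cd '[A-Z]' | wc -c
:0 b
Alphas=|tr -cd '[A-Za-z]' | wc -c
:0 Wi
| ( set -x; MaxAllowedCaps=`expr $Alphas '*' 90 / 100`; \
    test $Caps -gt $MaxAllowedCaps )
# Processing continues here only if the above recipe did not swallow the message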
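To try the arithmetic outside procmail, here is a rough sketch of the same
test as a plain shell function. The name mostly_caps and its optional
threshold argument are made up for illustration; they are not part of
procmail or of the recipe. It drops the brackets from the tr sets because
POSIX tr treats [ and ] as literal characters, which would count brackets
in the message as letters.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original recipe.
# Exit status 0 means "more than $1 percent (default 90) of the letters
# read from stdin are upper case".
mostly_caps() {
    threshold=${1:-90}
    text=$(cat)
    # LC_ALL=C keeps the ranges to plain ASCII; tr -d ' ' strips the
    # padding some wc implementations print.
    caps=$(printf '%s' "$text" | LC_ALL=C tr -cd 'A-Z' | wc -c | tr -d ' ')
    alphas=$(printf '%s' "$text" | LC_ALL=C tr -cd 'A-Za-z' | wc -c | tr -d ' ')
    # A message without any letters is never flagged.
    [ "$alphas" -gt 0 ] || return 1
    # Integer division, like the expr call in the recipe.
    max_allowed=$(( alphas * threshold / 100 ))
    [ "$caps" -gt "$max_allowed" ]
}
```

Something like "mostly_caps < saved-message && echo shouting" should be
enough to check a saved message by hand before trusting the recipe.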
I'm sure there's a more elegant way to do it using scoring, though I'm
not sure what it is. Maybe letting each lowercase be weighted 10 and
each uppercase weighted as 1, in opposite directions? ... h'rm, the
following is an approximation.
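# The ":0" line needs B (test the body) and D (match case-sensitively;
# procmail ignores case by default).
:0 BD
* -10^1 [a-z]
* 1^1 [A-Z]
| : tossed because over 90% caps;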
Well, that's a starting point if you want to go the "scoring" way.
The scoring method looks to have lower overhead, since procmail does the
matching itself instead of starting tr, wc and expr, although it is a bit
harder to understand for those of us more accustomed to shell scripts.
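For the shell-minded, the scoring conditions can be mimicked with a small
function. This is only my reading of the scoring rules, not procmail's own
code: with an exponent of 1 every match adds its full weight, so the total
is (capitals) - 10 * (lowercase letters), and the recipe fires when that
is positive. That happens once capitals outnumber lowercase letters by
more than ten to one, i.e. above 10/11, or roughly 91%, which is why it is
only an approximation of 90%. The name caps_score is made up for this
sketch.

```shell
#!/bin/sh
# Hypothetical stand-in for the scoring recipe, for experimenting only.
# Prints (capitals) - 10 * (lowercase letters) for the text on stdin;
# a positive number corresponds to the recipe firing.
caps_score() {
    text=$(cat)
    caps=$(printf '%s' "$text" | LC_ALL=C tr -cd 'A-Z' | wc -c | tr -d ' ')
    lower=$(printf '%s' "$text" | LC_ALL=C tr -cd 'a-z' | wc -c | tr -d ' ')
    echo $(( caps - 10 * lower ))
}
```

Ten capitals and one lowercase letter come out at exactly zero, so the
message passes; an eleventh capital tips the score positive.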
I gave both of the above recipes a quick sanity check, but there are
no guarantees, your mileage may vary, etc.
Neither I nor my employer will accept any liability for any problems
or consequential loss caused by relying on this information. Sorry.
Collin Park Not a statement of my employer.
procmail mailing list