
Re: No good spamming bastards are using new tricks to get by the filters

2003-01-19 19:34:37
On 01/19/03 05:31 PM, LuKreme sat at the `puter and typed:
On Sunday, January 19, 2003, at 11:12 AM, Louis LeBlanc wrote:
Hey folks.  I have a quick question:

How long have those stupid spammers been inserting html comments into
HTML spam to sneak by filters meant to keep them out?

And more importantly, is there a relatively easy way to filter every
HTML message thru a dump to eliminate any HTML tags?  Seems if I could
do that, those messages with stuff like this wouldn't get thru
anymore:

You can strip the comments from the HTML before you start other checks:

| sed -e 's/<!--[^-]*-->//g'

I think that would do it, unless the comments span more than one line.

You could also do something like

| sed -e 's/<!--/\n<!--/g' \
       -e 's/-->/-->\n/g'

and then

| sed -e '/<!--/,/-->/d'
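
For anyone who wants to try the stripping outside of sed, here is a rough Python sketch of the same idea. It is not part of the suggestion above: `strip_html_comments` is a made-up helper name, and the non-greedy pattern assumes comments never nest. Unlike the single-line sed expression, it also copes with comments that span lines or contain hyphens.

```python
import re

# Non-greedy match from "<!--" to the nearest "-->".
# re.DOTALL lets "." cross newlines, so multi-line comments are covered.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_html_comments(html: str) -> str:
    """Return html with every <!-- ... --> comment removed."""
    return COMMENT_RE.sub("", html)


# A word broken up by one inline comment and one multi-line comment.
sample = "mort<!-- a-b -->ga<!--\nsplit\n-->ge"
print(strip_html_comments(sample))  # prints: mortgage
```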

I would simply count comments ("<!--") in an HTML message and discard it
if there are more than... oh, I dunno, some threshold.  Like 2.  Maybe 3.
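
The counting idea is easy to try outside procmail. A minimal Python sketch, assuming the decoded message body is already in a string; `count_html_comments`, `looks_like_comment_spam` and the default limit of 2 are made up for illustration (the limit just follows the "Like 2" guess above):

```python
def count_html_comments(body: str) -> int:
    """Count every occurrence of the comment opener "<!--" in body."""
    return body.count("<!--")


def looks_like_comment_spam(body: str, max_comments: int = 2) -> bool:
    """Flag the body when it holds more than max_comments comments."""
    return count_html_comments(body) > max_comments


# Three comments on a single line are still counted separately.
body = "<p>Lo<!-- a -->w ra<!-- b -->te<!-- c -->s</p>"
print(count_html_comments(body), looks_like_comment_spam(body))
```

Counting the opener rather than whole lines means several comments on one line are still counted one by one.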

This amateurish html obfuscation doesn't concern me nearly as much as 
the Base64 stuff.

-- 
Love is like oxygen/You get too much/you get too high/Not enough and 
you're gonna die

Thanks a bunch for the suggestions!  I took all this into
consideration, and figured I could live with the scoring method.
Problem is I never really mastered it :|  So I went back to the
procmail site and read up on it again at
http://www.uwasa.fi/~ts/info/proctips.html which is linked from
procmail.org.  This is what I came up with:

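:0 fbwB
* (<)!--
| sed -e 's/<!--/\n<!--/g'

:0 A
{
  # Start at -2 and add 1 for every "<!--" in the body, so the
  # score only turns positive at the third comment.
  :0 fhwB
  * -2^0
  *  1^1 (<)!--
  | formail -Y -f -A "X-Spammer: HTML Comments out the wazoo"

  :0 A
  { FOLDER=spam }
}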

If I'm not mistaken, this sets the threshold at 3 HTML comments and
marks the message as spam once that threshold is reached.  The sed
command that inserts a line break before each comment should ensure
that every comment gets counted individually, rather than counting
only the lines that contain one or more comments.

I also remembered to enclose the '<' character in parens so that
procmail matches it literally instead of reading it as a size test.
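
As a sanity check on the arithmetic, here is a small Python model of procmail's weighted scoring as described in procmailsc(5): a condition written w^x adds w for the first match, w*x for the second, w*x*x for the third, and so on, and the recipe fires only when the total is positive. The function names and the model itself are assumptions for illustration, not procmail code.

```python
def weighted_score(weight: int, exponent: int, matches: int) -> int:
    """Sum weight * exponent**k for k = 0 .. matches-1 (one term per match)."""
    return sum(weight * exponent ** k for k in range(matches))


def recipe_score(comments: int, exponent: int) -> int:
    """Total for a "-2^0" offset plus a "1^exponent" count of comments."""
    # "-2^0" with an empty regexp matches exactly once and contributes -2.
    offset = weighted_score(-2, 0, 1)
    return offset + weighted_score(1, exponent, comments)


# Columns: number of comments, total with 1^0, total with 1^1.
for n in (1, 2, 3, 4):
    print(n, recipe_score(n, 0), recipe_score(n, 1))
```

In this model an exponent of 0 lets only the first match count, so the total never rises above -1 no matter how many comments there are. With an exponent of 1 every comment adds 1, and the total turns positive at the third comment.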

I'd like comments, feedback, etc. from anyone who has both an opinion
and more expertise in this than me . . .

Thanks a lot

Lou
-- 
Louis LeBlanc               leblanc(_at_)keyslapper(_dot_)org
Fully Funded Hobbyist, KeySlapper Extrordinaire :)
http://www.keyslapper.org

When some people discover the truth, they just can't understand why
everybody isn't eager to hear it.

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail