On 01/19/03 05:31 PM, LuKreme sat at the `puter and typed:
On Sunday, January 19, 2003, at 11:12 AM, Louis LeBlanc wrote:
Hey folks. I have a quick question:
How long have those stupid spammers been inserting html comments into
HTML spam to sneak by filters meant to keep them out?
And more importantly, is there a relatively easy way to filter every
HTML message thru a dump to eliminate any HTML tags? Seems if I could
do that, those messages with stuff like this wouldn't get thru
anymore:
You can strip the comments from the HTML before you start other checks.
| sed -e 's/<!--[^-]*-->//g'
I think that would do it, unless the comments span multiple lines.
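For comments that do span lines, here is a minimal sketch of one alternative: read the whole message into a buffer and cut out everything between each `<!--` and the next `-->`. The function name `strip_html_comments` is made up for illustration, and the awk approach is a swapped-in technique, not the sed one-liner above.

```shell
# Hypothetical helper: remove HTML comments from stdin, including
# comments that span several lines or contain hyphens.
strip_html_comments() {
  awk '
    { buf = buf $0 "\n" }                 # slurp the whole input
    END {
      out = ""
      while ((s = index(buf, "<!--")) > 0) {
        out = out substr(buf, 1, s - 1)   # keep text before the comment
        rest = substr(buf, s + 4)
        e = index(rest, "-->")
        if (e == 0) { buf = ""; break }   # unterminated comment: drop the rest
        buf = substr(rest, e + 3)         # continue after the comment
      }
      printf "%s", out buf
    }'
}
```

Something like `strip_html_comments < message.html` would print the text with comments removed; `message.html` is just a placeholder file name.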
you could also do something like
| sed -e 's/<!--/\r<!--/g' \
-e 's/-->/-->\r/g'
and then
| sed -e '/<!--/,/-->/d'
I would simply count comments ("<!--") in an HTML message and discard it
if there are more than... oh, I dunno, some threshold. Like 2. Maybe 3.
This amateurish html obfuscation doesn't concern me nearly as much as
the Base64 stuff.
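The counting idea above can be sketched outside procmail as a pair of shell functions. This is only an illustration: the names `count_html_comments` and `too_many_comments` are hypothetical, and the default limit of 2 is just one of the thresholds floated above.

```shell
# Hypothetical helper: count every "<!--" on stdin, even several on one line.
# gsub() returns the number of replacements it made on the current line.
count_html_comments() {
  awk '{ n += gsub(/<!--/, "&") } END { print n + 0 }'
}

# Hypothetical helper: succeed when stdin holds more than LIMIT comments.
# LIMIT is the first argument and defaults to 2.
too_many_comments() {
  limit=${1:-2}
  [ "$(count_html_comments)" -gt "$limit" ]
}
```

A filter could then do something like `too_many_comments 2 < message && echo spam`, where `message` is a placeholder for the saved mail.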
--
Love is like oxygen/You get too much/you get too high/Not enough and
you're gonna die
Thanks a bunch for the suggestions! I took all this into
consideration, and figured I could live with the scoring method.
Problem is I never really mastered it :| So I went back to the
procmail site and read up on it again at
http://www.uwasa.fi/~ts/info/proctips.html which is linked from
procmail.org. This is what I came up with:
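# Filter the body through sed so each comment starts on its own line.
# The f flag makes this a filter; without it the message is delivered to sed.
:0 fbBw
* (<)!--
| sed -e 's/<!--/\r<!--/g'
# Runs only if the filter recipe above matched.
:0A
{
# -2 once, +1 per "<!--": the score is positive at 3 or more comments.
:0 fhBw
* -2^0
* 1^1 (<)!--
| formail -Y -f -A "X-Spammer: HTML Comments out the wazoo"
:0A
{ FOLDER=spam }
}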
If I'm not mistaken, this will set the threshold at 3 html comments,
and spam the message if that threshold is found. The sed command that
inserts a carriage return before each comment should ensure that all
comments get counted individually, rather than just counting lines
with one or more comments.
I remembered to enclose the '<' character in parens to match it
properly too.
I'd like comments, feedback, etc. from anyone who has both an opinion
and more expertise in this than me . . .
Thanks a lot
Lou
--
Louis LeBlanc leblanc(_at_)keyslapper(_dot_)org
Fully Funded Hobbyist, KeySlapper Extraordinaire :)
http://www.keyslapper.org
When some people discover the truth, they just can't understand why
everybody isn't eager to hear it.
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail