procmail

Re: No good spamming bastards are using new tricks to get by the filters

2003-01-20 03:00:50
On Sunday, January 19, 2003, at 07:15 PM, Louis LeBlanc wrote:
:0B:

Check for Content-Type BEFORE you do an expensive body check.

:0
* ^Content-type:(.*\<)(multipart|html)
{
  :0Bf
# You need the f flag so procmail replaces the message with sed's
# output and keeps processing it, instead of treating sed as the delivery

  * ^Content-type:(.*\<)html

etc...

* (<)!--
| sed -e 's/<!--/\r<!--/g'
If all you're doing is COUNTING the <!-- markers, I don't think you need this sed at all.
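For illustration, a minimal sketch of letting procmail do the counting itself, assuming the same threshold as your recipe (-2^0, so three or more comments). The X-Spammer header text is carried over from your recipe; the rest is my own illustration, not a tested rule.

```procmail
# Sketch only: procmail counts "<!--" in the body itself, no sed pass.
# B: grep the conditions against the body instead of the header.
# f: replace the message with formail's output; w: check its exit code.
# -2^0 on an empty regex always matches once, starting the score at -2.
# 1^1 adds 1 for every "<!--" found; the (<) stops the leading "<"
# from being read as a message-size test.
# The recipe fires only when the total is above 0, i.e. 3+ comments.
:0Bfw
* -2^0
* 1^1 (<)!--
| formail -A "X-Spammer: HTML Comments out the wazoo"
```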
:0A
{
  :0B:
  * -2^0
  * B ? 1^0 (<)!--
   * 1^1 (<)!-- # (I think)

You don't need the B ?? test because you already gave this recipe the B flag. Also, if you did need it, the weight comes before the B ??, so the line would read * 1^1 B ?? (<)!--
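To show that ordering in context, a sketch of a recipe without the B flag (so ordinary conditions would be grepped against the header) that reaches the body through a weighted B ?? condition instead. The spam folder name is just a placeholder.

```procmail
# Sketch only: no B flag on this recipe, so the body is tested
# through the B ?? pseudo-variable match instead.
# Order on the condition line: weight, then "B ??", then the regex.
# The trailing ":" locks the spam mailbox (spam.lock) during delivery.
:0:
* -2^0
* 1^1 B ?? (<)!--
spam
```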

  | formail -Y -f -A "X-Spammer: HTML Comments out the wazoo"
  :0A
  { FOLDER=spam }
}

I'd like comments, feedback, etc. from anyone who has both an opinion
and more expertise in this than me . . .

I'm not qualified to comment on the scoring, but checking the headers for HTML content before checking the body is probably good practice.

This is what I would start with:

:0
* ^Content-type:(.*\<)(multipart|html)
{
  :0Bf
  * -5^0
  * 1^1 (<)!--
  | formail -A "X-Spam: more than 5 HTML comments"
  :0ABf
  | sed -e 's/Content-type/Content-untype/g'
}

I have X-Spam set as a visible header in my mail reader, and I destroy the Content-type header by renaming it, so the message is shown as plain text instead of rendered HTML. The message gets munged, but not irretrievably. I notice that my news.google dump, for example, contains three comment markers, so a threshold of 5 or 6 might be better than your 3.

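That Content-type substitution should be case insensitive, since the header is usually spelled Content-Type and sed, unlike procmail's own matching, is case sensitive. I forget how to do that gracefully.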
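One portable way, sketched here as a drop-in for the :0ABf recipe above (it must still follow the formail recipe, because of the A flag): spell out every letter as a bracket expression so any POSIX sed matches every capitalization. Anchoring at the start of the line and the colon is my own addition, so that only header lines (including MIME part headers in the body) are renamed rather than every mention in the text. If you only ever use GNU sed, its I flag (s/content-type/Content-untype/gI) is shorter, but it is a GNU extension.

```procmail
# Sketch only: case-insensitive rename of the Content-type header.
# Each [Xx] bracket expression matches either case, so this works
# with any POSIX sed. The ^ anchor limits the rename to lines that
# start with the header name (the top-level header and MIME part
# headers in the body).
:0Afw
| sed -e 's/^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Tt][Yy][Pp][Ee]:/Content-untype:/'
```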

--
I said pretend you've got no money, she just laughed and said, 'Eh, you're so funny.' I said, 'Yeah? Well I can't see anyone else smiling in here.'


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail