procmail

Re: Scope of Header Parsing

2003-04-02 12:32:45
Thanks all for the responses.  My reason for asking is
that I think I am going to begin trapping, or severely
weighting for spam, messages that contain html only;
i.e. they are not multipart with plain text as well.

The logic behind this is that most legit email has
a plain text alternative version.  From the last
two threads you will have noticed that I already kill
messages whose only content is base64-encoded HTML
(the only reason to encode it that way is for spammers
to mask the message from content filtering).  I am now
thinking that I should both simplify and extend the
rule from:

* (^Content-Type: +text/html.*^Content-Transfer-Encoding: +BASE64)

or

* (^Content-Type: +text/(plain|html).*(^.*)?^Content-Transfer-Encoding: +base64)

to

* (^Content-Type: +text/html)

and

* (^Content-Type: +text/plain.*(^.*)?^Content-Transfer-Encoding: +base64)

Thus killing or heavily discriminating against HTML-only or
base64-encoded plain text messages.
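
For the "heavily discriminating" variant, a rough sketch using procmail's
weighted scoring conditions might look like the recipe below.  The weights
and the folder name possible-spam are placeholders of mine, not tested
values; the two conditions are the ones proposed above.

```procmail
# Sketch only: the folder name and the weights are assumptions.
# Conditions are matched against the header by default.
# First condition:  HTML-only message (top-level type is text/html).
# Second condition: plain text that arrives base64 encoded.
# Each match adds 1 to the score; the recipe fires if the total is positive.
:0
* 1^0 ^Content-Type: +text/html
* 1^0 ^Content-Type: +text/plain.*(^.*)?^Content-Transfer-Encoding: +base64
possible-spam
```

As written, either condition alone is enough to file the message.  To only
weight rather than kill, the same two lines could be added, with smaller
weights, to a recipe that already scores other spam indicators.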

The motivation behind doing this is an increasing number of HTML-only
spam emails that intentionally garble the text, to defeat plain text
scanning, by inserting "<!--9dARSFiz[,AdSR8F0,sz-->" crap inline
with most of the words (similar to the base64 encoding trick).

Another possibility would be to have something like John's
sanitizer 'cleanup' the html and tack on a plain text version
that can be properly scanned.
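
A much cruder stand-in for that idea (not John's sanitizer, and it rewrites
the body in place instead of tacking on a plain text part) would be a filter
recipe that strips HTML comments before the body-scanning recipes run.  This
is a sketch that assumes perl is on the PATH; it does nothing for bodies
that are base64 encoded.

```procmail
# Sketch only: remove HTML comments from the body of HTML-only mail so
# that later body conditions see the words without the inline garbage.
# Flags: f = treat the pipe as a filter, b = feed it the body only,
#        w = wait for the filter and check its exit code.
# perl -0777 reads the whole body at once, so comments that span
# several lines are removed as well.
:0 fbw
* ^Content-Type: +text/html
| perl -0777 -pe 's/<!--.*?-->//gs'
```

Since HTML comments are never displayed, stripping them should not change
what the recipient sees, but it does alter the stored message.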

Any other ideas to beat this new spam obfuscation method?

--
Daryle A. Tilroe


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
