Re: Embedded comments

2003-05-25 12:43:09
At 10:49 2003-05-25 -0600, LuKreme wrote:
[kersnip]

> Why not test for Content-type? if the message isn't html then the comments are a lot less likely to be spammish, right?

Indeed, but as yet I haven't had a need to: this commenting style and the threshold I use have only been matching spam. If I participated on more lists where it was an issue, I might reconsider, but that simply hasn't proven to be the case yet.

Note that I have (nominal) elevations in spammishness for HTML and multipart/alternative messages (yea, JUST BECAUSE), and higher elevations for messages which don't have text/plain parts when they should, as well as for HTML email (opening HTML tag at the top of the body) sent without an appropriate content-type. All of these spammishness flags are additive.
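
For anyone wanting to try additive flags of this sort, procmail's weighted scoring conditions are one way to do it. What follows is only a rough sketch, not the actual recipes described above: the weights, the -45 offset, and the "possible-spam" folder name are all invented for illustration. The recipe delivers only when the total score ends up positive, so here the HTML content-type alone (10) is not enough, but combined with a body that opens with an HTML tag (40) it is.

# Illustrative weights only; tune against your own mail.
# The bracket in the last condition holds one space and one tab.
:0
* -45^0
* 10^0 ^Content-Type:.*text/html
* 10^0 ^Content-Type:.*multipart/alternative
* 40^0 B ?? ^^([ 	]|$)*<(!DOCTYPE )?HTML
possible-spam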

Many spams with HTML bodies do not have a content-type header at all, so if you wanted to qualify the filter which I posted, you might add the following condition line:

* B ?? ^^([     ]|$)*<(!DOCTYPE )?HTML

This would match a body starting with an apparent HTML tag (note that the tag isn't closed in the pattern, since a DOCTYPE tag would carry additional data).
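
To spell the pattern out: ^^ anchors the match to the very start of the body, ([ 	]|$)* skips any leading spaces, tabs and blank lines, and <(!DOCTYPE )?HTML then matches either "<HTML" or "<!DOCTYPE HTML". Procmail matches case-insensitively by default, so "<html" is caught as well. A minimal standalone sketch of how the condition might be combined with a test for a missing content-type header follows; the variable name HTML_NO_CTYPE is hypothetical and not part of the filter posted earlier.

# Sketch only: flag messages with no Content-Type header at all
# whose body nevertheless opens with an HTML tag.
# The bracket holds one space and one tab.
:0
* ! ^Content-Type:
* B ?? ^^([ 	]|$)*<(!DOCTYPE )?HTML
{ HTML_NO_CTYPE=yes }

Later recipes could then test the variable (for example with "* HTML_NO_CTYPE ?? yes") rather than repeating the body scan.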

Of course, a proper HTML discussion list may itself carry HTML email, but the HTML code snippets would be within <CODE> tags. It isn't worth the hassle for me to parse the messages to determine this. The above has been working well so far, and if I really needed to, I could compensate the spammishness for certain lists as needed.

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.
