At 10:49 2003-05-25 -0600, LuKreme wrote:
[kersnip]
> Why not test for Content-type? If the message isn't HTML then the
> comments are a lot less likely to be spammish, right?
Indeed, but so far I haven't needed to: this commenting style and the
threshold I use have only been matching spam. If I participated on
more lists where it was an issue, I might reconsider, but that simply
hasn't proven to be the case yet.
Note that I have (nominal) elevations in spammishness for HTML and
multipart/alternative messages (yea, JUST BECAUSE), and higher elevations
for messages which don't have text/plain parts when they should, as well as
for HTML email (opening HTML tag at the top of the body) sent without an
appropriate content-type. All of these spammishness flags are additive.
Many spams with HTML bodies do not have a content-type header at all, so if
you wanted to qualify the filter which I posted, you might add the
following condition line:
* B ?? ^^([ ]|$)*<(!DOCTYPE )?HTML
This matches a body that starts with an apparent HTML tag (note that the
pattern leaves the tag unclosed, since a DOCTYPE tag carries additional data).
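As a rough illustration of that condition in a complete recipe, here is a
minimal sketch. The variable name HTMLBODY is hypothetical and not part of
the filter I posted, and the pattern is copied exactly as it appears above:

```
# Sketch only. HTMLBODY is a hypothetical variable name.
HTMLBODY=no

# ^^ anchors the match at the very start of the body, and $ matches a
# newline, so leading blank lines are skipped before the tag is tested.
# procmail matches case-insensitively unless the D flag is set, so
# <html> is caught too. The bracket expression is copied as posted.
:0
* B ?? ^^([ ]|$)*<(!DOCTYPE )?HTML
{
  HTMLBODY=yes
}
```

A later recipe could then test the flag with a condition such as
"* HTMLBODY ?? yes" rather than rescanning the body.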
Of course, a proper HTML discussion list may itself use HTML email, but the
HTML code snippets would be within <CODE> tags. It isn't worth the hassle for
me to parse the messages to determine this. The above has been working
well so far, and if I really needed to, I could offset the spammishness
for certain lists.
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail