On Sunday, January 19, 2003, at 07:15 PM, Louis LeBlanc wrote:
:0B:
Check for Content-Type BEFORE you do an expensive body check.
:0
* ^Content-type:(.*\<)(multipart|html)
{
:0Bf:
# You need to specify f for procmail to process the
# message AFTER sed does the replace for you
* ^ Content-type:(.*\<)html
etc...
* (<)!--
| sed -e 's/<!--/\r<!--/g'
If all you're doing is COUNTING the <!-- I don't think you need to do
this sed at all.
:0A
{
:0B:
* -2^0
* B ? 1^0 (<)!--
* 1^1 (<)!-- # (I think)
You don't need the B prefix because you already specified B for this
block. Also, I think the syntax would be 1^1 B ?? (<)!--, with the
weight first.
| formail -Y -f -A "X-Spammer: HTML Comments out the wazoo"
:0A
{ FOLDER=spam }
}
I'd like comments, feedback, etc. from anyone who has both an opinion
and more expertise in this than me . . .
I'm not qualified to comment on the scoring, but I think checking the
headers and checking for HTML before checking the body is probably good
practice.
This is what I would start with:
:0
* ^Content-type:(.*\<)(multipart|html)
{
:0Bf:
* -5^0
* 1^1 (<)!--
| formail -A"X-Spam: more than 5 HTML comments"
:0ABf
| sed -e 's/Content-type/Content-untype/g'
}
I have X-Spam set to visible, and I destroy the content-type. The
message gets munged, but not irretrievably. I notice that my
news.google dump, for example, contains three comment markers, so a
value of 5 or 6 might be better than your 3.
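To pick a threshold, it may help to count the markers in a few saved
messages first. With -5^0 and 1^1 the score starts at -5 and gains 1 per
marker, so the recipe should only fire on six or more. Here is a rough
sketch of that arithmetic in shell; the function names are made up for
illustration, grep -o is assumed to be available (GNU and BSD grep have
it), and unlike the B flag it counts markers in the whole message, not
just the body:

```shell
#!/bin/sh
# Hypothetical helper: count "<!--" occurrences in a message on stdin.
# grep -o prints each match on its own line; wc -l counts those lines.
count_comment_markers() {
    grep -o '<!--' | wc -l | tr -d ' '
}

# Mirror the recipe's scoring: start at -5, add 1 per marker,
# and succeed only if the total ends up positive.
would_match() {
    n=$(count_comment_markers)
    [ $((n - 5)) -gt 0 ]
}
```

For example, `count_comment_markers < saved-message` prints the count
for one file (saved-message is a placeholder name).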
That Content-type substitution should be case insensitive. I forget
how to do that gracefully.
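One way that should work with any sed is to spell out both cases with
bracket expressions. This is only a sketch, and the function name is
invented for the example; in the recipe it would be the sed command on
the pipe line:

```shell
#!/bin/sh
# Hypothetical helper: rewrite every "Content-type" to "Content-untype",
# whatever its capitalization, using portable bracket expressions.
munge_content_type() {
    sed -e 's/[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Tt][Yy][Pp][Ee]/Content-untype/g'
}

# With GNU sed only, the I flag on the s command is shorter:
#   sed -e 's/Content-type/Content-untype/Ig'
```

The bracket version is ugly but does not depend on a GNU extension.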
--
I said pretend you've got no money, she just laughed and said, 'Eh,
you're so funny.' I said, 'Yeah? Well I can't see anyone else smiling
in here.'
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail