On Sun, 29 Feb 2004, Robert Krueger wrote:
> :0HB:
> * ^Content-Type:.*text/html
> blue-spam
> Two questions. Why did you put an asterisk before the "text/html"?
The important thing is that it's after the ".": ".*" means any number
of repetitions (including zero) of any character except a newline. The
"*" is what means "any number of repetitions" of whatever came before
it, and "." is what means "any character".
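To see what ".*" buys you, here is a quick check. It uses Python's re
module as a stand-in for procmail's regex engine. That is an assumption:
the two agree for a simple pattern like this one but are not identical
in general. IGNORECASE mirrors procmail's default case-insensitive
matching, and MULTILINE lets "^" anchor at the start of every line.

```python
import re

# Python's re stands in for procmail's regex engine (an approximation).
# IGNORECASE mirrors procmail's default; MULTILINE makes "^" match per line.
PATTERN = re.compile(r"^Content-Type:.*text/html", re.IGNORECASE | re.MULTILINE)

def matches(text: str) -> bool:
    """True if a line starts with Content-Type: and mentions text/html later on that same line."""
    return PATTERN.search(text) is not None

# ".*" may match nothing at all ...
print(matches("Content-Type:text/html\n"))                           # True
# ... or any run of characters on the same line ...
print(matches("Content-Type: multipart/related; type=text/html\n"))  # True
# ... but "." never matches a newline, so ".*" cannot reach the next line.
print(matches("Content-Type: text/plain\n\ntext/html later\n"))      # False
```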
> Also, I was told there's some kind of procmail bug that doesn't like the
> "HB" right after the "0" ( :0HB: )
There's a bug with "H" specifically -- in some versions of procmail, once
the "H" flag has appeared it is never cleared again, so all subsequent
recipes act as though they also have "H".
> Instead, I was advised to use an alternate format like this: (I think)
> :0 :
> * HB ^Content-Type:.*text/html
> blue-spam
> Is that correct?
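The idea is right, but the syntax is wrong: a condition that tests
something other than the header alone needs the special variable name
(here HB, for header plus body) followed by "??" before the regex.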
:0 :
* HB ?? ^Content-Type:.*text/html
blue-spam
That said, I think that's a bit extreme as a condition all by itself,
as any number of ordinary email applications might generate HTML as a
body part. Or someone might write an ordinary sentence that mentions
"Content-Type: text/html" and happens to wrap such that the phrase lands
at the beginning of a line. You could avoid misclassifying the latter
with a scoring recipe:
:0 :
* -1^0
* 2^0 H ?? ^Content-Type:.*text/html
* 1^0 H ?? ^Content-Type:.*multipart
* 1^0 B ?? ^Content-Type:.*text/html
blue-spam
This means:
Start with a score of -1. If text/html is in the header, add 2 to the
score. If multipart is in the header, add 1. If text/html is in the
body, add 1. Thus only messages that have either text/html in the
header, or BOTH multipart in the header AND text/html in the body, end
up with a positive score and so match.
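If you want to convince yourself of that arithmetic, here is a rough
Python model of the scoring recipe next to the single HB condition. It
is a sketch, not procmail itself. The function names naive_hb and scored
are made up for illustration, and Python's re again stands in for
procmail's regexes. It shows the wrapped-sentence case: the plain HB
test fires, but the scored recipe ends at 0 and does not match.

```python
import re

# Hypothetical model of the scoring recipe; not procmail itself.
# IGNORECASE mirrors procmail's default; MULTILINE makes "^" match per line.
FLAGS = re.IGNORECASE | re.MULTILINE
HTML = re.compile(r"^Content-Type:.*text/html", FLAGS)
MULTI = re.compile(r"^Content-Type:.*multipart", FLAGS)

def naive_hb(header: str, body: str) -> bool:
    """The single '* HB ?? ^Content-Type:.*text/html' condition."""
    return bool(HTML.search(header) or HTML.search(body))

def scored(header: str, body: str) -> bool:
    """The weighted recipe: matches only if the final score is positive."""
    score = -1                    # * -1^0
    if HTML.search(header):       # * 2^0 H ?? ^Content-Type:.*text/html
        score += 2
    if MULTI.search(header):      # * 1^0 H ?? ^Content-Type:.*multipart
        score += 1
    if HTML.search(body):         # * 1^0 B ?? ^Content-Type:.*text/html
        score += 1
    return score > 0

# A plain-text message whose prose wraps "Content-Type: text/html"
# onto the start of a line: the naive test fires, the scored one doesn't.
header = "Content-Type: text/plain\n"
body = "The header said\nContent-Type: text/html, oddly enough.\n"
print(naive_hb(header, body), scored(header, body))  # True False
```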
The 2^0 could be replaced with a very large score (see "man procmailsc"
for the actual maximum score; a common idiom is to write 9876543210^0)
to short-circuit the scoring at that point: once the total hits the
maximum, procmail skips the remaining conditions, and thus the body scan.
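Here is a sketch of that short-circuit as a small Python model. Two
things are assumptions. First, it assumes the ceiling is 2147483647, the
32-bit signed maximum. Check procmailsc(5) on your system for the real
value. Second, it assumes that an oversized weight like 9876543210 is
simply clamped to that ceiling. The names evaluate, body_has_html and
body_scans are hypothetical. The counter shows that the body test never
runs once the header condition pins the score at the maximum.

```python
from typing import Callable, List, Tuple

# Assumed ceiling (32-bit signed max); confirm against procmailsc(5).
MAX_SCORE = 2**31 - 1

def evaluate(conditions: List[Tuple[int, Callable[[], bool]]]) -> bool:
    """Hypothetical model of procmail's weighted conditions.

    Each (weight, test) pair adds weight when test() is true.  The running
    total is clamped to +/-MAX_SCORE, and once it hits either bound the
    remaining tests are skipped.  The recipe matches if the total is > 0.
    """
    score = 0
    for weight, test in conditions:
        if test():
            score = max(-MAX_SCORE, min(MAX_SCORE, score + weight))
        if abs(score) == MAX_SCORE:
            break  # short-circuit: later tests never run
    return score > 0

body_scans: List[int] = []

def body_has_html() -> bool:
    """Stand-in for the expensive 'B ??' body search; records each call."""
    body_scans.append(1)
    return True

# Header already says text/html, weighted with 9876543210 instead of 2.
matched = evaluate([
    (-1, lambda: True),            # * -1^0
    (9876543210, lambda: True),    # * 9876543210^0 ^Content-Type:.*text/html
    (1, lambda: False),            # * 1^0 ^Content-Type:.*multipart
    (1, body_has_html),            # * 1^0 B ?? ^Content-Type:.*text/html
])
print(matched, len(body_scans))    # True 0
```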
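The "H ??" prefixes are actually not needed, as searching the header is
the default. And if the message has no Content-Type header at all, you
can skip the whole thing. The "\/" token saves whatever the rest of the
regex matches (here, the remainder of the Content-Type line) in $MATCH,
so later conditions can test just that value with "MATCH ??". So an
"optimized" version might be: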
:0 :
* ^Content-Type:\/.*
* 9876543210^0 MATCH ?? text/html
* -9876543210^0 ! MATCH ?? multipart
* 9876543210^0 B ?? ^Content-Type: text/html
blue-spam
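Here is a rough Python model of the optimized recipe, with each
condition noted in a comment. It is a sketch under a few assumptions:
Python's re stands in for procmail's regexes, the capture group plays
the role of $MATCH, and each maximum-weight condition is modelled as an
immediate return, since hitting +/- the maximum ends the scoring. The
function name optimized is made up for illustration.

```python
import re

def optimized(header: str, body: str) -> bool:
    """Hypothetical Python model of the optimized recipe above."""
    flags = re.MULTILINE | re.IGNORECASE  # procmail is case-insensitive by default
    # * ^Content-Type:\/.*    -> no Content-Type header: recipe fails outright
    m = re.search(r"^Content-Type:(.*)", header, flags)
    if m is None:
        return False
    match = m.group(1)  # what procmail would store in $MATCH
    # * 9876543210^0 MATCH ?? text/html       -> score maxed out: deliver
    if re.search(r"text/html", match, re.IGNORECASE):
        return True
    # * -9876543210^0 ! MATCH ?? multipart    -> not multipart: give up
    if not re.search(r"multipart", match, re.IGNORECASE):
        return False
    # * 9876543210^0 B ?? ^Content-Type: text/html   (the only body scan)
    return re.search(r"^Content-Type: text/html", body, flags) is not None

print(optimized("Content-Type: text/html\n", ""))                          # True
print(optimized("Content-Type: text/plain\n", "Content-Type: text/html\n"))  # False
```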
Clear as mud?
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail