procmail

Re: matching words that are laced with html

2003-10-30 11:11:56
On Thu, 30 Oct 2003, Professional Software Engineering wrote:

Although I didn't spot it anywhere, I believe what Dallman is saying is 
that rather than expecting to match on the drug keyword, the fact that you 
match a lot of HTML COMMENTS in an EMAIL should be sufficient to tag it as 
spam.

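I seem to recall a discussion about this a while back that suggested 
scoring based on the number of comments per line.  Something like this, 
where each body line subtracts 1 and each comment opener adds 1, so the 
conditions only match when there are more comments than lines: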

:0 B
* -1^1 ^.*$
*  1^1 (<!)

You could do something similar with HTML tags, but probably allow more 
than one per line, e.g. match only when the body averages more than 
three tags per line:

:0 B
* -3^1 ^.*$
*  1^1 (<[^>]+>)
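
To make the arithmetic of the two recipes above easier to follow, here is
a rough Python emulation of the scoring. It is only a sketch, not
procmail itself: the function names are made up for illustration, and it
assumes that counting non-overlapping regex matches is close enough to
how procmail totals up a w^1 condition.

```python
import re

def procmail_style_score(body, per_line_weight, pattern, per_match_weight=1):
    """Approximate a pair of w^1 conditions: every match adds its weight."""
    # `^.*$` is emulated by counting the lines of the body.
    line_count = len(body.splitlines())
    # The second condition is emulated by counting non-overlapping matches.
    match_count = len(re.findall(pattern, body))
    return per_line_weight * line_count + per_match_weight * match_count

def would_match(body, per_line_weight, pattern):
    """A scoring recipe succeeds only if the final score is positive."""
    return procmail_style_score(body, per_line_weight, pattern) > 0

if __name__ == "__main__":
    spam = "V<!--a-->i<!--b-->a<!--c-->gra\nCheap <!--d-->now\n"
    ham = "Hello,\n<!-- one comment -->\nSee you.\n"
    # Comment recipe: -1 per line, +1 per "<!"
    print(procmail_style_score(spam, -1, r"<!"))   # 2 lines, 4 comments -> 2
    print(procmail_style_score(ham, -1, r"<!"))    # 3 lines, 1 comment -> -2
    # Tag recipe: -3 per line, +1 per "<...>"
    print(would_match("<p><b>Hi</b> <i>there</i></p>\n", -3, r"<[^>]+>"))  # True
```

The weights are the knobs: raising the per-line penalty from 1 to 3 is
what "allows" up to three tags per line before the score goes positive.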

You could further attempt to count matched open/close tags as only one
rather than two, but I'm not going to try to work that out right now.
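
For what it's worth, the pairing idea is easier to sketch outside of
procmail. The following Python is one possible approach, not something
from the earlier discussion: count_tag_units is a hypothetical helper
that keeps a stack of open tag names and counts a closing tag only when
it has no opener to pair with. It ignores comments and makes no attempt
to handle malformed markup properly.

```python
import re

# Matches <name ...> and </name ...>; comments such as <!-- --> are skipped
# because the character after "<" or "</" must be a letter.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

def count_tag_units(body):
    """Count tags, treating an opening tag and its matching close as one."""
    open_tags = []  # stack of tag names that have been opened but not closed
    units = 0
    for closing, name in TAG.findall(body):
        name = name.lower()
        if not closing:
            open_tags.append(name)
            units += 1
        elif name in open_tags:
            # Paired close: pop back to its opener, which was already counted.
            while open_tags.pop() != name:
                pass
        else:
            units += 1  # stray closing tag counts on its own
    return units

if __name__ == "__main__":
    print(count_tag_units("<p><b>Hi</b> <i>there</i></p>"))  # 3
    print(count_tag_units("<p>a<br>b</p>"))                  # 2
```

The resulting count could stand in for the raw tag count when comparing
against the number of lines.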


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
