I think you'll find this very difficult to do reliably with
regular expressions of any kind. Keep in mind that you will have to
deal with legitimate, complex tags as well (e.g. <a href="http://someplace"
target=_top>Hiya doin</a>), nested tags, tags that may span multiple
lines, etc.
I would not worry about the actual ereg for filtering an HTML tag with
attributes etc. if there were a function like ereg_replace. Basically I could
just filter out everything within the < > tags, right? I do not want to
change the content of the e-mail, just pre-parse it into a variable so
I can do more accurate filtering. In, say, Perl or PHP this would
definitely not be a problem, and I take it that the eregs work the same way?
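To make the "filter everything within the < > tags" idea concrete, here is a minimal sketch in Python (chosen only for illustration; procmail itself is not involved here). The name strip_tags_naive is hypothetical. It shows that a simple pattern copes with the multi-line link quoted above, and also one case where it goes wrong.

```python
import re

# Naive approach: drop everything between "<" and the next ">".
# [^>] also matches newlines, so tags spanning several lines are covered.
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags_naive(html):
    """Remove anything that looks like a tag; leave all other text as-is."""
    return TAG_RE.sub("", html)


# The multi-line link from the message above is handled fine.
sample = '<a href="http://someplace"\ntarget=_top>Hiya doin</a>'
print(strip_tags_naive(sample))

# A ">" inside an attribute value ends the match too early,
# so part of the tag leaks into the result.
tricky = '<img alt="a > b" src="x.gif">text'
print(strip_tags_naive(tricky))
```

The second case is the kind of thing that makes a pure regex approach unreliable on real-world HTML mail; the same pattern also leaves the contents of script and style elements in place.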
If you want to go down that road, you probably want to use
one of the HTML-to-text converters like lynx or w3m. That
would mean piping to a script, converting, then parsing the
converted file. Certainly possible.
OK, so there is basically no ereg_replace-style function within
procmail itself, then? But your suggested solution is certainly an option!
Could anyone give just a brief example of how this would be done? I'm
sure I could work out the rest.
I would appreciate it! :)
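As a rough sketch of the "pipe to a script, convert, then parse" idea, the following uses Python's standard library as a stand-in for lynx or w3m, so it is not the converter suggested above, only an illustration of the same step. The names TextExtractor, html_to_text and message_text are hypothetical, and the sample message is made up for the demonstration.

```python
from email import message_from_string
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Return the visible text of an HTML fragment on a single line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse all runs of whitespace, including newlines, to one space.
    return " ".join("".join(parser.chunks).split())


def message_text(raw):
    """Return the text of every text/plain and text/html part of a message."""
    msg = message_from_string(raw)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back to a lossless 8-bit decoding.
            text = payload.decode("latin-1")
        if ctype == "text/html":
            parts.append(html_to_text(text))
        else:
            parts.append(" ".join(text.split()))
    return "\n".join(parts)


# Made-up sample message, reusing the link from earlier in the thread.
sample = (
    "From: someone@example.com\n"
    "Content-Type: text/html; charset=us-ascii\n"
    "\n"
    '<html><body><a href="http://someplace"\n'
    "target=_top>Hiya doin</a></body></html>\n"
)
print(message_text(sample))
```

To use something like this as a filter, the script would read the message from standard input (for example message_text(sys.stdin.read())) and write the result to standard output, and procmail would pipe the message to it. The converted text can then be matched with ordinary conditions, while the original message stays untouched.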
Just because of the situation you describe, I went to a
whitelist approach, and then dump _any_ HTML after that as
spam. This has been *very* effective, at least for me.
Yes, you are right, that certainly is very effective! However, my e-mail
filtering policy is more conservative. My basic reasoning is
that 1) I should be able to receive e-mail from people interested in my
business or other areas even if they are not on my nobounce/whitelist (I
have one as well), and 2) many people unfortunately write their
e-mails in HTML by default, and the risk that someone not on the list
sends me a legitimate e-mail like that is just too high.
Thank you for your comments!
Regards,
Björn
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail