procmail

RE: Something like ereg_replace?

2003-07-05 14:18:22
I think you'll find this very difficult to do reliably with
regular expressions of any kind. Keep in mind that you will have to
deal with legitimate, complex tags as well (e.g. <a href="http://someplace"
target=_top>Hiya doin</a>), nested tags, tags that may span multiple
lines, etc.

I am not worried about the actual ereg for filtering an HTML tag with
attributes etc., as long as there is a function like ereg_replace.
Basically I could just filter out everything within the < > tags,
right? I do not want to change the content of the e-mail, just
pre-parse it into a variable so I can do more accurate filtering. In,
say, Perl or PHP this would definitely not be a problem, and I take it
that the eregs work the same way?
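
To make the "filter everything within the < > tags" idea concrete, here
is a minimal sketch in Python (not procmail, and not anyone's actual
setup). The names TAG_RE and strip_tags_naive are made up for
illustration. It handles the multi-line example quoted above, and the
second sample shows one case where a plain regex goes wrong.

```python
import re

# Naive pattern: a "<", then any run of characters that are not ">", then ">".
# A negated class like [^>] also matches newlines, so tags that span
# several lines are covered without extra flags.
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags_naive(text):
    """Remove everything between < and >, leaving the remaining text as-is."""
    return TAG_RE.sub("", text)


# The tag from the quoted example, split across two lines.
sample = '<a href="http://someplace"\ntarget=_top>Hiya doin</a>'
print(strip_tags_naive(sample))

# Weak spot: a ">" inside a quoted attribute ends the match too early,
# so part of the tag is left behind in the output.
tricky = '<img alt="a > b" src="x.gif">text'
print(strip_tags_naive(tricky))
```

The first call prints only the link text. The second leaves attribute
debris behind, which is the kind of unreliability the quoted warning is
about.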

If you want to go down that road, you probably want to use
one of the HTML -> text converters like lynx or w3m. That
would mean piping to a script, converting, then parsing the
converted file. Certainly possible.

OK, so there is basically no ereg_replace-style function within
procmail itself, then? But your suggested solution is certainly an
option! Could anyone give just a brief example of how this would be
done? I'm sure I could work out the rest.
I would appreciate it! :)
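
As a rough idea of what the "converting" step of such a script might
look like, here is a sketch that uses Python's standard html.parser
module instead of lynx or w3m. That is a swapped-in technique, not what
was suggested above. The names TextExtractor and html_to_text are
hypothetical, and the sketch assumes it is handed only the HTML body,
not the full message with headers. How the script is called from a
procmail recipe is not shown here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags, skipping <script> and <style> content."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain text.
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.chunks).split())


# Hypothetical sample body; a real filter would pass sys.stdin.read() instead.
sample = "<p>Buy <b>now</b>!</p><style>p{color:red}</style>"
print(html_to_text(sample))
```

The plain-text result is what the later filtering rules would then be
matched against, while the original message stays unchanged.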

Just because of the situation you describe, I went to a
whitelist approach, and then dump _any_ html after that as
spam. This has been *very* effective, at least for me.

Yes, you are right, that certainly is very effective! However, my e-mail
filtering policy is a more conservative one. My basic thinking is
that 1) I should be able to receive e-mail from people interested in my
business or other areas even if they are not on my nobounce/whitelist (I
have one as well), and 2) many people unfortunately write their
e-mails in HTML by default, and the risk that someone not on the list
sends me a legitimate e-mail like that is just too high.

Thank you for your comments!

Regards,
Björn



_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail

