procmail

Re: Something like ereg_replace?

2003-07-04 21:23:49
On Fri, Jul 04, 2003 at 05:08:13PM -0700, Björn Lilja wrote:

What I would need is the ereg's to work on the e-mail contents as
usual but only after removing anything within < and > tags,
basically doing something like ereg_replace("<.*>", "") and then do
the normal eregs! This is of course to try to make it harder to do
simple filter avoidance by typing "F<dsd>R<fdf>EE" instead of
"FREE".

I think you'll find this very difficult to do reliably with regular
expressions of any kind. Keep in mind that you will also have to deal
with legitimate, complex tags (e.g. <a href="http://someplace"
target=_top>Hiya doin</a>), nested tags, tags that span multiple
lines, etc.
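
To make the problem concrete, here is a rough sketch in Python (not procmail, whose regex engine has no replace operation at all). The function names are made up for illustration. It shows the greedy pattern from the question eating real text, a tag split across two lines slipping through, and a tighter pattern still tripping over a '>' inside an attribute value.

```python
import re

def strip_tags_naive(text):
    # The pattern from the question. '.*' is greedy and does not cross
    # newlines, so it runs from the first '<' to the last '>' on a line.
    return re.sub(r"<.*>", "", text)

def strip_tags_tighter(text):
    # '[^>]*' stops at the first '>' and also matches newlines,
    # but it knows nothing about quoted attribute values.
    return re.sub(r"<[^>]*>", "", text)

obfuscated = "F<dsd>R<fdf>EE"
print(strip_tags_naive(obfuscated))    # FEE  (the 'R' between the tags is lost)
print(strip_tags_tighter(obfuscated))  # FREE

# A legitimate tag that spans two lines.
multiline = '<a href="http://someplace"\ntarget=_top>Hiya doin</a>'
print(repr(strip_tags_naive(multiline)))    # opening tag survives untouched
print(repr(strip_tags_tighter(multiline)))  # 'Hiya doin'

# A '>' inside a quoted attribute still breaks the tighter pattern.
quoted = '<a title="a > b">x</a>'
print(repr(strip_tags_tighter(quoted)))     # ' b">x'
```

The tighter pattern handles the simple obfuscation case, but every fix of this kind just moves the failure to the next odd piece of markup.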

If you want to go down that road, you probably want to use one of the
html -> text converters like lynx or w3m. That would mean piping the
message to a script, converting it, then parsing the converted text.
Certainly possible.
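
As a rough idea of what such a script might look like, here is a sketch that uses Python's standard-library HTML parser as a stand-in for lynx or w3m. That swap is my assumption, made only to keep the example self-contained; a real converter will cope with far more broken markup. The names TextExtractor, html_to_text and message_text are hypothetical.

```python
from email import message_from_string
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only the character data, dropping every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered after the last tag
    return "".join(parser.chunks)

def message_text(raw):
    """Return the text of all text/plain and text/html parts of a raw message."""
    msg = message_from_string(raw)
    parts = []
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue  # skip multipart containers and attachments
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            body = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back to a lossy decode.
            body = payload.decode("utf-8", errors="replace")
        parts.append(html_to_text(body) if ctype == "text/html" else body)
    return "\n".join(parts)
```

The output of message_text() is what the usual keyword patterns would then be run against, instead of the raw message body.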

Precisely because of the situation you describe, I went to a whitelist
approach, and then dump _any_ html after that as spam. This has been
*very* effective, at least for me.
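
The logic is roughly the following, sketched in Python rather than as procmail recipes. The classify function, the return labels and the example addresses are all invented for illustration, and what happens to non-whitelisted plain-text mail ("continue" here, meaning it falls through to whatever other filters exist) is an assumption on top of the description above.

```python
from email import message_from_string
from email.utils import parseaddr

def classify(raw, whitelist):
    """Whitelist first, then treat any remaining HTML mail as spam."""
    msg = message_from_string(raw)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender in whitelist:
        return "deliver"  # known senders are never checked for HTML
    if any(part.get_content_type() == "text/html" for part in msg.walk()):
        return "spam"     # unknown sender plus any HTML part
    return "continue"     # assumption: hand off to the remaining filters

# Hypothetical whitelist; addresses are stored in lowercase.
whitelist = {"friend@example.com"}
html_mail = "From: Someone <stranger@example.net>\nContent-Type: text/html\n\n<b>FREE</b>\n"
print(classify(html_mail, whitelist))  # spam
```

Because the whitelist check comes first, HTML mail from known correspondents still gets through.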

-- 
Hal Burgiss
 

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
