On Fri, Jul 04, 2003 at 05:08:13PM -0700, Björn Lilja wrote:
What I would need is for the eregs to work on the e-mail contents as
usual, but only after removing anything within < and > tags,
basically doing something like ereg_replace("<.*>", "") and then
running the normal eregs! This is of course to try to make it harder
to dodge the filter by typing "F<dsd>R<fdf>EE" instead of "FREE".
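A minimal shell sketch of that idea (hypothetical, using sed in place
of ereg_replace; note the [^>]* character class, since the greedy <.*>
above would swallow everything between the first < and the last > on a
line):

```shell
# Strip anything that looks like a tag, so the normal keyword checks
# run against the cleaned text. Still fails on tags spanning lines.
printf 'F<dsd>R<fdf>EE\n' | sed 's/<[^>]*>//g'
# -> FREE
```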
I think you'll find this very difficult to do reliably with regular
expressions of any kind. Bear in mind that you will have to deal with
legitimate, complex tags as well (e.g. <a href="http://someplace"
target=_top>Hiya doin</a>), nested tags, tags that span multiple
lines, etc.
If you want to go down that road, you probably want to use one of the
html -> text converters like lynx or w3m. That would mean piping the
message to a script, converting it, then parsing the converted output.
Certainly possible.
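In procmail terms, that pipeline might look something like the
following untested sketch (assumes w3m is installed; "spam-folder" is
a placeholder mailbox name). The f flag rewrites the body in place, so
later recipes see the plain-text rendering:

```
# Render HTML bodies to plain text before any content filtering.
:0 fbw
* ^Content-Type:.*text/html
| w3m -dump -T text/html

# B-flag conditions match against the (now converted) body.
:0 B:
* free money
spam-folder
```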
Precisely because of the situation you describe, I switched to a
whitelist approach, and now dump _any_ HTML that gets past it as spam.
This has been *very* effective, at least for me.
--
Hal Burgiss
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail