At 08:30 2002-08-29 -0700, Michael J. Rensing wrote:
> I would like to use procmail to perform a number of mail tasks, including

I don't directly see how stripping HTML has much of a bearing on spam
filtering, except that it makes matching text strings a bit easier on the
generic level. That doesn't really matter - converting a message to
plaintext is a perfectly normal goal with procmail, whether you're
combatting spam, or just dealing with people who think cutesy text is the

Typically, messages sent in HTML format are multipart - there's a plaintext
version of the message preceding the HTML portion. Of course, there are
exceptions out there, but for many messages, you might find that you don't
really need to convert the HTML so much as drop that content part.
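
As a rough illustration of dropping the HTML part rather than converting it,
here is a minimal sketch in Python using only the standard library email
package. It is not a procmail recipe, and the function name plaintext_only is
made up for this example. It assumes the message is well-formed MIME and
simply returns the first text/plain body it finds.

```python
import email
from email import policy


def plaintext_only(raw: str) -> str:
    """Return the text/plain body of a message, or "" if there is none.

    Hypothetical helper for illustration: for a multipart/alternative
    message this picks the plaintext alternative and ignores the HTML one.
    """
    msg = email.message_from_string(raw, policy=policy.default)
    # get_body() walks the MIME tree looking for the preferred subtype.
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""
```

A message that carries only an HTML part yields an empty string here, which
is the case where actual conversion would still be needed.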

> It seems to me that it should also be able to run everything
> through a filter which I figure must exist somewhere. That filter would
> remove all HTML coding from a message, except links that can be clicked on.
> The resulting document could be a bit messy, but at least the html tags
> wouldn't be cluttering up the content. Simply coded html messages would
> likely come through without problems.

You could pipe it through lynx, more recent versions of which have an
option to strip HTML. Search the list archives, linked from
<http://www.procmail.org/>. Your primary limitation there will be dealing
with links that are <A HREF="link">some text other than the real
link</A>, which would be stripped down to the text, rather than the link
itself. When you have <A HREF="link">link</A> type links, you'd
obviously not have a problem in the translation.
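
To illustrate one way around the link limitation just described, here is a
minimal sketch in Python (standard library only, not lynx and not a procmail
recipe). The names LinkKeepingStripper and strip_html are made up for this
example. It drops all tags but appends the HREF target after the link text
whenever the two differ. It makes no attempt to handle script or style
content, nested links, or badly broken markup.

```python
from html.parser import HTMLParser


class LinkKeepingStripper(HTMLParser):
    """Hypothetical tag stripper that keeps link targets visible."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of output text, in order
        self.href = None     # target of the link currently open, if any
        self.link_text = []  # text seen inside the current link

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            self.href = dict(attrs).get("href")
            self.link_text = []

    def handle_data(self, data):
        self.out.append(data)
        if self.href is not None:
            self.link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            # Only add the target when the visible text is something else.
            if "".join(self.link_text).strip() != self.href:
                self.out.append(" <" + self.href + ">")
            self.href = None


def strip_html(html: str) -> str:
    parser = LinkKeepingStripper()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

With this sketch, a link whose text matches its target comes through
unchanged, while a link with different text is followed by its target in
angle brackets.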
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.