procmail

Re: #$%&@ e-mails in html

1998-02-09 05:42:42
At 10:42 AM 2/9/98 +0200, jari(_dot_)aalto(_at_)poboxes(_dot_)com wrote:

[snip]

[to convert html to text]

> If you receive pure html text, then use this
>
> :0 fbw
> * condition-to-determine-pure-html-body
> | perl -0777 -pe 's/<[^>]*>//g'

Doesn't this just get rid of <thing> </thing> stuff?
Won't it choke on < and > in the body?  Won't it still
leave absurdly long lines?  What about an HTML message
containing "C" code (for example) where < and > are
common operators?  "if(x<2 && y>3) do_something();"
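
To make the worry concrete, here is a small sketch in Python (not part of
the recipe under discussion). It assumes Python's re module handles this
particular pattern the same way Perl does, and strip_tags is a hypothetical
helper name used only for illustration.

```python
import re


def strip_tags(text):
    # Python rendering of the Perl substitution s/<[^>]*>//g:
    # delete every run that starts at "<" and ends at the next ">".
    return re.sub(r"<[^>]*>", "", text)


# Real markup is removed as intended.
print(strip_tags("<b>bold</b>"))                      # -> bold

# A bare "<" and ">" in C code are taken for one tag, so the
# comparison in the middle silently disappears.
print(strip_tags("if(x<2 && y>3) do_something();"))   # -> if(x3) do_something();

# Properly escaped entities are left undecoded in the output.
print(strip_tags("a &lt; b"))                         # -> a &lt; b
```

Everything between the first "<" and the following ">" is treated as a tag,
and entities such as &lt; come through untouched, so the result is neither
faithful to the original nor readable as plain text.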

I would think the appropriate tool would need to be something
that fully understands HTML rules and can produce sensible
text, such as "lynx -dump".
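
As a rough illustration of what "understands HTML rules" buys you, here is
a sketch using Python's standard html.parser and textwrap modules. It is a
stand-in for illustration only and makes no claim about how lynx works
internally; TextExtractor and html_to_text are hypothetical names, and the
72-column width is an assumption.

```python
import textwrap
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags, with entities already decoded."""

    def __init__(self):
        # convert_charrefs=True makes the parser turn &lt; into "<", etc.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html, width=72):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace, then re-wrap so no line is overlong.
    text = " ".join("".join(parser.chunks).split())
    return textwrap.fill(text, width=width)


print(html_to_text("<p>if(x&lt;2 &amp;&amp; y&gt;3) do_something();</p>"))
```

This sketch ignores block structure (paragraphs, lists, tables) and would
also emit the contents of script or style elements, so a real converter
still has a good deal more work to do.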

[to kill extra attachments]

[snip]

Isn't the most "interesting" part the
 * condition-to-determine-pure-html-body
you spoke of, along with a similar
 * condition-to-determine-pure-html-attachment
and one more:
 * condition-to-determine-html-attachment-which-duplicates-the-text
so that one can identify such mail, and ONLY such mail, and deal with it
appropriately without losing or mangling information content?

These are not simple conditions; consider, for example, a text email
which describes the rules for writing HTML.
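
One possible starting point is to test the MIME headers rather than what
the body looks like. The sketch below uses Python's standard email package;
classify and its return labels are hypothetical names chosen to mirror the
three conditions above, and it assumes the sender labelled the parts
correctly, which real mail often does not.

```python
from email import message_from_string


def classify(raw):
    """Label a raw message by its declared MIME structure only."""
    msg = message_from_string(raw)

    if not msg.is_multipart():
        # Single-part mail: trust the declared Content-Type.
        if msg.get_content_type() == "text/html":
            return "pure-html-body"
        return "other"

    # Content types of all leaf parts, in order of appearance.
    types = [p.get_content_type() for p in msg.walk() if not p.is_multipart()]

    if (msg.get_content_type() == "multipart/alternative"
            and "text/plain" in types and "text/html" in types):
        # The sender *claims* the HTML duplicates the text part.
        return "html-duplicates-text"
    if "text/html" in types:
        return "html-attachment"
    return "other"


# A plain-text message that merely talks about HTML is left alone.
sample = "Content-Type: text/plain\n\nUse <b>bold</b> like this.\n"
print(classify(sample))   # -> other
```

Note that multipart/alternative only declares that the parts are
equivalent; nothing here verifies that the HTML really duplicates the text.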

<gripe>
Personally, I consider HTML-in-email to be one of the worst "inventions"
ever inflicted on the internet; most of what I get is spam, and the
rest seems to be sent by those clueless about the difference between
email and the web and who have no idea how to turn it off.  It makes
a wonderful mess of mailing list archives, quite apart from its general
wastefulness of resources.
</gripe>

Cheers,
Stan
