procmail

Re: #$%&@ e-mails in html

1998-02-09 10:49:41
On Mon, 09 Feb 1998 07:28:38 -0500 Stan Ryckman wrote:
At 10:42 AM 2/9/98 +0200, jari(_dot_)aalto(_at_)poboxes(_dot_)com wrote:
If you receive pure HTML text, then use this:

:0 fbw
* condition-to-determine-pure-html-body
| perl -0777 -pe 's/<[^>]*>//g'

Doesn't this just get rid of <thing> </thing> stuff?
Won't it choke on < and > in the body?  Won't it still
leave absurdly long lines?  What about an HTML message
containing "C" code (for example) where < and > are
common operators?  "if(x<2 && y>3) do_something();"

Ideally, literal <'s and >'s in the body should be replaced
by &lt; and &gt;
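
To make the objection concrete, here is a small sketch in Python rather than
the Perl one-liner itself. It applies the same pattern to the C fragment
quoted above, first with literal operators and then with the operators
written as entities. The helper name strip_tags is made up for illustration.

```python
import re

# Same pattern as the Perl one-liner: delete everything from '<' to the next '>'.
TAG_PATTERN = re.compile(r"<[^>]*>")

def strip_tags(text: str) -> str:
    """Hypothetical helper: remove anything that looks like <...>."""
    return TAG_PATTERN.sub("", text)

raw = "if(x<2 && y>3) do_something();"
escaped = "if(x&lt;2 &amp;&amp; y&gt;3) do_something();"

print(strip_tags(raw))      # the comparison is swallowed as if it were a tag
print(strip_tags(escaped))  # nothing matches, but the entities stay undecoded
```

With literal operators the text between < and > disappears; with entities
the code survives, but the reader is left looking at &lt; and &gt; unless
something decodes them afterwards.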

I would think the appropriate tool would need to be something
that fully understands HTML rules and can produce sensible
text, such as "lynx -dump".

Actually, that is a great idea.  Should be easy enough to implement.
You'd even be able to follow links later, with the index at the end.
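
For anyone without lynx handy, the idea can be roughed out with a real HTML
parser. The sketch below uses Python's standard html.parser module as a
much-simplified stand-in for "lynx -dump": it decodes entities, wraps long
lines, and collects links into a numbered index at the end. The class and
function names (TextDumper, dump) and the "References" heading are
assumptions for illustration; it does not reproduce lynx's actual output
format or handle most of HTML.

```python
from html.parser import HTMLParser
import textwrap

class TextDumper(HTMLParser):
    """Hypothetical, much-simplified stand-in for `lynx -dump`."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &lt; &gt; &amp; etc.
        self.chunks = []   # text pieces in document order
        self.links = []    # href values, numbered from 1

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
                self.chunks.append("[%d]" % len(self.links))
        elif tag in ("p", "br", "div"):
            self.chunks.append("\n")  # treat these tags as line breaks

    def handle_data(self, data):
        self.chunks.append(data)

def dump(html: str, width: int = 72) -> str:
    parser = TextDumper()
    parser.feed(html)
    parser.close()
    lines = []
    for para in "".join(parser.chunks).split("\n"):
        para = " ".join(para.split())       # collapse runs of whitespace
        if para:
            lines.append(textwrap.fill(para, width))
    if parser.links:
        lines.append("")
        lines.append("References")
        for n, href in enumerate(parser.links, 1):
            lines.append("  %d. %s" % (n, href))
    return "\n".join(lines)

print(dump('<p>if(x&lt;2 &amp;&amp; y&gt;3) do_something();</p>'
           '<p>See <a href="http://example.com/">the docs</a>.</p>'))
```

Because the parser knows what a tag is, properly escaped operators come out
as plain < and > instead of being eaten, and the link index keeps the URLs
available for later.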

These are not simple conditions; consider, for example, a text email
which describes the rules for writing HTML.

Well-behaved messages should flag themselves, e.g. 
Content-type: text/html, or
Content-type: multipart/alternative.

Otherwise, you're SOL.
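
As a rough illustration of trusting the headers instead of guessing from the
body, the sketch below uses Python's standard email module. The function
name needs_html_conversion is hypothetical, and the policy (convert only
when the message declares text/html, or is multipart/alternative with a
text/html part) is just one reading of the rule above.

```python
from email import message_from_string

def needs_html_conversion(raw_message: str) -> bool:
    """Hypothetical check: True only if the message declares an HTML body."""
    msg = message_from_string(raw_message)
    content_type = msg.get_content_type()  # falls back to text/plain if absent
    if content_type == "text/html":
        return True
    if content_type == "multipart/alternative":
        # Look for an HTML alternative among the parts.
        return any(part.get_content_type() == "text/html"
                   for part in msg.walk())
    return False

html_mail = "Content-Type: text/html\n\n<p>hello</p>\n"
plain_mail = "Subject: HTML rules\n\nClose every <p> with </p>.\n"

print(needs_html_conversion(html_mail))
print(needs_html_conversion(plain_mail))
```

A plain-text message that merely talks about HTML tags is left alone here,
since only the declared type is consulted, never the body.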

-- 
Chris Mikkelson                 
mikk0022(_at_)maroon(_dot_)tc(_dot_)umn(_dot_)edu
Microsoft: We made "reboot" a household word.
