procmail

Re: HTML mailer bouncing

1997-10-01 11:26:04
Date: Wed, 1 Oct 1997 17:56:06 +0300 (EET DST)
From: Kimmo Jaskari <kimmo(_at_)alcom(_dot_)aland(_dot_)fi>
Subject: HTML mailer bouncing

Alternatively, if someone knows a good way to strip the HTML tags from a
message and turn it into plain old text, I would appreciate tips.

This Perl filter is a start.

  #! /usr/bin/perl -w
  
  ###
  ### Simple HTML stripper.
  ###
  
    require 5.002 ;
  
    ## Read text.
    undef $/ ;
    $_ = <> ;
  
    ## Extract body element if present.
    s|.*<body.*?>(.*)</body.*?>.*|$1|is ;
  
    ## Strip all tags.
    s/<!--.*?-->//gs ;    # comments
    s/<.*?>//gs ;         # regular tags
  
    ## Convert escapes.
    s/&lt;/</g ;
    s/&gt;/>/g ;
    s/&amp;/&/g ;
  
    ## Clean up vertical whitespace.
    s/^\s*\n// ;          # top
    s/\n\s*\n/\n\n/g ;    # middle
    s/\s*\n\s*$/\n/ ;     # bottom
  
    ## Write stripped text.
    print ;

If you want to get fancy, you could try to interpret some of the tags
and maybe wrap text.
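
As a rough sketch of that fancier version, and not a tested replacement
for the filter above: the sub below treats <br> as a line break, treats a
handful of block-level tags as paragraph breaks, and wraps the result with
the core Text::Wrap module. The name html_to_text, the list of block tags,
and the default width of 72 are my own assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical helper: turn an HTML string into wrapped plain text.
# $width is the value handed to Text::Wrap (defaults to 72 here).
sub html_to_text {
    my ($text, $width) = @_;
    local $Text::Wrap::columns  = $width || 72;
    local $Text::Wrap::unexpand = 0;          # never turn spaces into tabs

    $text =~ s/<!--.*?-->//gs;                # comments
    $text =~ s/\s+/ /g;                       # source line breaks are not significant
    $text =~ s/<br\b[^>]*>/\n/gi;             # <br> becomes a line break
    # Assumed set of block-level tags; each becomes a paragraph break.
    $text =~ s/<\/?(?:p|div|h[1-6]|li|tr|blockquote)\b[^>]*>/\n\n/gi;
    $text =~ s/<[^>]*>//g;                    # all remaining tags

    # Convert escapes; &amp; goes last so "&amp;lt;" ends up as "&lt;".
    $text =~ s/&lt;/</g;
    $text =~ s/&gt;/>/g;
    $text =~ s/&amp;/&/g;

    my @paragraphs;
    for my $para (grep { /\S/ } split /\n{2,}/, $text) {
        my @lines;
        for my $line (split /\n/, $para) {
            $line =~ s/^ +| +$//g;            # trim
            $line =~ s/ {2,}/ /g;             # squeeze gaps left by removed tags
            push @lines, wrap('', '', $line) if length $line;
        }
        push @paragraphs, join("\n", @lines) if @lines;
    }
    return @paragraphs ? join("\n\n", @paragraphs) . "\n" : '';
}

# Small demonstration on a literal string.
print html_to_text("<p>Hello,<br>world &amp; all.</p>");
```

To use it as a filter like the script above, replace the last line with a
slurp of standard input (undef $/, then pass <STDIN> to html_to_text).
Preformatted text inside <pre> is not preserved, since all whitespace is
collapsed before wrapping.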
-- 
Why do they call it rush hour when nothing moves?  -- Mork
