procmail

Re: Hung up while trying to scrub Yahoo advertisements...

1998-03-07 10:39:40
Ken Hooper wrote,

| But of course we don't want to truncate the file at any line of dashes or
| underscores (people might use sig files that trip the filter, or draw ASCII
| art--we get a lot of ASCII art on automotive lists).
| 
| So I want to look for the telltale advertising text (DO YOU YAHOO!?), and
| then back up one line and truncate the file there. Or equivalent. Of course
| we only do this if procmail knows it's from a freemail server:
| 
| * From:.*yahoo.com|hotmail.com|rocketmail.com|etc.com
| 
| and we know from past experience exactly what the text is going to be so we
| know what to search for.

Well, let's see.  (Oh, Juno adds an ad now too.)

 :0
 * ^From:.*@([^ .]+\.)*\/(yahoo|(hot|rocket)mail|etc|juno)\.com\>
 {
  :0
  * MATCH ?? ^^(yahoo|rocketmail)
  { STRING=" *DO YOU YAHOO!\?" # do not include opening left anchor
    DIVIDER=1 # to remove divider line above $STRING as well
  }
  :0E
  * MATCH ?? ^^hotmail
  { STRING=whatever DIVIDER=1 }
  :0E
  * MATCH ?? ^^juno
  { STRING=whatever DIVIDER=1 }
  :0E
  * MATCH ?? ^^etc
  { STRING=whatever DIVIDER=whatever # 0 or 1, I don't know
  }
  :0E # then we shouldn't be in this outer brace
  { ESCAPE=on }

  :0
  * ! ESCAPE ?? ^^on^^
  { INCLUDERC=.striptagrc }

  ESCAPE # unset for other uses
 }
  
Now, in .striptagrc, we do this:

 :0B # copy the footer into $MATCH
 * $ ()\/$STRING(.*$)*^^
 { }

 # if you don't have head, use      sed "$= q"      as the action instead
 :0Bbfwi # do not use `r' flag here if you will be saving in mbox format!
 * 1^1 ^.*$
 * -1^1 MATCH ?? ^.*$
 * $ -${DIVIDER:-0}^0
 | head -$=

The logic is like this: we take the number of lines in the body, subtract
the number of lines in the footer, subtract one more if the footer has a
divider above it (alternatively we could include the divider in the
definition of the search string), and cut the body off after that many lines
by running it through head or, if your system doesn't have head, through sed.
Be careful not to change the action line to include any characters from
$SHELLMETAS, because then you'll lose the use of $=.  (If you must, save
$= in a regular variable first and then use the regular variable.)
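
If it helps to see the arithmetic outside procmail, here is a rough sh sketch
of the same idea. It is only an illustration: strip_footer, its two
arguments, and the sample body are made up for this example, and grep stands
in for procmail's own matching, so the pattern here is a plain grep pattern
rather than $STRING.

```shell
#!/bin/sh
# strip_footer PATTERN [DIVIDER]   (hypothetical helper, not part of procmail)
# Reads a message body on stdin and prints it with the footer cut off.
# DIVIDER is 1 if a divider line sits directly above the footer, else 0.
strip_footer() {
    sf_pattern=$1
    sf_divider=${2:-0}
    sf_body=$(cat)

    # number of lines in the body
    sf_total=$(printf '%s\n' "$sf_body" | grep -c '')
    # line number where the footer text first appears (empty if absent)
    sf_start=$(printf '%s\n' "$sf_body" | grep -n -e "$sf_pattern" |
               head -n 1 | cut -d: -f1)

    sf_keep=0
    if [ -n "$sf_start" ]; then
        # the footer runs from its first line to the end of the body
        sf_footer=$((sf_total - sf_start + 1))
        sf_keep=$((sf_total - sf_footer - sf_divider))
    fi

    if [ "$sf_keep" -gt 0 ]; then
        printf '%s\n' "$sf_body" | head -n "$sf_keep"
    else
        # nothing sensible to cut, so pass the body through unchanged
        printf '%s\n' "$sf_body"
    fi
}

# sample body (invented for the example): 5 lines, 2 of footer, 1 divider
sample='Hello list,
the carburetor is fixed.
_______________________
DO YOU YAHOO!?
Get your free address at mail.yahoo.com'

printf '%s\n' "$sample" | strip_footer 'DO YOU YAHOO' 1
```

With the sample above that works out to 5 - 2 - 1 = 2, so only the first two
lines survive. When the count comes out to zero or less, the sketch leaves
the body alone, much as the recipe doesn't fire unless its score is positive.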