procmail

Re: Hung up while trying to scrub Yahoo advertisements...

1998-03-08 13:31:54
Ken Hooper asked about my previous suggestion,

| Can we anchor the string at the beginning of the line? Is it already
| anchored in some way I don't understand?

We can, and it isn't (yet).  Originally I did include a caret in the
definition of the search string, but then I got concerned about having the
opening newline extracted into $MATCH and procmail's counting an extra line
in the extracted text.  Certainly one can add yet another variable for each
system:

  LEFTANCHOR=yes
or
  LEFTANCHOR # unset

and extract as follows in the INCLUDERC:

  :0 B
  * $ ()${LEFTANCHOR:+^}\/$STRING(.*$)*^^
  { FOOTER=$MATCH } # or whatever I called it

and still require the string to be left-anchored without extracting the
opening newline.  But you see, if the caret were in $STRING, we couldn't
put the extraction operator between it and the rest of $STRING.
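
To make the expansion concrete, here is a rough sketch.  The footer text and
the rcfile name are invented for illustration; only LEFTANCHOR, STRING, and
the condition itself come from the recipe above.

  # hypothetical settings for one site, made before the INCLUDERC
  STRING="This footer text is only a placeholder"
  LEFTANCHOR=yes
  INCLUDERC=$HOME/.procmail/rc.footer

  # with LEFTANCHOR=yes, ${LEFTANCHOR:+^} expands to a caret:
  #   * $ ()^\/$STRING(.*$)*^^
  # with LEFTANCHOR unset, it expands to nothing:
  #   * $ ()\/$STRING(.*$)*^^

In the first case the caret sits to the left of \/, so the newline it matches
is not copied into $MATCH.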

| I ask because I can imagine
| somebody forwarding a message from somebody@juno.com and then adding
| trailing text. In a case like that this script would cut lots of legitimate
| text, if the string wasn't anchored, because all the trailing text would be
| evaluated as part of the footer.

At first I didn't see what forwarding had to do with left-anchoring, but I
think Ken is talking about the case where someone not on one of these systems
*quotes* a letter that originated on one without deleting the footer.  Then
the footer will be indented with citation characters and not left-anchored.

If someone not on one of these systems forwards a message with a chance to
edit it, or sends his/her own follow-up to a message from one of these sites,
and doesn't delete the footer, the message won't match the original From:
condition anyway.  Still, it's not a bad idea to check that the footer is
left-anchored if that's the way the site inserts it, and above is a way to
do it.