procmail

Re: matching multiple lines

2002-04-12 16:28:28
++ 12/04/02 12:41 -0500 - David W. Tamkin:
| \< matches much more than just newline.  Use ^ or $ for that:

| The type of mail Rejo is trying to match is not unlikely to have ellipses,
| exclamation points, tabs, or hyphens between words.  (Newline (maybe some
| spaces)|space (maybe more spaces)) probably wouldn't do the job very well.

So, putting together the suggestions made here and the information from
the manpage, and weighing them against my own needs, I have concluded
that I am best off matching too much (which is more or less what you,
David, say in the text quoted above).

To recall, I'm using this to catch typical TINS [1], like:

  mailto:[a-z0-9.-]+@[a-z0-9.-]+\?subject=remove
  this email cannot be considered spam
  do not spam\. it hurts all of us

The last two wouldn't be caught if I used them as regexps as they stand
and the phrase were split across two lines in the spam. So I have now
replaced every space with "${RN}" and set RN to
"([      ]+|((\<)+[      ]?))".

A very small test suggests that this works.
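
For anyone who wants to play with the idea outside procmail, here is a
rough Python sketch of the same substitution. It is an approximation
only: procmail's \< is stood in for by "any non-word character", the
brackets are assumed to hold a space and a tab, and the names
tins_pattern, wrapped, dotted and unrelated are made up for this
illustration.

```python
import re

# Approximate stand-in for RN: either a run of spaces/tabs, or one or
# more non-word characters (newline, dots, "!", "-", ...) followed by an
# optional blank. This is an assumption, not procmail's exact semantics.
RN = r"(?:[ \t]+|[^A-Za-z0-9_]+[ \t]?)"


def tins_pattern(phrase):
    """Join the words of a plain phrase with RN instead of single spaces."""
    words = phrase.split(" ")
    return re.compile(RN.join(re.escape(w) for w in words), re.IGNORECASE)


pat = tins_pattern("this email cannot be considered spam")

# The phrase wrapped over two lines, as it might appear in a spam body.
wrapped = "Note: this email cannot\nbe considered spam."
# The phrase with an ellipsis between two of the words.
dotted = "this email... cannot be considered spam"
# Two words run together: nothing sits between them for RN to match.
unrelated = "this emailcannot be considered spam"

for sample in (wrapped, dotted, unrelated):
    print(bool(pat.search(sample)))
```

The deliberate choice here is to match too much rather than too little:
anything that is not part of a word counts as a separator, so line
breaks and punctuation between the words do not hide the phrase.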

        -Rejo.


[1] A shorthand I have started using for spam telling me "This Is Not
    Spam".

-- 
# rejo(_at_)sisterray(_dot_)xs4all(_dot_)nl, pgp: see headers, 
http://www.xs4all.nl/~sister