procmail

Re: Pattern matching question

1997-06-18 18:35:00
Dan Kanagy's been bitten by the leading backslash annoyance:

| I'm trying to match text in the body of e-mail that begins with "$B"
| and ends in either "(B" or "(J" on a single line.  The "$B" may or may
| not start the line and the "(B" or "(J" may or may not end the line.
| 
| I believe I need to escape the "$" and the "(", so I've tried
| 
|   :0 BD
|   * \$B.*(\(B|\(J)
| 
| but I don't get a match with this.  What might I be doing wrong?

The problem is a counterintuitive quirk that shows up when you want to escape
the first character in a regexp.  Procmail takes an opening backslash to mean
"end of leading whitespace" and strips it.  Thus it looks for this expression:

   $B.*(\(B|\(J)

and takes the opening "$" to mean "newline" rather than "dollar sign".

In fact,

  \\$B.*(\(B|\(J)

would work, though to our eyes we'd expect "\\" to match a literal backslash
in the text.  As I said, the situation with opening backslashes is highly
counterintuitive.

The general solution is to protect the beginning of the regexp with "()"
[also, you might as well put the literal left parenthesis outside the
alternation, because it's part of both "(B" and "(J"]:

  * ()\$B.*\((B|J)

which simplifies further to this:

  * ()\$B.*\([BJ]
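
Dropped into a complete recipe, that condition might look like the sketch
below.  The flags are the ones from Dan's own attempt; the mailbox name
"japanese" is only a placeholder I made up for illustration.

```procmail
# B = search the body, D = distinguish upper case from lower case.
# The empty "()" means the condition no longer starts with a backslash,
# so "\$" survives intact and matches a literal dollar sign.
:0 BD
* ()\$B.*\([BJ]
japanese
```

Since "." does not match a newline, the "$B" and the closing "(B" or "(J"
have to fall on the same line for this to match.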

Dollar signs are a special problem, because "$" interpretation can also
affect them, making them represent newlines when you thought they'd be
literal.  Fortunately, there are ways to tame them:

  [$] always matches a literal dollar sign in the search area.
  ($) always matches a newline in the search area.
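
As a rough illustration of the bracket form, Dan's condition could presumably
also be written without the "()" guard, because the regexp then starts with
"[" instead of a backslash.  This is an untested sketch, and the mailbox name
"japanese" is a made-up placeholder:

```procmail
# "[$]" stands for a literal dollar sign, so no leading backslash is needed.
:0 BD
* [$]B.*\([BJ]
japanese
```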
