procmail

Re: Word boundary matching ?

1997-12-29 11:29:34
Vikas Agnihotri asked,

| I am trying to come up with a procmail recipe that among other things
| should have the condition 'body does not contain a particular word'.
| Here is what I tried:

| * !B ?? \<word\>

You have fallen into the leading backslash problem, Vikas.  If the first
character of a regexp is a backslash, procmail takes it as "end of leading
whitespace" and strips it.  What you coded means "a less-than sign, then the
word, then any non-word character."  (It also prevents the less-than sign from
being taken as a size operator.)  Unless the non-word character immediately
to the left of the word was a less-than sign, that regexp would fail (and
thus the condition would pass).  Try this:

  * ! B ?? ()\<word\>

This would work too:

  * ! B ?? \\<word\>

but in a casual reading it would look like "literal backslash, less-than
sign, the word, word boundary character," so we on the list generally
recommend the empty parentheses.
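
For context, here is a minimal sketch of how that condition might sit in a
complete recipe.  The folder name "no-word" is invented for illustration,
and the trailing colon on the first line just requests a local lockfile for
the mailbox delivery:

  # Hypothetical: file the message in "no-word" when the body does not
  # contain "word" as a whole word.
  :0:
  * ! B ?? ()\<word\>
  no-word

The empty parentheses match nothing; they are there only so that the
backslash is no longer the first character of the regexp.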

Do note that the difference in meaning of \< and \> in procmail (where they
must match a non-word character) from their meaning in perl and egrep (where
they match the zero-width transition into and out of a word respectively)
does not come into play here.  Because procmail's \< and \> can match
newlines (both real and putative), it rarely is a factor.  It's a problem only
when a single character has to serve both as the ending boundary of one word
and also the opening boundary of another.  [Well, it's also a problem when you
have one as the last character to the right of \/, but that's easily solved.]
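
To illustrate the shared-boundary case, here is a sketch that assumes a body
containing "foo bar" with exactly one space between the words.  The folder
name is made up, and I have not run this particular recipe:

  # Should fail on "foo bar": \> consumes the single space, so there is
  # no character left for \< to consume in front of "bar".
  :0:
  * B ?? foo\>\<bar
  hypothetical-folder

With two non-word characters between the words (say, two spaces) the same
condition should match, since each boundary then has a character of its own.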
