
Re: whole word recipe

2002-05-27 19:59:22
Steve Semple asked,

| The ? I find a bit confusing. You know even when I read regular
| expression descriptions over and over it seems even more cryptic.

It means "zero or one of the preceding" or, if you like, "[whatever] or
nothing."  (.*\<)? means "either nothing or a string that ends in a non-word
character."
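
To see the optional group at work, here is a rough sketch in Python. It is
only an approximation: Python's re module has no \< operator, so \W (one
non-word character) is assumed here as a stand-in for procmail's
character-consuming \<, and the helper name whole_match is made up for the
illustration.

```python
import re

# Python's re has no \< operator, so \W (one non-word character)
# stands in for procmail's character-consuming \< (an assumption).
PATTERN = re.compile(r"(.*\W)?there")

def whole_match(text):
    """Return True if the entire text matches PATTERN."""
    return PATTERN.fullmatch(text) is not None

print(whole_match("there"))     # True: the optional group matches nothing
print(whole_match("hi there"))  # True: the optional group matches "hi "
print(whole_match("hithere"))   # False: no non-word character before "there"
```

The first two calls cover the two halves of "either nothing or a string that
ends in a non-word character"; the third shows what the group rules out.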

| Im confused what the difference between a space and
| a inter-word space is, they sound the same to me.

Well, the man page doesn't differentiate between a space and an inter-word
space; it says, as you quoted,

Since they match actual characters, they are only suitable
to delimit words, not to delimit inter-word space.

The distinction is between delimiting words and delimiting the space between
words.

In egrep and perl, \< matches the *transition* from a character that wouldn't
be in a word (such as a space or a punctuation mark) to a character that would
be in a word (such as a letter or a digit), and \> matches the *transition*
from word to non-word.  In procmail, they match a non-word character; there
has to be a character there to match it (a punctuation mark, space, tab, or
newline).  So if you have

 hi there

as the text, and


as the pattern, you'd get a match in egrep or perl but not in procmail, while


would match under either interpretation.  On the other hand, in procmail you
could test for two adjacent punctuation marks with this:


while that would make no sense in egrep or perl.
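
The two readings can be compared side by side in Python, with the caveat that
this is an emulation and not procmail's or egrep's own engine. The names START
and END are made up here; they use lookarounds as assumed zero-width
equivalents of \< and \>, while \W plays the part of the character-consuming
version.

```python
import re

# Zero-width emulations of the transition-style word edges (assumed equivalents):
START = r"(?<!\w)(?=\w)"   # like \< : position where a word begins
END = r"(?<=\w)(?!\w)"     # like \> : position where a word ends

text = "hi, there"

# Character-consuming reading: \>\< is simply two adjacent non-word characters.
consuming = re.search(r"\W\W", text)

# Zero-width reading: a word would have to end and begin at the same position.
zero_width = re.search(END + START, text)

print(consuming.group() if consuming else None)  # ", "
print(zero_width)                                # None
```

Under the consuming reading the comma and the space satisfy the pattern; under
the zero-width reading no position can be both the end and the start of a word,
so nothing ever matches.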

So what's that stuff about delimiting words but not delimiting inter-word
space?  It's that


works in procmail as well as in perl or egrep, but in egrep or perl you use

 \>{some pattern of spaces or punctuation marks}\<

to look for a place where two words are separated by a match to that pattern.
That wouldn't work in procmail, because you'd need an additional non-word
character on each end to match \> and \<.
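
Here is a small Python sketch of that last point, again as an emulation under
stated assumptions: START and END are made-up names for lookaround stand-ins
for zero-width \< and \>, \W stands in for the character-consuming versions,
and SEP is just an example separator pattern.

```python
import re

START = r"(?<!\w)(?=\w)"   # zero-width stand-in for \<
END = r"(?<=\w)(?!\w)"     # zero-width stand-in for \>
SEP = r"[ ,]+"             # example pattern of spaces or commas

text = "hi, there"

# Zero-width reading: the separator alone fills the gap between the words.
zero_width = re.search(END + SEP + START, text)

# Consuming reading: each delimiter eats a non-word character of its own,
# so the gap would need at least three such characters; it has only two.
consuming = re.search(r"\W" + SEP + r"\W", text)

print(zero_width.group() if zero_width else None)  # ", "
print(consuming)                                   # None
```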

procmail mailing list
