You mean
(*<00) %^### !!!
^^^&&(+@@ !!!
No!
I'm sorry, I could not help myself after reading the detail of your programs
below!
Please forgive me. I really am trying to learn but I just am not following
all of this.
At 04:23 PM 12/11/96 -0800, Alan Stebbens wrote:
On Wed, 11 Dec 1996, Chin Fang wrote:
You are matching PEN, pen, Pen, open, pigpen, etc.: any subject containing
the characters "PEN" in either upper or lowercase.
Try this:
:0 D:
* ^Subject:(.*\<)?PEN(\>|$)
Junk
This will trash mail with the subject having the word PEN, only as a
separate word, and only in uppercase.
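For anyone who wants to experiment outside procmail, here is a rough Python emulation of that condition. It is only a sketch: it assumes that \< and \> behave exactly as the man page excerpt quoted below describes (one character that is not a letter, digit, or underscore), and the names NONWORD, PEN_WORD, and is_junk are made up for this illustration.

```python
import re

# Assumption: procmail's \< and \> each consume one character that is
# not a letter, digit, or underscore (per the quoted man page text).
NONWORD = r"[^a-zA-Z0-9_]"

# Rough equivalent of:  * ^Subject:(.*\<)?PEN(\>|$)
# No re.IGNORECASE here, which mimics the recipe's D (case-sensitive) flag.
PEN_WORD = re.compile(rf"^Subject:(.*{NONWORD})?PEN({NONWORD}|$)", re.MULTILINE)


def is_junk(header: str) -> bool:
    """Return True if the header line would be caught by the emulated condition."""
    return PEN_WORD.search(header) is not None


if __name__ == "__main__":
    for header in (
        "Subject: PEN for sale",
        "Subject: my pen leaks",
        "Subject: the pigpen",
        "Subject: OPEN now",
    ):
        print(f"{header!r}: {is_junk(header)}")
```

Only the first header should be reported as junk; the others fail either on case or because "PEN" sits inside a longer word.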
Sorry about my very likely misunderstanding of the man page.
quoted:
\< or \> Match the character before or after a word. They
are merely a shorthand for `[^a-zA-Z0-9_]', but
can also match newlines. Since they match actual
characters, they are only suitable to delimit
words, not to delimit inter-word space.
If so, why is the ? in (.*\<)? necessary?
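The (.*\<)? says match anything up to and including a non-word character;
the ? makes that prefix optional, so "PEN" may also follow "Subject:"
directly. Requiring the non-word character avoids matching "PEN" in the
middle of another word, as in these subjects: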
Subject: Welcome to the OPEN HOUSE
Subject: Thanks for the STUPENDOUS going-away party!
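The role of the trailing ? can be checked with a small Python sketch. Again this is an emulation under the assumption that \< consumes one non-word character; the pattern names and the matches helper are hypothetical and exist only for this comparison.

```python
import re

# Assumed stand-in for procmail's \< and \> (one non-word character).
NONWORD = r"[^a-zA-Z0-9_]"

# The condition as given, with the optional prefix group ...
with_optional = re.compile(rf"^Subject:(.*{NONWORD})?PEN({NONWORD}|$)")
# ... and a variant where the prefix group is mandatory.
without_optional = re.compile(rf"^Subject:(.*{NONWORD})PEN({NONWORD}|$)")


def matches(pattern: re.Pattern, header: str) -> bool:
    """Return True if the pattern matches anywhere in the header line."""
    return pattern.search(header) is not None


if __name__ == "__main__":
    for header in (
        "Subject:PEN",
        "Subject: PEN",
        "Subject: Welcome to the OPEN HOUSE",
        "Subject: Thanks for the STUPENDOUS going-away party!",
    ):
        print(f"{header!r}: optional={matches(with_optional, header)}, "
              f"mandatory={matches(without_optional, header)}")
```

The two variants should differ only on "Subject:PEN", where no character separates the colon from the word; both reject OPEN and STUPENDOUS.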
then obviously there won't be any
possibility of having an inter-word space, since the alphanumeric
character set doesn't contain any whitespace (blank, newline, tab ..)
What is an "inter-word" space? By many definitions, a "word" is a
sequence of non-space characters. By procmail's definition above, a
"word" is a sequence of alphanumerics. As far as I know, the only
context where this "word"ness is significant is with the "\<" and "\>"
pseudo-operators.
Unfortunately, the ^TO and ^TO_ macros each use a slightly different
notion of "word" (which is appropriate considering that they are
intended to match addresses, and not arbitrary English words).
Also, does the phrase "before or after a word" imply that these two
procmail extensions assume at least one white space before (or after)
the considered word already? If not, how can the assertion "only as a
separate word" be true?
Why would you infer anything? The text very clearly says that they are
a "shorthand" for "[^a-zA-Z0-9_]". This means that the expression:
"\<foo\>" will only match "foo" if it occurs *between* two non-word
characters, including spaces, tabs, and newlines. This also means that
it will not match: foobar, snafoo, or foofoo.
What may not be clear is that "\<" and "\>" match actual characters,
rather than zero-width boundaries as the comparable operators in egrep
and Perl do.
For example, the expression "^Subject:\<foo\>" will not match
Subject:foo
because there is no character to match between the colon ":" and
the "f".
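The difference between a character-consuming delimiter and a zero-width boundary can be seen in a short Python sketch. It assumes the man page description of \< and \> is accurate, uses Python's \b as a stand-in for the egrep/Perl style of boundary, and ends each test header with a newline so that the emulated \> has a character to consume. The names consuming and boundary are invented for the example.

```python
import re

# Assumed stand-in for procmail's \< and \>: one real non-word character.
NONWORD = r"[^a-zA-Z0-9_]"

# Emulation of  ^Subject:\<foo\>  where both delimiters consume a character.
consuming = re.compile(rf"^Subject:{NONWORD}foo{NONWORD}")

# Boundary-style version: \b is zero-width and consumes nothing.
boundary = re.compile(r"^Subject:\bfoo\b")


if __name__ == "__main__":
    # Header lines end with a newline, as they do in a real message.
    for header in ("Subject:foo\n", "Subject: foo\n", "Subject: foobar\n"):
        print(f"{header!r}: consuming={consuming.search(header) is not None}, "
              f"boundary={boundary.search(header) is not None}")
```

With the consuming version, "Subject:foo" fails because nothing sits between the colon and the "f", while "Subject: foo" succeeds because the space is consumed. The boundary version behaves the other way around on those two headers, and both reject "foobar".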
___________________________________________________________
Alan Stebbens <aks(_at_)sgi(_dot_)com> http://reality.sgi.com/aks