procmail

Re: Messed Up Subject Line??? Help

1996-12-11 18:40:46
You mean

(*<00) %^### !!!


^^^&&(+@@ !!!

No!

I'm sorry, I could not help myself after reading the detail of your programs
below!
Please forgive me. I really am trying to learn but I just am not following
all of this.


At 04:23 PM 12/11/96 -0800, Alan Stebbens wrote:
On Wed, 11 Dec 1996, Chin Fang wrote:
You are matching PEN, pen, Pen, open, pigpen, etc.  Any subject with the
characters "PEN" in either upper or lowercase.

Try this:

  :0 D:
  * ^Subject:(.*\<)?PEN(\>|$)
  Junk

This will trash mail with the subject having the word PEN, only as a
separate word, and only in uppercase.

Sorry about my very likely misunderstanding of the man page.

quoted:

     \< or \>  Match the character before or after a word.   They
               are  merely  a  shorthand for `[^a-zA-Z0-9_]', but
               can also match newlines.  Since they match  actual
               characters,  they  are  only  suitable  to delimit
               words, not to delimit inter-word space.

If so, why is the ? in the (.*\<)? necessary?

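The (.*\<) part says match anything up to and including a non-word
character; the trailing ? makes that part optional, so that "PEN"
immediately after the colon still matches.  This avoids matching "PEN"
in the middle of another word, like these subjects: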

   Subject: Welcome to the OPEN HOUSE
   Subject: Thanks for the STUPENDOUS going-away party!
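
To make the behaviour above easier to try out, here is a minimal Python
sketch of the recipe's condition.  It is not procmail itself: it assumes
that \< and \> can be approximated by the character class quoted from the
man page, and it relies on Python's default case-sensitive matching to
stand in for the D flag.  The names NON_WORD, PEN_RE and matches_pen are
made up for this illustration.

```python
import re

# Assumption: procmail's \< and \> are approximated by the character class
# from the man page; each one consumes exactly one non-word character.
NON_WORD = r"[^a-zA-Z0-9_]"

# Python rendering of the condition  ^Subject:(.*\<)?PEN(\>|$)
# Matching is case-sensitive by default, which mirrors the recipe's D flag.
PEN_RE = re.compile(r"^Subject:(.*" + NON_WORD + r")?PEN(" + NON_WORD + r"|$)")


def matches_pen(header_line: str) -> bool:
    """Return True if the approximated pattern matches the header line."""
    return PEN_RE.search(header_line) is not None


if __name__ == "__main__":
    samples = [
        "Subject: PEN for sale",
        "Subject:PEN",
        "Subject: Welcome to the OPEN HOUSE",
        "Subject: Thanks for the STUPENDOUS going-away party!",
        "Subject: my favourite pen",
    ]
    for line in samples:
        print(matches_pen(line), line)
```

Under these assumptions only the first two samples should match: the
optional group lets "PEN" sit directly after the colon, while "OPEN",
"STUPENDOUS" and the lowercase "pen" are rejected.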

then obviously there won't be any
possibility of having an inter-word space, since the alphanumeric
character set doesn't contain any white space (blank, newline, tab, ...)

What is an "inter-word" space?  By many definitions, a "word" is a
sequence of non-space characters.  By procmail's definition above, a
"word" is a sequence of alphanumerics.  As far as I know, the only
context where this "word"ness is significant is with the "\<" and "\>"
pseudo-operators.

Unfortunately, the ^TO and ^TO_ macros each use a slightly different
notion of "word" (which is appropriate considering that they are
intended to match addresses, and not arbitrary English words).

Also, does the phrase "before or after a word" imply that these two
procmail extensions assume at least one white space before (or after)
the considered word already?  If not, how can the assertion "only as a
separate word" be true?

Why would you imply anything?  The text very clearly says that they are
a "shorthand" for "[^a-zA-Z0-9_]".  This means that the expression:
"\<foo\>" will only match "foo" if it occurs *between* two non-word
characters, including spaces, tabs, and newlines.  This also means that
it will not match: foobar, snafoo, or foofoo.  

What may not be clear is that "\<" and "\>" match actual characters,
and not boundaries, unlike the comparable operators in egrep and Perl.

For example, the expression "^Subject:\<foo\>" will not match

   Subject:foo

because there is no character to match between the colon ":" and
the "f".
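
The difference can be sketched in Python, again only as an approximation:
the character class from the man page stands in for "\<" and "\>", and
Python's zero-width \b stands in for the egrep/Perl style of boundary.
The names CONSUMING, BOUNDARY, consuming_match and boundary_match are
hypothetical helpers for this illustration.

```python
import re

# Assumption: \< and \> behave like this one-character class (per the man page).
NON_WORD = r"[^a-zA-Z0-9_]"

# Approximation of procmail's  ^Subject:\<foo\>  (each delimiter eats a character).
CONSUMING = re.compile(r"^Subject:" + NON_WORD + r"foo" + NON_WORD)

# Boundary-style counterpart: \b is zero-width and consumes nothing.
BOUNDARY = re.compile(r"^Subject:\bfoo\b")


def consuming_match(line: str) -> bool:
    """True if the character-consuming approximation matches."""
    return CONSUMING.search(line) is not None


def boundary_match(line: str) -> bool:
    """True if the zero-width boundary version matches."""
    return BOUNDARY.search(line) is not None


if __name__ == "__main__":
    for line in ["Subject:foo\n", "Subject: foo\n", "Subject: foobar\n"]:
        print(repr(line), consuming_match(line), boundary_match(line))
```

With "Subject:foo" the consuming version fails, because nothing sits
between the colon and the "f", while the boundary version succeeds.  With
"Subject: foo" followed by a newline the consuming version succeeds, since
the space and the newline are each available to be matched.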
___________________________________________________________
Alan Stebbens <aks(_at_)sgi(_dot_)com>      http://reality.sgi.com/aks


