procmail

Re: [pro] Re: Weird error in my logs

2008-12-05 20:57:39
On Fri, Dec 05, 2008 at 08:21:11PM -0500, Charles Gregory wrote:

> On Fri, 5 Dec 2008, LuKreme wrote:
>> On Dec 5, 2008, at 8:23, Charles Gregory <cgregory(_at_)hwcn(_dot_)org> wrote:
>>> On Fri, 5 Dec 2008, LuKreme wrote:
>>>>   * ^To:(.*[^-a-zA-Z0-9_.])?\/.*
>>> Well, firstly, I don't like this string. It can stop in so
>>> many arbitrary places. A single or double quote before a
>>> name, or the "<" before an address.... what's the intent?
>>
>> Er... it doesn't stop on quotes or <'s.  I direct you to the
>> log I posted:
>
> Well, literally it reads as "a string of zero or more random
> characters ending with a character that is not alphanumeric, or
> an underscore, dash or period". So it would tend to strip off a
> leading quote mark, though not much else....
>
> I am still curious as to what your intent for that element would
> be?

You're both wrong.  :-)

First of all, "[^-a-zA-Z0-9_.]" is nearly the same as the
procmail macro, "\<" or "\>" (both of which mean the same
thing, so can be used interchangeably with each other):

  \< or \>  Match  the character  before  or  after a  word.
            They are merely a shorthand for `[^a-zA-Z0-9_]',
            but can  also match newlines.  Since  they match
            actual  characters, they  are  only suitable  to
            delimit words, not to delimit inter-word space.

So he merely added "-" and "." to that to make his own
"between-words" regex.

Kreemy is almost right, because the string is useful in
things like this:

   * To:(.*\<)?dman@

It will match any of

   To:dman@
   To: dman@
   To: "dman@
   To: <dman@
   To: ("dman@
   To: "Dallman Ross" <dman@

and so on, including with multiple spaces after the colon, tabs
instead of spaces, or no whitespace at all (which is legal per
RFCs).

His regex will do nearly the same thing here.
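
For comparison, here is a sketch of the same test written both ways as
complete recipes; you would use one or the other, not both.  The folder
name "dman-mail" is only a placeholder.  The small difference is that
his class excludes "-" and ".", so his version will not fire on
something like "To: john.dman@" or "To: x-dman@", while the "\<"
version will, because the macro treats those characters as word
delimiters.

   # Version 1: built-in "\<" macro as the left delimiter.
   # ("dman-mail" is a placeholder folder name.)
   :0:
   * ^To:(.*\<)?dman@
   dman-mail

   # Version 2, an alternative to version 1: his class, which
   # also counts "-" and "." as part of a word.
   :0:
   * ^To:(.*[^-a-zA-Z0-9_.])?dman@
   dman-mail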

And the good thing is, neither his regex nor the built-in macro
will match (falsely) on --

   To: goodman@
   To: john.goldman@

and so on.  So it's very useful, indeed.

The problem is that he follows that group with a match token and
then ".*".  So this:

   * To:(.*\<)?\/.*

or the version he used will, if there are, for example, five spaces
after the colon, give him a match like this:

   "     John Public <j.q.public@example.com"

He won't lose the spaces, even though he thought that was what the
regex would do, which is why he coded it that way.

The reason is that to the right of the match token procmail matches
greedily, while to the left of it procmail matches as little as it
possibly can.  So "zero or one of this regex" (on account of the
"?") will match zero instances whenever it can, and there is nothing
to the right of the token to prevent that.
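
A minimal way to see this for yourself, assuming LOGFILE is already set
earlier in your rcfile: capture with the pattern above and write $MATCH
to the log between brackets, so any leading whitespace becomes visible.
The log text is just an example.

   # Diagnostic only: log what the match token actually captured.
   :0
   * ^To:(.*\<)?\/.*
   {
     # Brackets expose leading spaces; the closing quote on its
     # own line ends the log entry with a newline.
     LOG = "To capture: [$MATCH]
"
   }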

I have actually already corrected Kreemy on this once before.
That was a year or three ago.  He said he'd redeem himself.  But
he didn't. :-)

He wants to use this:

  * To:.*\/[^   ].*

(In the brackets are a caret, a space, and a tab.)

I would do it with variables, though, so the tab can't get lost
in editing:

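   # The quotes must contain one literal tab character.
   TAB = '	'
   SP  = ' '
   WS  = "$SP$TAB"

   # "$" expands $WS; TO_FIELD is just an example name.
   :0
   * $ ^To:.*\/[^$WS].*
   { TO_FIELD = "$MATCH" }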

Dallman
____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
