weirdness with trailing ^^ and with ^.*$

1996-07-16 21:54:32
Long story, but I was looking for a search expression that would always be
found twice, regardless of the message text.  (No, it wasn't simply to score
two points; I'm already well aware that

 * 2^0

does that job very well.  Let's just say that this post is already lengthy
without the explanation of that part.)

I tried

 * 1^1 H ?? ^(From .*)?$

thinking for sure that the header would always contain one From_ line and
one empty line, but it counted three matches and scored 3!  Why??  Note that
there is a space after "From" to prevent matching the From: header.

 * 1^1 H ?? ^^From |^$
and
 * 1^1 H ?? ^^From .*$|^$

each also scored 3.  Damn.  These, too, find three occurrences each:

 * 1^1 ^^.*$|(.*$)+^^
 * 1^1 ^^.*$|($).*^^

and I just don't understand.  There is something funny about a trailing ^^.
How can [regexp that cannot match the empty string]^^ be found more than once?

Another thing I came across is that

   :0
   * 1^1 .
   * 1^1 ^.*$
   { M=$= }

results in a $M that is one byte larger than the size used for the "<" and
">" operators (and which appears in logabstracts) because 1^1 ^.*$ counts one
more line than wc -l reports.
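
To see the discrepancy directly, a sketch along these lines should write the
score into the logfile, where it can be compared with the size shown in the
logabstract.  The LOG assignment and the "score:" label are added purely for
illustration; they assume LOGFILE is already set somewhere earlier in the
rcfile.

   :0
   * 1^1 .
   * 1^1 ^.*$
   {
     M=$=
     # the closing quote sits on its own line so the log entry ends in a newline
     LOG="score: $M
"
   }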

Anyhow, I finally found a working solution.  It seems that if you search the
head, procmail thinks it ends with *two* blank lines.  These three:

 * 1^1 H ?? ^^From |($)($)^^
 * 1^1 H ?? ^^From |($)($)($)
 * 1^1 H ?? ^^From |$($)^^

each score 2.  These, though, find only one match and score 1:

 * 1^1 H ?? ^^From |($)$^^
 * 1^1 H ?? ^^From |^.*$^^

and sure enough, these don't match at all:

 * ($)$^^
 * ^.*$^^

($)$^^ doesn't match but $($)^^ does; ^.*$^^ doesn't but (^.*$)^^ does.
I think that "$^^" is yet another problem.

Now this is guaranteed to find exactly two matches (at least if the head is
one line or longer):

 * 1^1 H ?? ^^.*$|^.*$.*^^

I still don't know why these match three times each:

 * 1^1 H ?? ^^.*$|(.*$).*^^
 * 1^1 H ?? ^^.*$|($).*^^

Now this is odd: scoring on regexps is not supposed to count overlapping
occurrences, but

 * 1^1 HOST ?? ^^.*$|^.*^^

finds two matches and scores 2!  Why?

It gets stranger.  If the variable used is unset or null, that last condition
finds an unlimited number of matches and reaches the maximum score.  (That's
one reason I'm using $HOST, which is never null or unset; the other reason is
that it never contains an embedded newline.)  Why are ^^.*$ and ^.*^^ matching
an empty search area even once, let alone again and again?

And why does the following score 2?  There are no embedded newlines in $HOST,
so it should score 0!

 * 1^1 HOST ?? ($).*^^

Anyhow, I'm thoroughly puzzled by how a trailing ^^ is behaving and by why
^.*$ is found one time too many.
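
For anyone who wants to reproduce these counts, a scratch rcfile roughly like
the one below should do.  The file names scratch.rc, scratch.log and
sample.msg are made up for this sketch, and the condition is just one of those
quoted above; swap in whichever one is under test.  With VERBOSE on, the log
should show the score procmail arrived at for each weighted condition.

   # scratch.rc (hypothetical name): nothing is really delivered
   VERBOSE=on
   LOGFILE=scratch.log
   DEFAULT=/dev/null

   :0
   * 1^1 H ?? ^^From |($)($)^^
   { M=$= }

Run it from the directory that holds the rcfile, feeding a saved message on
stdin; the leading ./ keeps procmail from looking for the file in $HOME:

   procmail ./scratch.rc < sample.msg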
