procmail

\/ is not the same as ()\/

1997-11-15 11:32:28
When I suggested,

| >   * ^^\/(.*$.*$.*$.*$.*$.*$.*$.*$.*$.*$.*$.*$.*$.*$.*$.*$.*$.*$.*$)?\
| >         (.*$.*$.*$.*$.*$.*$.*$.*$.*$.*$.*$.*$.*$.*$.*$.*$)?\
| >         (.*$.*$.*$.*$.*$.*$.*$.*$)?(.*$.*$.*$.*$)?(.*$.*$)?(.*$)?
| >   { toplines=$MATCH }

James Waldby wrote,

| I think .* would be good enough at the end there, instead of (.*$)?

It might, yes, but in more recent versions of procmail, which do not strip
trailing newlines from $MATCH, we'd get an anomaly.  Whether the one-line
piece at the end is needed depends on how many lines are matched, and that
depends on how long the body is if it is shorter than fifty full lines.
With .* as that piece, if the total needed it to come out right, $MATCH
would not end in a newline (.* stops before the newline).  If the total did
not need it, $MATCH would end in a newline.
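To see the anomaly concretely, here is a rough Python approximation, not procmail itself.  It assumes procmail's $ can be modeled as a literal newline, so .*$ becomes .*\n.  It also takes $MATCH verbatim, as in the newer versions that do not strip trailing newlines.  The helper name toplines is made up for illustration.  It rebuilds the 19/16/8/4/2 grouping from the recipe and lets you choose the final one-line piece.

```python
import re

# Line counts of the optional groups in the recipe, largest first.  The
# final one-line piece is passed in separately; 19+16+8+4+2+1 = 50 lines.
GROUPS = [19, 16, 8, 4, 2]
ONE_LINE = r"(?:.*\n)?"   # the recipe's (.*$)?
DOT_STAR = r".*"          # the suggested replacement

def toplines(body: str, last_piece: str) -> str:
    """Approximate $MATCH for ^^\\/(...)?(...)?...<last_piece>.

    Assumptions: procmail's $ is modeled as a literal newline, re.match
    anchors at the start of the body (like ^^), and nothing is stripped
    from the result.
    """
    pattern = "".join(rf"(?:(?:.*\n){{{n}}})?" for n in GROUPS) + last_piece
    return re.match(pattern, body).group(0)

six = "".join(f"line{i}\n" for i in range(1, 7))
seven = six + "line7\n"
print(repr(toplines(six, DOT_STAR)[-1]))    # '\n'
print(repr(toplines(seven, DOT_STAR)[-1]))  # '7'
```

With (?:.*\n)? as the last piece, both bodies come back ending in a newline.  With .* in its place, the six-line body (4+2 lines) still does, but the seven-line body (4+2+1) does not.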

| -- possibly a case where "that trailing .* is [not] a waste"

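To the right of \/, .* is not a waste: it decides how much text ends up in
$MATCH.  When we say a trailing .* is a waste, we're speaking of a regexp
that contains no extraction operator.  There, .* can always match the empty
string, so it never changes whether the condition matches.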
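Here is a quick sketch of the difference in Python rather than procmail.  The \/ operator is modeled, as an assumption, by a capture group that runs from its position to the end of the regexp, so group 1 stands in for $MATCH.  The sample header is made up.

```python
import re

header = "Subject: weekly report\n"

# Without extraction, a trailing .* cannot change the yes/no answer,
# because it is always free to match the empty string.
plain = re.search(r"^Subject: weekly", header, re.M) is not None
padded = re.search(r"^Subject: weekly.*", header, re.M) is not None

# With extraction (capture group = everything right of procmail's \/),
# the trailing .* decides what ends up in the stand-in for $MATCH.
short = re.search(r"^Subject: (weekly)", header, re.M).group(1)
full = re.search(r"^Subject: (weekly.*)", header, re.M).group(1)

print(plain, padded)   # True True
print(short)           # weekly
print(full)            # weekly report
```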

| Do you have a reasonable explanation for anchoring with  ^^ ?

1. To make it clear to the reader that we're starting from the beginning of
the body, but more importantly, 

2. To squirm out of the leading backslash problem.  ()\/ would have done that
too, but ^^\/ also takes care of #1.

| In some tests I ran via "formail < mmm  -s procmail"
| with a batch of 8 messages in mmm,
| * ^^\/.*$?.*$?.*$?.*$?.*$?.*$?.*$?  matched all 8 messages but
| * \/.*$?.*$?.*$?.*$?.*$?.*$?.*$? only matched 6 of them.

Well, for one thing, .*$? is not the same as (.*$)?, but you used .*$? in
both of these, so that isn't the problem here.  The actual reason for the
difference is that your second condition falls afoul of the leading backslash
problem.  An opening backslash in a condition means "no more leading
whitespace or modifiers to strip; this is the start of the regexp," and the
backslash itself is consumed.  It is intended principally for regexps that
start with whitespace that counts (a space or a tab that you mean as part of
the pattern), with an exclamation point, or with anything else that procmail
might strip or take as a modifier.
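On the side point that .*$? is not the same as (.*$)?, here is a rough Python illustration.  It assumes procmail's $ can be approximated as a literal newline.  Under that model, (.*$)? matches either a whole line including its newline or nothing, while .*$? matches some text followed by an optional newline.  The difference shows up, for example, on a hypothetical body whose last line has no trailing newline.

```python
import re

# Hypothetical body whose last line has no trailing newline.
body = "first\nsecond"

# procmail's $ approximated as a literal newline:
#   (.*$)?(.*$)?  ->  each piece is a whole line with its newline, or nothing
#   .*$?.*$?      ->  each piece is text with an optional newline after it
whole_lines = re.match(r"(?:.*\n)?(?:.*\n)?", body).group(0)
loose = re.match(r".*\n?.*\n?", body).group(0)

print(repr(whole_lines))  # 'first\n'
print(repr(loose))        # 'first\nsecond'
```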

So \/ with nothing to the left of it means "a literal forward slash."  To
start a condition with the extraction operator, use ()\/ or \\/.  The latter
looks counterintuitively like "literal backslash followed by literal forward
slash" (which is what it would mean farther along in the regexp), so most of
us prefer the former.
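The parsing rule above can be sketched as a simplified Python model.  This is not procmail's actual code.  It handles only leading whitespace, a single ! (invert) and the leading backslash, and ignores procmail's other modifiers.  The function names condition_regexp and extraction_index are hypothetical.

```python
def condition_regexp(cond: str) -> tuple[bool, str]:
    """Simplified model of reading a condition (the text after '*').

    Returns (inverted, regexp).  Only whitespace, one '!' and the leading
    backslash are modeled; other procmail modifiers are ignored here.
    """
    s = cond.lstrip(" \t")
    inverted = s.startswith("!")
    if inverted:
        s = s[1:].lstrip(" \t")
    if s.startswith("\\"):
        s = s[1:]  # "the regexp starts here"; the backslash is consumed
    return inverted, s


def extraction_index(regexp: str) -> int:
    """Position of the first unescaped \\/ (extraction operator), or -1."""
    i = 0
    while i < len(regexp):
        if regexp[i] == "\\" and i + 1 < len(regexp):
            if regexp[i + 1] == "/":
                return i
            i += 2  # escaped character, e.g. \\ is a literal backslash
        else:
            i += 1
    return -1


for cond in [r"\/.*$?", r"()\/.*$?", r"\\/.*$?", r"^^\/.*$?"]:
    inverted, regexp = condition_regexp(cond)
    print(cond, "->", regexp, "extraction at", extraction_index(regexp))
```

In this model \/.*$? comes out as /.*$? with no extraction operator left in it.  By contrast, ()\/, \\/ and ^^\/ all leave a real \/ in the regexp.  Farther along, as in a\\/b, the same two characters read as a literal backslash and a literal slash.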

If \/stuff matched only six of the eight messages, only six of them
contained "/stuff".  Try these instead and see if they match all eight:

  * ()\/.*$?.*$?.*$?.*$?.*$?.*$?.*$?
or
  * \\/.*$?.*$?.*$?.*$?.*$?.*$?.*$?