procmail

Re: Matching with \<

1997-05-30 11:54:00
Rik Kabel wrote,

| I have had some results which I do not understand while working with a 
| scoring recipe. The problem is distilled in the following two examples. 
| I am using v3.11pre4.

| In the first example, the score of 121 is computed as 2 + 4 for dong and
| the first bing(o), + 5 for dong, + 10 + 100 for bingo bingo. Since scoring
| uses a shortest match, there is no score of 8 for the second bingo. 

| ====Example 1====
| This message scored 121 on
| :0 H B
| * ::score::
| *  2^2  ^.*\<(b|d)(i|o)ng
| *  5^5  (b|d)ong
| * 10^10 bingo\>
| 
| +To: rik
| +Subject: ::score::dong
| +
| + bingo bingo
| +

Well, let's restate that last part: because scoring counts only
non-overlapping occurrences, and the second condition is left-anchored
(it begins with "^.*"), at most one [bd][io]ng can match it per line.  So
where [bd][io]ng appears twice on a line (" bingo bingo"), procmail scores
only one of them, and it chooses the shorter match (through the first
"bing", not through the second).
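
To make the arithmetic concrete, here is a minimal Python sketch of how a
w^x condition adds up: the first match adds w, the second w*x, the third
w*x*x, and so on.  The helper name condition_score is made up for this
illustration, and the match counts are simply the ones described above for
Example 1; the sketch does not run procmail's regexp engine.

```python
def condition_score(weight, exponent, matches):
    """Score contributed by one w^x condition that matched `matches` times.

    The k-th match (counting from 0) adds weight * exponent**k.
    """
    return sum(weight * exponent ** k for k in range(matches))


# Match counts are taken from the discussion of Example 1 (an assumption of
# this sketch), not derived by matching the regexps here.
example_1 = [
    (2, 2, 2),     # 2^2   ^.*\<(b|d)(i|o)ng : one match per line, two lines
    (5, 5, 1),     # 5^5   (b|d)ong          : "dong" once
    (10, 10, 2),   # 10^10 bingo\>           : "bingo" twice
]

total = sum(condition_score(w, x, n) for w, x, n in example_1)
print(total)  # 121
```

That is 2 + 4, plus 5, plus 10 + 100, which agrees with the 121 Rik saw.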

Anyhow, on with Rik's question:

| In example 2, the score of 11 is computed as 2 + 4 + 5 as in example 1,
| but there is no score for either bingo. 
| 
| ====Example 2====
| This message scored 11 on
| :0 H B
| * ::score::
| *  2^2  ^.*\<(b|d)(i|o)ng
| *  5^5  (b|d)ong
| * 10^10 \<bingo\>
| 
| +To: rik
| +Subject: ::score::dong
| +
| + bingo bingo
| +

Ah, the old leading backslash problem.  Procmail takes a backslash at the
start of a regexp to mean "end of leading whitespace" and discards it.
What is left is "<bingo\>", with a literal less-than sign at the beginning,
which does not appear anywhere in the message.  Hence the fourth condition
scores zero.

Try one of these instead:

  * 10^10 ()\<bingo\>
or
  * 10^10 (\<bingo\>)
or
  * 10^10 (\<)bingo\>

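Then the fourth condition should score 10 (not 110: \< and \> each match
an actual character beside the word, so both bingos would need the single
space between them, and the only way to find \<bingo\> twice is to allow
overlapping).  That brings the recipe's total from 11 to 21.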
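
The overlap point can be checked with a rough Python model.  This is only
an approximation, not procmail's engine: it assumes that procmail's \< and
\> each consume a real character next to the word, so they are written
below as consuming groups instead of Python's zero-width \b.  The names
WORD_START, WORD_END, count_matches and score are invented for the sketch.

```python
import re

BODY = " bingo bingo\n"

# Stand-ins for procmail's \< and \> (an assumption of this sketch): each
# one matches a non-word character, or the edge of the text.
WORD_START = r"(?:^|\W)"
WORD_END = r"(?:\W|$)"


def count_matches(pattern, text):
    """Count non-overlapping matches, scanning left to right."""
    return sum(1 for _ in re.finditer(pattern, text))


def score(weight, exponent, matches):
    """w^x scoring: the k-th match (from 0) adds weight * exponent**k."""
    return sum(weight * exponent ** k for k in range(matches))


end_only = count_matches("bingo" + WORD_END, BODY)                # like bingo\>
both_ends = count_matches(WORD_START + "bingo" + WORD_END, BODY)  # like \<bingo\>

print(end_only, score(10, 10, end_only))    # 2 110
print(both_ends, score(10, 10, both_ends))  # 1 10
```

With only the trailing boundary, both bingos are found.  With a boundary on
each side, the first match swallows the space between the words, so the
second bingo has nothing left in front of it to match.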

In fact,

  * 10^10 \\<bingo\>

is equivalent but highly counterintuitive (we expect "\\" to represent a
literal backslash).

This also works but is slightly slower:

  * 10^10 .*\<bingo\>

Stephen's recommendation (and general consensus of the list membership) is
to use the first method [leading with "()"].
