procmail

counting repetitions on a line

1997-06-18 17:13:00
W. Wesley Groleau wrote,

| About new-lines (or lack of), I guess I did not communicate clearly.  I
| was thinking that a weighting scheme that tried to add demerits for
| each regexp would add only once for a single line that had five matches
| in it.  Is this the case?
| 
| If so, then 
| 
| :0B
| * -80^1 
| * 100^1 money
| 
| would not catch a message that had two blank lines followed by
| MONEY MONEY MONEY MONEY MONEY MONEY MONEY MONEY MONEY MONEY MONEY MONEY MONEY 

If what you want to count is lines containing the word "money", use

  * 100^1 ^.*money.*$

though actually this should be sufficient (because overlapping matches are
not scored):

  * 100^1 ^.*money

Either way, a line containing the word "money" thirteen times should score
only 100 demerits, not 1300.
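
To make the counting rule concrete, here is a rough Python model of it. This is not procmail and does not reproduce its scoring engine. The function name score_money_lines and its weight parameter are made up for illustration. It simply adds the weight once for each newline-delimited line that contains "money", ignoring case as procmail does by default.

```python
import re

def score_money_lines(body, weight=100):
    """Hypothetical model of `* 100^1 ^.*money`:
    add `weight` once per line that contains "money" at least once."""
    pattern = re.compile(r"money", re.IGNORECASE)
    # Split on newlines only; each piece is one "line" for scoring purposes.
    return sum(weight for line in body.split("\n") if pattern.search(line))

# Two blank lines followed by one line with "MONEY" thirteen times.
body = "\n\n" + " ".join(["MONEY"] * 13) + "\n"
print(score_money_lines(body))
```

Under this model the thirteen repetitions sit on one line, so the weight is added a single time. Spreading the same words over thirteen separate lines would add it thirteen times.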

But your original question, as I recall, was what happens if a spammer has no
concept of text lines and sends hard line breaks only between paragraphs.
Well, then each paragraph gets counted as a single line, because a line is
the text between two newline characters.  So if the spammer goes on for
eighteen screenfuls saying "MONEY MONEY MONEY" without hard line breaks but
only screen wrapping, a recipe that scores one hundred demerits for every
line containing the string "money" will score 100 only once, not 100 for each
physical line on your screen needed to display that line of text.
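
The difference between screen rows and real lines can be sketched in Python as well. Again this is only a model under stated assumptions: count_lines_with is a hypothetical helper, the 80-column width is an assumed display width, and textwrap.wrap stands in for whatever wrapping the reader's terminal or mail program does.

```python
import textwrap

def count_lines_with(text, word="money"):
    """Count newline-delimited lines containing `word`, ignoring case."""
    return sum(1 for line in text.split("\n") if word in line.lower())

# One long paragraph with no hard line breaks at all.
paragraph = " ".join(["MONEY"] * 200)

# What an (assumed) 80-column display would show: many wrapped rows.
screen_rows = textwrap.wrap(paragraph, width=80)

# The display shows many rows, but the text itself is still a single line.
print(len(screen_rows), count_lines_with(paragraph))

# Only if the wrapping were stored as real newlines would each row count.
print(count_lines_with("\n".join(screen_rows)))
```

The first count is what a per-line scoring condition sees. The wrapped rows exist only on the display, so they never enter into the score.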
