procmail

Help figuring out SCORE-ing

1998-01-20 23:38:46
(No, not the kind they talk about in spam <g>.)  I've read
the "self-explanatory" FAQs, and still have problems.  Here's an
example.  I'll walk through it and say how I interpret each step.
Please let me know if/when I goof.

# - Start with some threshold. Characters such as !
#   frequently found in spam get a high score

# - Any dollar sign is likely spam

# - And a negative one for replies. Usually spam
#   doesn't seem to have Re: in subject field.

]] Start recipe
:0

]] Set initial value to -250.
]] Add 200 for each match of "^Subject:.*\!\!\!"
* -250^0
*  200^0        ^Subject:.*\!\!\!

]] Add 100 for each match of "^Subject:.*\!\!\!\!"
*  100^0        ^Subject:.*\!\!\!\!

]] Add 100 for matching regexp
]] "^Subject:.*\<free|sex|opportunity|money|great\>"
]]
]] Question... what is the significance of the "^1"
]] suffix versus the "^0" everywhere else?  Is there
]] such a thing as "^2", "^3", etc.?
*  100^1        ^Subject:.*\<free|sex|opportunity|money|great\>

]] Add 100 for each match of "^Subject:.*\$"
*  100^0        ^Subject:.*\$

]] Subtract 250 for each match of "^Subject: *Re:"
* -250^0        ^Subject: *Re:

]] Subtract 250 for each match of "^Subject: *Fwd:"
* -250^0        ^Subject: *Fwd:

  At the end of the recipe, the specified action is executed if
the accumulated score is > 0.  The value is lost if it is <= 0.  To
recover the score in such a case, I have to execute...

{ }
VARIABLE = $=

...immediately after the recipe.
  In addition to the significance of "^0", "^1", etc., I have the
following questions...
  1) how is a "moving match" handled?  E.g. will "!!!"  be
considered to match "!!!!" twice? (once for first 3 characters
and once for last 3 characters?)
  2) similar to 1), if a target word shows up 2, 3, or n times,
how much is it counted?
  3) how many matches of .*\<free|sex|opportunity|money|great\>
(i.e. 1 or 5) would be counted in a subject like...
"Great opportunity for free sex; no money required!!!"

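For reference, the procmailsc(5) man page describes a "w^x" condition as
adding w for the first match, w*x for the second, w*x*x for the third, and
so on.  Read that way, "^0" counts only the first match and "^1" counts
every match equally.  Below is a minimal Python sketch of that arithmetic
only.  The function name weighted_score and the match counts passed to it
are illustrative assumptions; the sketch does not model how procmail
locates matches in a message.

```python
def weighted_score(w, x, n):
    """Contribution of one `w^x` condition whose regexp is found n times.

    Follows the rule stated in procmailsc(5): the k-th match adds
    w * x**(k-1).  The match count n is supplied by the caller
    (an assumption; this sketch does no regexp matching itself).
    """
    total = 0.0
    term = float(w)
    for _ in range(n):
        total += term   # add the weight for this match
        term *= x       # scale the weight for the next match
    return total


if __name__ == "__main__":
    # Hypothetical match counts, chosen only to show the effect of x.
    print(weighted_score(100, 0, 5))    # 100.0: only the first match counts
    print(weighted_score(100, 1, 5))    # 500.0: every match counts equally
    print(weighted_score(100, 0.5, 3))  # 175.0: 100 + 50 + 25
```

With an exponent strictly between 0 and 1, each additional match adds less
than the one before it.
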
-- 
Walter Dnes (Toronto)
<waltdnes(_at_)interlog(_dot_)com>

