(No, not the type they talk about in spam <g>.) I've read
the "self-explanatory" FAQs, and still have problems. Here's an
example. I'll walk through it, and say how I interpret each step.
Please let me know if/when I goof.
# - Start with some threshold
# - Characters such as ! that are frequently found
#   in spam have a high score
# - Any dollar sign is likely spam
# - And a negative one for replies. Spam usually
#   doesn't seem to have Re: in the Subject field.
]] Start recipe
:0
]] Set initial value to -250.
* -250^0
]] Add 200 for each match of "^Subject:.*\!\!\!"
* 200^0 ^Subject:.*\!\!\!
]] Add 100 for each match of "^Subject:.*\!\!\!\!"
* 100^0 ^Subject:.*\!\!\!\!
]] Add 100 for matching regexp
]] "^Subject:.*\<free|sex|opportunity|money|great\>"
]]
]] Question... what is the significance of the "^1"
]] suffix versus the "^0" everywhere else? Is there
]] such a thing as "^2", "^3", etc.?
* 100^1 ^Subject:.*\<free|sex|opportunity|money|great\>
]] Add 100 for each match of "^Subject:.*\$"
* 100^0 ^Subject:.*\$
]] Subtract 250 for each match of "^Subject: *Re:"
* -250^0 ^Subject: *Re:
]] Subtract 250 for each match of "^Subject: *Fwd:"
* -250^0 ^Subject: *Fwd:
At the end of the recipe, the specified action is executed if
the accumulated score is > 0. The value is lost if it is <= 0.
In order to recover the score in such a case, I have to execute...
{ }
VARIABLE = $=
...immediately after the recipe.
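Putting those pieces together, here is a minimal sketch of how I understand the score-capture idiom would look in a .procmailrc. The variable name SCORE and the shortened condition list are only for illustration, and I'm going by my reading of the procmailrc(5) man page that $= expands to the score of the most recent recipe.

```
# Sketch only: SCORE is a made-up variable name, and the
# conditions are a shortened version of the recipe above.
:0
* -250^0
* 200^0 ^Subject:.*\!\!\!
* -250^0 ^Subject: *Re:
{ }

# Copy the score of the recipe just evaluated into a variable
# so that later recipes can test or log it.
SCORE=$=
```

Here the empty braces serve as a do-nothing action, so the recipe only computes a score.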
In addition to the significance of "^0", "^1", etc., I have the
following questions...
1) how is a "moving match" handled? E.g. will "!!!" be
considered to match "!!!!" twice? (once for first 3 characters
and once for last 3 characters?)
2) similar to 1), if a target word shows up 2, 3, or n times,
how much is it counted?
3) how many matches of .*\<free|sex|opportunity|money|great\>
(i.e. 1 or 5) would be counted in a subject like...
"Great opportunity for free sex; no money required!!!"
--
Walter Dnes (Toronto)
<waltdnes(_at_)interlog(_dot_)com>