When I have recipes search for patterns in the body of a message,
I'd like to limit how far into the message (in lines or bytes, it
doesn't really matter much) the search extends.
That is, I'd like every match appearing after some line N, for
a value of N I specify, to just be ignored.
Does anyone have a (hopefully efficient) way of doing this?
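
To make the intent concrete, here is a rough sketch of the behaviour I'm after, written in Python rather than as a recipe. The function name `match_in_first_lines` and the default of 50 lines are purely illustrative assumptions. It also assumes the body starts after the first blank line of the message.

```python
import re


def match_in_first_lines(message: str, pattern: str, max_lines: int = 50) -> bool:
    """Return True if `pattern` matches within the first `max_lines` body lines."""
    # Split the header from the body at the first blank line.
    _, _, body = message.partition("\n\n")
    # Keep only the first `max_lines` lines of the body; anything later is ignored.
    head = "\n".join(body.split("\n")[:max_lines])
    # Case-insensitive search over the truncated body only.
    return re.search(pattern, head, re.IGNORECASE) is not None
```

With a limit of 2, a keyword on the third body line would simply never be seen, which is exactly the effect I want from the recipes themselves.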

In case anyone is interested, I actually want to do this for a couple
of reasons. Partly for efficiency; I have a lot of recipes, including
an enormous scoring recipe to catch spam, and I get a little bit
alarmed at the thought of huge pieces of mail being searched in their
entirety hundreds of times before being delivered.

The other problem, though, is that my spam filtering recipe often
gets false positives on long mail, because if genuine mail is *long
enough* it will generally hit enough of the keywords to look like
spam. I can try to remedy this by scaling the score by a factor
depending on the number of lines in the message, and by playing with
the decay rates of the scoring recipes. This works most of the time,
but I'm doubtful that any combination of these factors will actually
have the right statistical properties. I'm convinced that simply
filtering just on the first 50 or so lines would be much better.
Thanks in advance,