procmail

Re: how to match against first N lines/bytes of body?

1997-11-13 15:27:27
On Nov 13,  8:41pm, J. Daniel Smith wrote:
Subject: Re: how to match against first N lines/bytes of body?
Adam Grove writes on 13 November 1997 at 13:18:29
When I have recipes search for patterns in the body of a message,
I'd like to limit how many lines (or bytes, it doesn't really
matter much) into the message the search is performed.
[...]
The other problem, though, is that my spam filtering recipe often
gets false positives on long mail, because if genuine mail is *long
[...]
have the right statistical properties. I'm convinced that simply
filtering just on the first 50 or so lines would be much better.

If that's the case, you could simply limit your spam filtering to
those messages that are less than N lines
   N=50
   :0 B
   * $${N}^0
   * -1^1 ^.+$
   { ... process message of < N lines ... }
but you said either N lines or N bytes so make that simply
   N=5000
   :0 B
   * $ < ${N}
   { ... process message of about < N bytes ... }

Hi,
        You're missing the point.  It's not that 50 lines is the length limit
for a spam; it's that he only wants to examine the first 50 lines of each
message, because those are enough to tell, and examining more than 50 lines
can give false positives.
        Based on my experience, he'd let 90% of the spam through if he applied
his filters only to messages shorter than 50 lines.
        That being the case, I would think that head, which outputs the first
N lines of a file, could be used to truncate the message, and then you could
do your stats on just that subset of the message.
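
        Something along these lines might do it.  This is an untested sketch:
the pattern and the folder name "possible-spam" are made-up placeholders, and
I'm assuming the B flag makes procmail feed only the body to the program in a
"?" condition.

   N=50
   # Pipe the body through head and let egrep's exit code decide the match.
   :0 B
   * ? head -n $N | egrep -i 'make money fast'
   possible-spam

Anything past the first N body lines never reaches egrep, so long genuine
mail can't trip the filter further down.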

-- 
Matthew Saroff
Do not reply directly to this message.  Reply to
msaroff(_at_)pobox(_dot_)com