procmail

Re: [spamtools] Another pattern

1998-03-02 00:25:23
On Sun, 1 Mar 1998 19:36:28 -0500 (EST), gsutter(_at_)pobox(_dot_)com wrote:
On Tue, 24 Feb 1998, era eriksson wrote to the spamtools list:
When Speaking Of Capital Bogosity (Beginning Each Word With A Capital
Letter, Such As Is Oft Found In Spam):
MYWORDEXP="[A-Z][-a-z']+[,:;]*"
In $MYWORDEXP, why do you allow for any number of [,:;]?  I think that
actually, [,:;]? would be more appropriate, as a word terminated by more
than one of these characters probably isn't a "normal" word and should
not be treated as such.  This also is my opinion for the [,:;]* group in
the second condition line. 

Yeah, you're right, of course. I was mainly thinking of repeated major
punctuation (..., !?, etc.) at some point, and that just got left in.
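
A minimal sketch of the difference between the two quantifiers, written in
Python rather than procmail so it can be run standalone. The helper name
is_word and the sample tokens are made up for illustration. Note that
Python's re module is case-sensitive by default, whereas a procmail
condition ignores case unless the recipe carries the D flag.

```python
import re

# The word pattern from the thread, with each trailing-punctuation variant.
WORD_STAR = r"[A-Z][-a-z']+[,:;]*"   # any number of trailing , : ;
WORD_OPT = r"[A-Z][-a-z']+[,:;]?"    # at most one trailing , : ;

def is_word(pattern, token):
    """Return True if the whole token matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, token) is not None

# Illustrative tokens only; none of these come from the original recipe.
for token in ["Money,", "Money;;;", "A", "money"]:
    print(token, is_word(WORD_STAR, token), is_word(WORD_OPT, token))
```

Both variants accept "Money," and reject "A" (the pattern needs at least one
character after the capital) and "money"; only the * variant accepts
"Money;;;".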

Also on the second condition line, you may want to append [-a-z'] to the
end, to create

That Is Perhaps Not Necessarily A Good Idea.
                                ^^
This will make for a stricter match, disallowing such phrases as "that's
what I think" from matching on the "what I"; there are innumerable other
such cases.

Innumerable? Three more, please, then? Anyhow, the whole recipe gave
some leeway: the score would start out at -80, as you recall, to
allow for sudden bursts of Proper Names and other Stuff Like That in
normal text. (I haven't tested it; it was the original poster to
spamtools who wanted it that way.)
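
The leeway idea can be sketched roughly as follows, again in Python rather
than as a procmail scoring recipe. Only the starting score of -80 comes
from the thread; the per-word weight of 10, the function name
capital_score, the whitespace tokenizing, and the sample sentences are all
assumptions made for illustration and do not reproduce the original
recipe.

```python
import re

WORD = r"[A-Z][-a-z']+[,:;]?"   # word pattern from the thread, ? variant
START_SCORE = -80               # starting score mentioned in the thread
WEIGHT_PER_WORD = 10            # hypothetical weight, not from the recipe

def capital_score(text):
    """Start below zero and add a fixed weight per capitalized word."""
    hits = sum(1 for token in text.split() if re.fullmatch(WORD, token))
    return START_SCORE + WEIGHT_PER_WORD * hits

normal = "I met John Smith in Paris yesterday."
shouty = "Make Money Fast, Earn Big Cash Today: Call Now And Get Rich"

print(capital_score(normal))   # stays negative: a few Proper Names are fine
print(capital_score(shouty))   # goes positive: every word is capitalized
```

With these assumed numbers, a message needs more than eight capitalized
words before the score rises above zero.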
  The heuristic is fragile at best; I wouldn't necessarily can a message
as spam based on this recipe alone (but I might move it to a secondary
folder, along with messages which match "\<r\<*u\>" and
"[a-z]\.\.+[a-z]", as in "r u sure...i cant spell...right...this l00x
so...3/_337...").
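
For readers who want to try the two secondary patterns outside procmail,
here is an approximate Python translation. It assumes that procmail's \<
and \> each match one non-word character (or a line boundary), which
Python's re has no direct equivalent for, so the first pattern is rewritten
with \W and anchors. The names R_U, DOT_RUN and looks_sloppy are invented
for this sketch.

```python
import re

# Approximation of procmail's "\<r\<*u\>": a lone "r", optional non-word
# characters, then "u" ending a word. Case is ignored, as procmail does
# by default.
R_U = re.compile(r"(?:^|\W)r\W*u(?:\W|$)", re.IGNORECASE)

# "[a-z]\.\.+[a-z]": two or more dots squeezed between letters.
DOT_RUN = re.compile(r"[a-z]\.\.+[a-z]", re.IGNORECASE)

def looks_sloppy(text):
    """Return True if either secondary-folder pattern matches."""
    return bool(R_U.search(text) or DOT_RUN.search(text))

sample = "r u sure...i cant spell...right...this l00x so...3/_337..."
print(looks_sloppy(sample))
print(looks_sloppy("Are you sure? I can spell."))
```

The sample string from the message trips both patterns, while the
ordinary sentence trips neither.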

{ JFEXP="$JFSEC: Capital Bogosity" }
Comments, please.

Isn't there a clear difference between "bogosity" and "bozoticity"?
I prefer the latter. :-)

/* era */

-- 
 Paparazzi of the Net: No matter what you do to protect your privacy,
  they'll hunt you down and spam you. <http://www.iki.fi/~era/spam/>
