procmail

Re: [spamtools] Another pattern

1998-03-01 17:49:20
On Tue, 24 Feb 1998, era eriksson wrote to the spamtools list:

When Speaking Of Capital Bogosity (Beginning Each Word With A Capital
Letter, Such As Is Oft Found In Spam):

   MYWORDEXP="[A-Z][-a-z']+[,:;]*"

   :0D
   *  -80^0
   *    1^1 B ?? [-a-z'][,:;]*[       ]+[A-Z]
   * $  8^1 B ?? ()\<$MYWORDEXP[      ]+$MYWORDEXP[   ]+$MYWORDEXP
   { SPAMMER="Capital Bozoticity" }

Why does $MYWORDEXP allow any number of [,:;] characters?  I think
[,:;]? (at most one) would be more appropriate: a word terminated by more
than one of these characters probably isn't a "normal" word and should
not be treated as one.  The same goes for the [,:;]* group in the
second condition line.
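
To make the difference between [,:;]* and [,:;]? concrete, here is a rough Python sketch. It is not procmail syntax; the names WORD_STAR, WORD_OPT, and full_word are made up for illustration, and Python's re module only approximates procmail's regex engine.

```python
import re

# Python approximations of the two word patterns (hypothetical names).
WORD_STAR = re.compile(r"[A-Z][-a-z']+[,:;]*")  # original: any number of [,:;]
WORD_OPT = re.compile(r"[A-Z][-a-z']+[,:;]?")   # proposed: at most one


def full_word(pattern, token):
    """Return True if the whole token counts as one 'word' under pattern."""
    return pattern.fullmatch(token) is not None


if __name__ == "__main__":
    for token in ("Free", "Free,", "Free;;;"):
        print(token, full_word(WORD_STAR, token), full_word(WORD_OPT, token))
```

Under these assumptions, a token such as "Free;;;" is accepted as a word by the * form but rejected by the ? form, while "Free" and "Free," are accepted by both.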

Also on the second condition line, you may want to append [-a-z'] to the
end, to create

* 1^1 B ?? [-a-z'][,:;]*[       ]+[A-Z][-a-z']

This will make for a stricter match, disallowing such phrases as "that's
what I think" from matching on the "what I"; there are innumerable other
such cases.
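
The effect of the extra [-a-z'] can be checked outside procmail with a small Python sketch. The names LOOSE, STRICT, and transitions are illustrative only, and [ \t] is assumed to be what the whitespace bracket in the recipe contains (a space and a tab).

```python
import re

# Lowercase-to-capital transition, without and with the trailing [-a-z'].
LOOSE = re.compile(r"[-a-z'][,:;]*[ \t]+[A-Z]")
STRICT = re.compile(r"[-a-z'][,:;]*[ \t]+[A-Z][-a-z']")


def transitions(pattern, text):
    """Return every non-overlapping match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    for text in ("that's what I think", "Make Money Fast"):
        print(repr(text), transitions(LOOSE, text), transitions(STRICT, text))
```

In this sketch the loose pattern picks up "t I" in "that's what I think", whereas the strict pattern finds nothing there and still finds both transitions in "Make Money Fast".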

With these modifications and the elimination of the variable (personal
taste), my recipe, in test form, looks like this:

:0D
* -10^0
* 1^1 B ?? [-a-z'][,:;]?[        ]+[A-Z][-a-z']
* 8^1 B ?? ()\<[A-Z][-a-z']+[,:;]?[     ]+[A-Z][-a-z']+[,:;]?[  ]+[A-Z][-a-z']+[,:;]?
{ JFEXP="$JFSEC: Capital Bogosity" }
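
As a rough illustration of how the weights in this test recipe add up, the following Python sketch mimics the scoring. It assumes the usual procmail rule that with an exponent of 1 every match adds its full weight and that the recipe fires only if the total is positive. The function name capital_score is hypothetical, \b stands in for procmail's \<, and re.findall counts non-overlapping matches, which may not be exactly how procmail counts them.

```python
import re

WS = r"[ \t]+"                   # assumed bracket contents: space and tab
WORD = r"[A-Z][-a-z']+[,:;]?"    # one capitalized word, at most one [,:;]

# "* 1^1" condition: a lowercase-to-capital transition.
TRANSITION = re.compile(r"[-a-z'][,:;]?" + WS + r"[A-Z][-a-z']")
# "* 8^1" condition: three capitalized words in a row.
TRIPLE = re.compile(r"\b" + WORD + WS + WORD + WS + WORD)


def capital_score(body, base=-10):
    """Approximate the recipe's score for a message body."""
    score = base
    score += 1 * len(TRANSITION.findall(body))  # weight 1 per transition
    score += 8 * len(TRIPLE.findall(body))      # weight 8 per triple
    return score


if __name__ == "__main__":
    for body in ("please send me the file",
                 "Make Money Fast With Our Amazing Offer Today"):
        score = capital_score(body)
        print(score, "match" if score > 0 else "no match", repr(body))
```

A sketch like this can help when choosing the base weight: with -10, a single capitalized triple is not enough on its own to make the total positive.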

Comments, please.

GReg
-- 
Gregory S. Sutter                       "How do I read this file?"
mailto:gsutter(_at_)pobox(_dot_)com                "You uudecode it."
http://www.pobox.com/~gsutter/          "I I I decode it?"

