On Tue, 24 Feb 1998, era eriksson wrote to the spamtools list:
> When Speaking Of Capital Bogosity (Beginning Each Word With A Capital
> Letter, Such As Is Oft Found In Spam):
>
> MYWORDEXP="[A-Z][-a-z']+[,:;]*"
> :0D
> * -80^0
> * 1^1 B ?? [-a-z'][,:;]*[ ]+[A-Z]
> * $ 8^1 B ?? ()\<$MYWORDEXP[ ]+$MYWORDEXP[ ]+$MYWORDEXP
> { SPAMMER="Capital Bozoticity" }
In $MYWORDEXP, why do you allow any number of [,:;] characters? I think
that [,:;]? (at most one) would be more appropriate, as a word terminated
by more than one of these characters probably isn't a "normal" word and
should not be treated as such. The same goes for the [,:;]* group in
the second condition line.
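
To illustrate the difference between the two quantifiers, here is a rough
sketch in Python rather than procmail. It assumes Python's re module behaves
closely enough to procmail's regex engine for this purpose, it leaves out the
leading ()\< anchor, and the names WORD_STAR, WORD_OPT, triple and hits are
made up for the example.

```python
import re

# Assumption: Python's `re` stands in for procmail's regex engine, and the
# leading ()\< word-start anchor is left out. All names are hypothetical.
WORD_STAR = r"[A-Z][-a-z']+[,:;]*"   # original $MYWORDEXP: any number of , : ;
WORD_OPT = r"[A-Z][-a-z']+[,:;]?"    # suggested form: at most one of , : ;

def triple(word):
    """Build the three-capitalized-words pattern from one word pattern."""
    return re.compile(word + "[ ]+" + word + "[ ]+" + word)

def hits(word, text):
    """Return True if three capitalized words in a row are found in text."""
    return triple(word).search(text) is not None

# Print, for each sample, whether the * form and the ? form find a run.
for text in ("Act Now, Limited Offer", "Act Now;;; Limited Offer"):
    print(repr(text), hits(WORD_STAR, text), hits(WORD_OPT, text))
```

With a single comma both forms find a run of three capitalized words; with
";;;" after a word only the * form does, which is the behaviour being
questioned here.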
Also on the second condition line, you may want to append [-a-z'] to the
end, giving:
* 1^1 B ?? [-a-z'][,:;]*[ ]+[A-Z][-a-z']
This makes for a stricter match: a phrase such as "that's what I think"
no longer matches on the "what I", because the capital letter must now be
followed by a lowercase letter, hyphen or apostrophe. There are innumerable
other such cases.
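
The effect of the extra [-a-z'] can be checked with a small Python sketch.
As before, this assumes Python's re module is a fair stand-in for procmail's
matching under the D (case-sensitive) flag; LOOSE, STRICT and count are
names invented for the example.

```python
import re

# Assumption: Python's `re` approximates procmail's case-sensitive matching.
LOOSE = re.compile(r"[-a-z'][,:;]*[ ]+[A-Z]")            # condition as posted
STRICT = re.compile(r"[-a-z'][,:;]*[ ]+[A-Z][-a-z']")    # with [-a-z'] appended

def count(pattern, text):
    """Count non-overlapping matches of pattern in text."""
    return len(pattern.findall(text))

for text in ("that's what I think", "Buy Now, Pay Later"):
    print(repr(text), count(LOOSE, text), count(STRICT, text))
```

The ordinary sentence scores one hit under the posted condition (on "t I")
and none under the stricter one, while the capitalized phrase scores the
same under both.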
With these modifications and the elimination of the variable (personal
taste), my recipe, in test form, looks like this:
:0D
* -10^0
* 1^1 B ?? [-a-z'][,:;]?[ ]+[A-Z][-a-z']
* 8^1 B ?? ()\<[A-Z][-a-z']+[,:;]?[ ]+[A-Z][-a-z']+[,:;]?[ ]+[A-Z][-a-z']+[,:;]?
{ JFEXP="$JFSEC: Capital Bogosity" }
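
For a feel of how the weights add up, the following Python sketch
approximates the score of the test recipe. It rests on several assumptions:
that each match of a w^1 condition adds w to the score, that the recipe
fires when the total is positive, that procmail counts matches the way
re.findall does, and that \b is an acceptable substitute for ()\<. The
function and constant names are hypothetical, and real procmail results may
differ.

```python
import re

# Assumptions: each match adds its weight, the recipe fires on a positive
# total, and Python's `re` (with \b for ()\<) approximates procmail.
BASE = -10  # from "* -10^0"
TRANSITION = re.compile(r"[-a-z'][,:;]?[ ]+[A-Z][-a-z']")        # weight 1
WORD = r"[A-Z][-a-z']+[,:;]?"
TRIPLE = re.compile(r"\b" + WORD + "[ ]+" + WORD + "[ ]+" + WORD)  # weight 8

def score(body):
    """Approximate the recipe's score for a message body."""
    return BASE + 1 * len(TRANSITION.findall(body)) + 8 * len(TRIPLE.findall(body))

def fires(body):
    """Return True if the approximate score is positive."""
    return score(body) > 0

spam = "Make Money Fast Working From Home\nthis is an ordinary sentence."
ham = "Thanks for the note. I will call Bob Smith on Monday."
print(score(spam), fires(spam))
print(score(ham), fires(ham))
```

Under these assumptions two runs of three capitalized words are already
enough to push the score past the -10 starting point, whereas a few proper
names in ordinary text are not.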
Comments, please.
GReg
--
Gregory S. Sutter "How do I read this file?"
mailto:gsutter(_at_)pobox(_dot_)com "You uudecode it."
http://www.pobox.com/~gsutter/ "I I I decode it?"