procmail

Re: count of words in big letters?

1997-12-17 04:28:36
On 17 Dec 1997 09:38:36 +0200,
<jari(_dot_)aalto(_at_)poboxes(_dot_)com> wrote:
    I'm unable to construct the score recipe right. Say that
    I tolerate 3 big letter words, and if there are more, then
    I consider it UBE. The regexp should ignore some words like:

How about killing everything with more than two adjacent uppercase
words?

    :0BD:
    * ()\<[A-Z]+\<+[A-Z]+\<+[A-Z]+\>
    $SPAMFOLDER

    max = 3

    #   Count capitalized words
    :0 D
    *$      -$max^0
    *$ B ?? 1^0 ()\<[A-Z][A-Z][A-Z]+[ ]
    {
        count       = $=
        dummy       = "$count capitalized words"
    }

You would obviously want to have 1^1 and not 1^0 here? Your recipe
also doesn't count uppercase words at end of line, or followed by
punctuation. Perhaps this is more to your liking:

    :0BD
    * $ -$max^0
    *       1^1 ()\<[A-Z][A-Z][A-Z]+\>
    { ... go ballistic ... }
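
To spell out the difference between the two exponents, here is a rough
score trace for that recipe, assuming max = 3 and a body in which the
pattern matches five times (both numbers are made up for illustration):

    # Scoring per procmailsc(5): the n-th match of w^x adds w * x^(n-1).
    #
    #   1^1:  -3 + 1 + 1 + 1 + 1 + 1 =  2    total > 0, the recipe fires
    #   1^0:  -3 + 1 + 0 + 0 + 0 + 0 = -2    only the first match counts

With 1^0 the total can never rise above 1 - $max, so for any max of 1
or more the recipe never fires.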

I get fewer matches than I expect with this, but I think it has to do
with the chaining of \< and \> (i.e. the \> eats the \< from the next
substring that would have matched). You could perhaps give up on the
leading word anchor, since peoPLE are unlikELY to write lIKE this (and
if they do, they deserve to be killfiled).
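
A sketch of that last variant, untested and assuming the diagnosis
above is right: with no leading \< there is nothing for the previous
match's \> to eat, so adjacent uppercase words should each be counted.
$SPAMFOLDER is borrowed from the first recipe, and the () is kept only
for symmetry with the recipes above.

    max=3

    # Fires once more than $max runs of three or more capitals are found.
    # No leading \<, so the tail of a mixed-case word (peoPLE) counts too.
    :0BD:
    * $ -$max^0
    *       1^1 ()[A-Z][A-Z][A-Z]+\>
    $SPAMFOLDER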

Hope this helps,

/* era */

-- 
 Paparazzi of the Net: No matter what you do to protect your privacy,
  they'll hunt you down and spam you. <http://www.iki.fi/~era/spam/>
