Re: Scoring for Capitals in the Subject line

2005-04-17 05:15:45
On Sun, Apr 17, 2005 at 01:42:10AM +0200, Dallman Ross wrote:

I'd do it this way:

:0
* ^Subject:.*\/[^       ].*
{
 :0 D
 *  1^1 MATCH ?? [A-Z]
 * -3^1 MATCH ?? [a-z]
 { MORETHANSEVENTYFIVEPCTCAPS = yes }
} 
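
In case the weights look arbitrary, here is a sketch of the
arithmetic behind them, written as comments on the inner recipe.
The sample Subjects are made up for illustration, and I'm assuming
MATCH holds just the Subject text at this point:

 # score = (caps * 1) + (lowercase * -3)
 # The recipe succeeds only when the score is > 0, i.e. when
 # caps > 3 * lowercase, i.e. caps are more than 75% of letters.
 #
 #  "FREE MONEY NOW"  12 caps, 0 lower:  12 -  0 =  12  succeeds
 #  "FREE money NOW"   7 caps, 5 lower:   7 - 15 =  -8  fails
 #  "ABCd"             3 caps, 1 lower:   3 -  3 =   0  fails
 :0 D
 *  1^1 MATCH ?? [A-Z]
 * -3^1 MATCH ?? [a-z]
 { MORETHANSEVENTYFIVEPCTCAPS = yes }

The D flag matters: without it, procmail matches case-insensitively
and both conditions would count every letter.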


A few other points: By insisting on finding a non-space or
non-tab char in the Subject before we bother, we save something,
as some messages come with empty or missing Subjects.  (Not so
many, granted, but some.)  (And that's a space and a tab in the
brackets with the caret in the outer recipe.)  


We could streamline it even further by insisting on at least
one alphabetical character:

  * ^Subject:.*\/[^       ].*[a-z]

Actually, that condition requires at least two chars, with a
letter somewhere past the first.  E.g., suppose the Subject is:
"A1".  The above condition won't match on it.  But it will
match on "1A".

We could just do it this way instead:

  * ^Subject:.*\/[a-z].*

(This entire sub-discussion assumes we're only interested in
finding a percentage of all-caps from among all alphabetical
chars, not all chars in the Subject.)

However, even though that might be thought of as slightly more
efficient (because it throws out messages with Subjects
that contain no letters at all), I find it a detraction not
to have the full Subject-line saved to MATCH; we might have
good use for the unadulterated Subject later in the procmail
run, so saving $SUBJECT here while we have it is useful.
If we start the match at the first letter instead of the first
char, we lose that usefulness.

Conclusion: Best, I think, is 

  * ^Subject:.*\/[^       ].*[a-z]

here.  Besides, if the Subject comprises only one letter, and
that one letter is a cap, do we really want to have our test
pass?  One hundred percent of the letters in the Subject "X"
are caps; yet that's not a very good spam indicator.  So that's
yet another reason to go with the condition just above and
require at least two non-whitespace chars, at least one past
the first one of which must be a letter.
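
Putting the pieces together, the whole thing would look roughly
like this.  It is only the recipe from the top of this message
with the outer condition swapped for the one just above; as
before, the brackets hold a space and a tab:

 :0
 * ^Subject:.*\/[^       ].*[a-z]
 {
  :0 D
  *  1^1 MATCH ?? [A-Z]
  * -3^1 MATCH ?? [a-z]
  { MORETHANSEVENTYFIVEPCTCAPS = yes }
 }

One caveat: because the part after \/ now ends in [a-z], MATCH
stops at the last letter in the Subject, so trailing digits or
punctuation are not captured.  That doesn't change the letter
counts, but if the complete Subject is wanted in MATCH, tacking
a final .* onto the condition should restore it.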

-- 
dman

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail