On Sat, Apr 16, 2005 at 09:48:34AM -0700, Bart Schaefer wrote:
:0
* ^Subject:\/.*
{
:0D
* -1^1 MATCH ?? [a-zA-Z]
* -3^1 MATCH ?? [a-z]
* 4^1 MATCH ?? [A-Z]
{ SEVENTYFIVEPCTCAPS=yes }
}
Good post. I have a couple of comments, nonetheless.
First, here is a sample message I just created a Subject
for. I will show the Subject using a shell alias I
have that I call "headparse". Here is the alias,
btw -- some may find it useful:
formail < \!:$ -zfx \!:1 -s | sed "s/^<//; s/>//"
(I slice off the brackets because I often use the alias
on Message-IDs, and the brackets mess up piped actions.)
Okay, anyway:
1:15am [~/Mail] 215[0]> headparse Subject $SPAMPLE
NOW IS THE TIME FOR ALL GOOD men to come
1:15am [~/Mail] 216[0]> headparse Subject $SPAMPLE | wc -c
41
1:15am [~/Mail] 217[0]> headparse Subject $SPAMPLE | tr -d -c '[:lower:]' | wc -c
9
1:15am [~/Mail] 218[0]> headparse Subject $SPAMPLE | tr -d -c '[:upper:]' | wc -c
22
So the first question is, what is 75%? Seventy-five percent of
the alphabetical chars? Of all chars? Here is a line with 40
chars -- remember that wc -c also counts the trailing newline, so
it reports one more -- of which 22 are upper-case and 9 are
lower-case. (Nine are spaces. Here there are no other
non-alphabetical chars, but if there were, they would obviously
also not count as upper or lower.)
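For anyone who would rather check those counts without the alias,
here is a rough Python sketch of the same tallies. It assumes plain
ASCII letters (the variable names are mine, nothing procmail knows
about), and it takes the Subject text as formail -z would hand it
over, without the leading space.

```python
import string

# The Subject text as printed by the headparse alias above.
subject = "NOW IS THE TIME FOR ALL GOOD men to come"

# Count ASCII upper-case and lower-case letters, like the tr | wc -c pipelines.
upper = sum(ch in string.ascii_uppercase for ch in subject)
lower = sum(ch in string.ascii_lowercase for ch in subject)
total = len(subject)  # no trailing newline here, so one less than wc -c

print(upper, lower, total)  # 22 9 40
```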
Twenty-two upper-case chars out of 40 total chars is not 75%.
It's 55%. Furthermore, 22 upper-case chars of 31 alphabetical chars
is still not 75%: it's just under 71%. Still, this message
"passes" Bart's recipe:
1:30am [~/Mail] 230[0]> harness $SPAMPLE | tail -15
procmail: Assigning "MATCH="
procmail: Matched " NOW IS THE TIME FOR ALL GOOD men to come"
procmail: Match on "^Subject:\/.*"
procmail: Score: -31 -31 "[a-zA-Z]"
procmail: Score: -27 -58 "[a-z]"
procmail: Score: 88 30 "[A-Z]"
procmail: Assigning "SEVENTYFIVEPCTCAPS=yes"
procmail: Assigning "HOST"
procmail: HOST mismatched "panix5.panix.com"
From tachenym(_at_)westonka(_dot_)k12(_dot_)mn(_dot_)us Sun Apr 17 01:00:24 2005
Subject: NOW IS THE TIME FOR ALL GOOD men to come
Folder: 1528
So something is not kosher about Bart's recipe. ("Harness" is
my test harness, or sandbox, for procmail.)
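If I have the arithmetic right, the reason is visible in the
weights. With U upper-case and L lower-case letters, the three
conditions add up to -(U+L) - 3L + 4U = 3U - 4L, which goes
positive as soon as U > 4L/3, i.e. when caps are more than 4/7
(about 57%) of the letters, not 75%. Here is a small Python sketch
of that sum. It assumes each w^1 condition contributes w once per
matching character, which is what the log above shows; bart_score
is just a name I made up for the illustration.

```python
import string

def bart_score(subject):
    """Sum of the three weighted conditions, one hit per matching char."""
    upper = sum(ch in string.ascii_uppercase for ch in subject)
    lower = sum(ch in string.ascii_lowercase for ch in subject)
    # -1 per letter, -3 per lower-case letter, +4 per upper-case letter
    return -1 * (upper + lower) - 3 * lower + 4 * upper

score = bart_score(" NOW IS THE TIME FOR ALL GOOD men to come")
print(score)  # 30, matching the running total in the log
```

A Subject like "AAAAAbbb" (62.5% caps) scores 3 under this sum, so
it would get flagged as well.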
Further, I wish to say that we shouldn't need to count things
in the Subject three times to come up with a useful test for
75%. Twice is enough. We only need -- if we're measuring
only against alphabetical chars, and not all chars -- to
have at least three out of four be upper-case. I'd do
it this way:
:0
* ^Subject:.*\/[^ ].*
{
:0 D
* 1^1 MATCH ?? [A-Z]
* -3^1 MATCH ?? [a-z]
{ MORETHANSEVENTYFIVEPCTCAPS = yes }
}
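The reasoning behind the two weights: U/(U+L) > 3/4 is the same
thing as 4U > 3U + 3L, which is U - 3L > 0. A quick Python sketch
of that test follows, again assuming ASCII letters only and one hit
per matching character; the function name is mine, made up for the
illustration.

```python
import string

def more_than_75pct_caps(subject):
    """True when upper-case letters are more than 3/4 of all letters."""
    upper = sum(ch in string.ascii_uppercase for ch in subject)
    lower = sum(ch in string.ascii_lowercase for ch in subject)
    # +1 per upper-case letter, -3 per lower-case letter, must end up > 0
    return upper - 3 * lower > 0

result = more_than_75pct_caps("NOW IS THE TIME FOR ALL GOOD men to come")
print(result)  # False: 22 - 27 is negative
```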
A few other points: First, by insisting on finding a char that is
neither a space nor a tab in the Subject before we bother, we save
something, as some messages come with empty or missing Subjects.
(Not so many, granted, but some.) (And that's a space and a tab in
the brackets with the caret in the outer recipe.) Second, exactly
75% will not fall through to the assignment action here, since the
score then comes out to 0 and procmail wants it positive. Either
call it more correctly "MORETHAN" 75%, or put a tiny scoring pad in
the condition set:
* 0.1^0
* 1^1 MATCH ?? [A-Z]
* -3^1 MATCH ?? [a-z]
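To see what the pad does at the boundary, here is a hedged Python
sketch of the padded sum. It assumes the pad is simply added once
to the running score, and PAD and the function name are mine.

```python
import string

PAD = 0.1  # same value as the 0.1^0 condition above

def at_least_75pct_caps(subject, pad=PAD):
    """True when upper-case letters are 3/4 or more of all letters."""
    upper = sum(ch in string.ascii_uppercase for ch in subject)
    lower = sum(ch in string.ascii_lowercase for ch in subject)
    # The pad tips an exact tie (upper == 3 * lower) over to positive.
    return pad + upper - 3 * lower > 0

print(at_least_75pct_caps("ABC d"))  # True: exactly 75%, score 0.1
print(at_least_75pct_caps("AB c"))   # False: score -0.9
```

One side effect that may be worth checking in a real run: in this
sketch a Subject with no letters at all also ends up at 0.1, so it
passes too.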
Okay, gotta hit the hay now. . . :-)
--
dman