
Re: Filtering capitalized subject

2002-01-10 00:21:17
| How about this?
|
| :0:
| * Subject:.[A-Z]+
| * ! Subject:.[a-z]*
| spam

Not even close, Stig.  Sorry.  It would never file anything in the spam
folder.

First, without the `D' flag, procmail will ignore case in regexps.  Second,
"." matches any single character except a newline (you seemed to confuse it
with ".*").
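
To make those two points concrete, here is a minimal sketch.  The folder
names are made up for illustration, and the two recipes are alternatives to
compare, not something to run in sequence:

```procmail
# Hypothetical folder names; the two recipes are alternatives to compare.

# No D flag: [A-Z] is compared without regard to case, so this matches
# "Subject: hello" as readily as "Subject: HELLO".
:0:
* ^Subject:.*[A-Z]
any-letter

# D flag: case matters, so a real capital must appear somewhere after the
# colon.  D also makes the literal "Subject:" case-sensitive.
:0D:
* ^Subject:.*[A-Z]
has-capital
```

Note the ".*" in both: "Subject:.[A-Z]" would insist that the letter be
exactly the second character after the colon, while "Subject:.*[A-Z]" lets it
appear anywhere later on the line.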

Your first condition means that the second character after the colon of
"Subject:" must be a letter of the alphabet; your own post, where the "R" of
"Re:" was in that position (the first character after the colon was the
usual space), would qualify.  So just about any message with a subject would
match.

Your second condition says that there must not be a character after the
colon of "Subject:".  Because an asterisk means "zero or more," the trailing
"[a-z]*" can match null and doesn't help at all.  The only messages that
would fail the regexp (and thus pass the negated condition) are those with
no subject header at all and those with just "Subject:" and nothing after
the colon.  Since anything that passed the first condition would match the
second regexp (and thus fail the second condition), nothing would ever get
to the spam folder by dint of that recipe.
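
Here is the same recipe again with a hand trace in the comments, in case the
argument is easier to follow that way.  The sample headers are made up, and
the trace is my reading of the regexps rather than captured procmail output:

```procmail
# Hand trace (no D flag, so case is ignored):
#
#   Header               regexp 1   regexp 2   negated condition 2
#   Subject: Re: HELLO   matches    matches    fails
#   Subject: hello       matches    matches    fails
#   Subject:             no match   -          -
#   (no Subject: line)   no match   -          -
#
# Once condition 1 fails, the recipe is already dead, so the last two
# rows never deliver either.
:0:
* Subject:.[A-Z]+
* ! Subject:.[a-z]*
spam
```

No row gets past both conditions, which is why the folder stays empty.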

I'm guessing that you were trying to say that there should be at least one
capital letter and no lower-case letters in the Subject: header, right?  So
it's like this:

 # if no subject header at all or an empty one, it's likely spam; otherwise,
 # extract the subject text, but don't include Re: if it's there, because a
 # lower-case "e" in "Re:" is no excuse
 :0:
 * ! ^Subject: *Re:\/.+
 * ! ^Subject:\/.+
 spam

 # extracted text has at least one capital letter and no lower-case letters
 :0ED:
 * MATCH ?? [A-Z]
 * ! MATCH ?? [a-z]
 spam
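
A hand trace of that pair of recipes against a few made-up subjects may
help.  It assumes that MATCH is assigned whenever a regexp containing \/
matches, even when the condition is negated, which is what the recipes rely
on:

```procmail
# Hand trace; sample subjects are hypothetical.
#
#   Subject: Re: BUY NOW
#     recipe 1: first regexp matches, MATCH=" BUY NOW"; the negated
#               condition fails, so recipe 1 does not deliver
#     recipe 2: a capital and no lower-case letter in MATCH -> spam
#
#   Subject: Hello there
#     recipe 1: first regexp fails (no Re:), so that condition passes;
#               second regexp matches, MATCH=" Hello there"; the negated
#               condition fails, so recipe 1 does not deliver
#     recipe 2: MATCH has lower-case letters -> not filed here
#
#   Subject:            (empty, or no Subject: header at all)
#     recipe 1: neither regexp matches, both negated conditions pass
#               -> spam; recipe 2 is skipped because of its E flag
```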

The reasons I extracted the text after "Subject: Re:" or after "Subject:"
and tested only on that, rather than testing on the whole line, are these:

(1) The cases of the letters in "Subject:" or "Subject: Re:" should not
figure into the decision.
(2) If we try to deal with "Subject:" in optional case and "Re:" in optional
case and optional existence in the same condition lines as the text after,
it gets really nasty.  For example, the recipe below doesn't work if "Re:" is
present with a capital R and a lower-case e, followed by a bunch of capitals.
The second condition's regexp will just skip the part with the question
mark, match the lower-case e to [a-z], and succeed; since the condition is
negated, that will make the condition fail, and the message will be taken
for non-spam despite all the capitals in the rest of the subject line.

 :0D: # This is fooled by "Subject: Re: ALL CAPS HERE"
 * ^[Ss][Uu][Bb][Jj][Ee][Cc][Tt]: *([Rr][Ee]: *)?.*[A-Z]
 * ! ^[Ss][Uu][Bb][Jj][Ee][Cc][Tt]: *([Rr][Ee]: *)?.*[a-z]
 spam

We'd actually need these two messy recipes if we didn't extract:

 :0D: # subjects with Re: and then all caps
 * ^[Ss][Uu][Bb][Jj][Ee][Cc][Tt]: *[Rr][Ee]:.*[A-Z]
 * ! ^[Ss][Uu][Bb][Jj][Ee][Cc][Tt]: *[Rr][Ee]:.*[a-z]
 spam

 :0ED: # other subjects with all caps
 * ^[Ss][Uu][Bb][Jj][Ee][Cc][Tt]:.*[A-Z]
 * ! ^[Ss][Uu][Bb][Jj][Ee][Cc][Tt]:.*[a-z]
 spam




