Having written some of the code Dan Smith is using that Jari Aalto is asking
about, I'll tackle some of Jari's questions:
| | :0
| | {
| | MAX_COMMAS=45
| | #
| | # From David W. Tamkin <dattier(_at_)wwa(_dot_)com>
Actually, Philip Guenther contributed to this part as well IIRC.
| | :0h # H is implicit; this is h
| | * ^Resent-(To|Cc):
| | ADDRESSES=|formail -czxResent-To: -xResent-Cc:
| | :0Eh
| | ADDRESSES=|formail -czxTo: -xCc: -xApparently-To:
The variable ADDRESSES will now contain the lines from the header showing
addressees: if there are Resent- headers, it's Resent-To: and Resent-Cc:
that count; if there are no Resent- headers, those that count are To:,
Cc:, and Apparently-To:.
Any such headers that had no contents have been reduced to nothingness
by -z and -x; -c takes care of any that were broken into continuation lines in
the middle of an address. So the number of visible addressees is the number
of lines in $ADDRESSES plus (with one bug Dan mentions below) the number
of commas in $ADDRESSES [because an address line with three commas contains
four addresses, for example].
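
For anyone who would rather see the header selection as ordinary code, here is
a rough Python model of it. It is only a sketch: build_addresses and the two
field tuples are names made up for illustration, Python's email module stands
in for formail, and the lines may come out in a different order than formail
would give them (which doesn't affect the count).

```python
from email import message_from_string

# Hypothetical names, chosen for this sketch only.
RESENT_FIELDS = ("Resent-To", "Resent-Cc")
NORMAL_FIELDS = ("To", "Cc", "Apparently-To")


def build_addresses(raw_message):
    """Return roughly what ADDRESSES would hold: one non-empty field per line."""
    msg = message_from_string(raw_message)
    # Any Resent-To:/Resent-Cc: field, even an empty one, selects the Resent- set,
    # mirroring the "* ^Resent-(To|Cc):" condition.
    has_resent = any(msg.get_all(name) for name in RESENT_FIELDS)
    fields = RESENT_FIELDS if has_resent else NORMAL_FIELDS
    lines = []
    for name in fields:
        for value in msg.get_all(name, []):
            # Like -c: join continuation lines into a single line.
            unfolded = " ".join(value.split())
            # Like -z: fields with no contents disappear.
            if unfolded:
                lines.append(unfolded)
    return "\n".join(lines)
```

Feeding it a message whose To: field is folded across two lines and whose Cc:
field is empty should give back a single line holding all of the To: addresses.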
| | # Now, the number of addressees should be the number of non-empty
| | # lines (procmail always sees an extra empty line at the end of a
("always": well, not always, but usually.)
| | # search area) plus the number of commas; this will still overcount
| | # if someone has a comma inside a name comment (thus MAX_COMMAS
| | # instead of MAX_ADDRESSES).
| | :0
| | * 1^1 ADDRESSES ?? ^.+$
| | * 1^1 ADDRESSES ?? ,
| | * $-${MAX_COMMAS}^0
Figure the number of visible addresses and subtract the number we'll tolerate
before suspecting spam; if the difference is positive, enter the braces.
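
The arithmetic of those three conditions can be modelled in a few lines of
Python. This is a sketch of the scoring as I understand it, not of procmail
itself: comma_score is a made-up name, and it assumes each "1^1" condition
adds 1 for every match while the last condition adds -MAX_COMMAS once.

```python
import re


def comma_score(addresses, max_commas=45):
    """Model the score of the three weighted conditions on ADDRESSES."""
    # "* 1^1 ADDRESSES ?? ^.+$": one point per non-empty line.
    nonempty_lines = len(re.findall(r"^.+$", addresses, flags=re.MULTILINE))
    # "* 1^1 ADDRESSES ?? ,": one point per comma.
    commas = addresses.count(",")
    # "* $-${MAX_COMMAS}^0": subtract the tolerated number once.
    return nonempty_lines + commas - max_commas


# The recipe's braces are entered only when the score is positive.
fifty = ", ".join("u%d@example.com" % i for i in range(50))
is_suspect = comma_score(fifty) > 0  # 1 line + 49 commas - 45 leaves 5
```

The overcounting mentioned in the comments shows up here too: an address such
as "Smith, Dan" <dan@example.com> counts as one line plus one comma, so it is
scored as two addressees.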
| | {
| | SPAMCHECK_SPAM=yes
| | :0fwh
| | | formail -A "X-SpamCheck-Reason: Too many commas in addresses"
| | }
| | }
| how do I read this recipe? Ok; the headers must not be there,
| but why [^>]*FREE; wouldn't just "FREE" be enough?
Good question.
| | :0BD
| | * !^(In-Reply-To:|References:|Subject:[ ]*Re(\[[0-9]+\])?:).+
I think that condition needs "H ??": the recipe's B flag points its conditions
at the body, and these are header fields.
| | * [^>]*FREE
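Perhaps Dan meant * ^[^>]*FREE to look for the word on a line with no citation
mark in front of it. As written, [^>]* can match nothing at all, so the
condition behaves just like plain FREE.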
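
The difference between the two patterns is easy to check outside procmail.
Below is a small Python sketch; the function names are invented for the
example, and it tests one line at a time so that it doesn't depend on how
either regex engine treats newlines inside [^>]. Python's matching is case
sensitive by default, which lines up with the D flag on the recipe.

```python
import re

UNANCHORED = re.compile(r"[^>]*FREE")   # the condition as posted
ANCHORED = re.compile(r"[^>]*FREE")     # same text, but used with match() below


def posted_condition_matches(body):
    """Unanchored search: [^>]* may match nothing, so any FREE will do."""
    return UNANCHORED.search(body) is not None


def has_unquoted_free(body):
    """Anchored at line start: no '>' may appear before FREE on that line."""
    # match() only succeeds at the start of the string, like a leading ^.
    return any(ANCHORED.match(line) for line in body.splitlines())
```

On a body consisting only of "> Get it FREE today", the first function reports
a match and the second does not; on "Get it FREE today" both do.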
| | {
| | SPAMCHECK_SPAM=yes
| | :0fwh
| | | formail -A "X-SpamCheck-Reason: Text 'FREE' detected"
| | }
We move on ...
| | :0
| | * SPAMCHECK_SPAM ?? yes
| | {
| | :0h
| | * SPAMCHECK_ACTION ?? discard
| | /dev/null
| |
| | MATCH # unset it to start
| | :0Efwh # if set to "subject" make it work if there is a subject or not
| | * SPAMCHECK_ACTION ?? subject
| | * 1^0 ^Subject\/:.*
| | * 1^0
| | | formail -I"Subject: SPAM$MATCH"
|
| Why there is E flag here? (I'm not comfortable with E flags yet...)
The `E' flag is there just in case there's a failure losing the message into
/dev/null, or if the action for the discard option changes (say, to stashing
in a trashcan) in the future, and that fails; then we don't want procmail to
bother looking at the recipe for the subject option. It's probably
overcautious.
| What does the empty 1^0 do?
It prejudices the score (adding 1 unconditionally) so that as long as the
unweighted condition matched, the score will be positive, whether the other
weighted condition matched or not. Yes, it's a weird way to code, and I'm
the weirdo who coded it that way. Think of it as shorthand for this:
:0E # If the choice was not "discard", is it "subject"?
* SPAMCHECK_ACTION ?? subject
{
:0fhw # Extract any existing subject, including leading whitespace.
* ^Subject:\/.*
| formail -I"Subject: SPAM:$MATCH"
:0Efhw # If there wasn't a subject, use "SPAM" alone.
| formail -I"Subject: SPAM"
}
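
If the scoring is still hard to picture, here is the same idea as a Python
sketch. It models only the two weighted conditions of the original one-recipe
version; spam_subject and its bias parameter are invented for the example, and
it assumes a "1^0" condition adds 1 at most once and that the recipe runs only
when the total is positive.

```python
import re


def spam_subject(header_text, bias=True):
    """Return the new Subject: line, or None if the recipe would not fire."""
    match = ""  # MATCH starts out unset
    score = 0
    # "* 1^0 ^Subject\/:.*": worth 1 if there is a subject; MATCH keeps ":..."
    found = re.search(r"^Subject(:.*)$", header_text,
                      flags=re.MULTILINE | re.IGNORECASE)
    if found:
        score += 1
        match = found.group(1)
    # "* 1^0" with nothing after it: worth 1 unconditionally.
    if bias:
        score += 1
    if score > 0:
        return "Subject: SPAM" + match
    return None
```

With the bias in place a message without a subject still scores 1 and gets
"Subject: SPAM"; calling it with bias=False shows what the empty condition is
for, because the same message then scores 0 and is left alone.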