Re: Killing spam based on nameserver info

Having written some of the code Dan Smith is using that Jari Aalto is asking
about, I'll tackle some of Jari's questions:

| | :0
| | {
| |   MAX_COMMAS=45
| |   #
| |   # From David W. Tamkin <dattier(_at_)wwa(_dot_)com>

Actually, Philip Guenther contributed to this part as well IIRC.

| |   :0h # H is implicit; this is h
| |   * ^Resent-(To|Cc):
| |   ADDRESSES=|formail -czxResent-To: -xResent-Cc:
| |   :0Eh
| |   ADDRESSES=|formail -czxTo: -xCc: -xApparently-To:

The variable ADDRESSES will now contain the lines from the header showing
addressees: if there are Resent- headers, it's Resent-To: and Resent-Cc:
that count; if there are no Resent- headers, those that count are To:,
Cc:, and Apparently-To:.

Any such headers that had no contents have been reduced to nothingness
by -z and -x; -c takes care of any that were broken into continuation lines
in the middle of an address.  So the number of visible addressees is the number
of lines in $ADDRESSES plus (with one bug Dan mentions below) the number
of commas in $ADDRESSES [because an address line with three commas contains
four addresses, for example].

| |   # Now, the number of addressees should be the number of non-empty
| |   # lines (procmail always sees an extra empty line at the end of a
                        ^^^^^^ well, not always but usually
| |   # search area) plus the number of commas; this will still overcount
| |   # if someone has a comma inside a name comment (thus MAX_COMMAS
| |   # instead of MAX_ADDRESSES).
| |   :0
| |   * 1^1 ADDRESSES ?? ^.+$
| |   * 1^1 ADDRESSES ?? ,
| |   * $-${MAX_COMMAS}^0

Figure the number of visible addresses and subtract the number we'll
tolerate before suspecting spam; if the difference is positive, enter the
braces.
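
To make the arithmetic concrete, here is a made-up case (the numbers are
purely illustrative, not taken from Dan's setup): suppose $ADDRESSES came
out as a single To: line listing 50 addresses separated by 49 commas.

      non-empty lines        1
      commas                49
      minus MAX_COMMAS     -45
                           ---
      score                  5    (positive, so the braces are entered)

With only 40 addresses on that one line the score would be
1 + 39 - 45 = -5, and the recipe would not fire.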

| |   {
| |     SPAMCHECK_SPAM=yes
| |     :0fwh
| |     | formail -A "X-SpamCheck-Reason: Too many commas in addresses"
| |   }
| | }

| how do I read this recipe? Ok; the headers must not be there,
| but why [^>]*FREE; wouldn't just "FREE" be enough?

Good question.

| | :0BD
| | * !^(In-Reply-To:|References:|Subject:[       ]*Re(\[[0-9]+\])?:).+

I think that condition needs "H ??": the B flag makes the recipe search the
body, and those are header fields.

| | * [^>]*FREE

Perhaps Dan meant   * ^[^>]*FREE   to look for the word with no citation.

| | {
| |   SPAMCHECK_SPAM=yes
| |   :0fwh
| |   | formail -A "X-SpamCheck-Reason: Text 'FREE' detected"
| | }
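
For illustration only, this is roughly how the recipe might look with both
of those suggestions applied.  It is an untested sketch: writing the "!"
ahead of "H ??" is an assumption about what procmail accepts, and the
bracket expression is copied as quoted above.

      :0BD
      * ! H ?? ^(In-Reply-To:|References:|Subject:[       ]*Re(\[[0-9]+\])?:).+
      * ^[^>]*FREE
      {
        SPAMCHECK_SPAM=yes
        :0fwh
        | formail -A "X-SpamCheck-Reason: Text 'FREE' detected"
      }

The first condition is now tested against the header despite the B flag,
while the second still searches the body and only matches FREE on a line
with no ">" ahead of it.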

We move on ...

| | :0
| | * SPAMCHECK_SPAM ?? yes
| | {
| |   :0h
| |   * SPAMCHECK_ACTION ?? discard
| |   /dev/null
| |
| |   MATCH # unset it to start
| |   :0Efwh # if set to "subject" make it work if there is a subject or not
| |   * SPAMCHECK_ACTION ?? subject
| |   * 1^0 ^Subject\/:.*
| |   * 1^0
| |   | formail -I"Subject: SPAM$MATCH"
| 
| Why there is E flag here? (I'm not comfortable with E flags yet...)

The `E' flag is there just in case there's a failure losing the message into
/dev/null, or if the action for the discard option changes (say, to stashing
in a trashcan) in the future, and that fails; then we don't want procmail to
bother looking at the recipe for the subject option.  It's probably
overcautious.

| What does the empty 1^0 do?

It prejudices the score (adding 1 unconditionally) so that as long as the
unweighted condition matched, the score will be positive, whether the other
weighted condition matched or not.  Yes, it's a weird way to code, and I'm
the weirdo who coded it that way.  Think of it as shorthand for this:

      :0E # If the choice was not "discard", is it "subject"?
      * SPAMCHECK_ACTION ?? subject
      {
       :0fhw # Extract any existing subject, including leading whitespace.
       * ^Subject:\/.*
       | formail -I"Subject: SPAM:$MATCH"

       :0Efhw # If there wasn't a subject, use "SPAM" alone.
       | formail -I"Subject: SPAM"
      }
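
Worked through for the original one-recipe form, assuming the unweighted
SPAMCHECK_ACTION condition has already matched, the scoring comes out like
this:

      Subject: present    1 (Subject condition) + 1 (empty condition) = 2
      Subject: absent     0                     + 1 (empty condition) = 1

Either way the total is positive, so the formail action runs.  In the
second case MATCH is still unset, so the new header is just "Subject: SPAM".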