procmail

Re: Procmail Spam Filtering

2001-12-31 12:19:06
At 18:13 2001-12-31 +0800, Jason Jordan wrote:

I'm hoping it's ok for me to post portions of my procmailrc here for
comment by the Wizards.  I know there are some really awful mistakes
in here - and I suspect they're currently bad enough to prevent this
procmailrc from working properly!

Uhm, you could start with an explanation of _what_ isn't working as you'd expect, and what you determined from the VERBOSE log of said test.

--- BEGIN
VERBOSE=no

Set to yes and pump a few failed messages through it. If you do post an excerpt here, DO NOT post the sum total of the log for a bunch of messages - make some attempt to isolate the log portions to the recipe which is failing.

MAILDIR=/var/spool/mail
LOGABSTRACT=
LOGFILE=/var/log/procmail.log

I tend to set the logfile BEFORE defining verbosity, logabstract, etc.

SENDMAIL = "sendmail -oi -t"
FORMAIL = "/usr/bin/formail"

Quotes are wildly unnecessary.

LOG="--- Logging ${LOGFILE} for {$LOGNAME},"
XLOOP = "X-Loop: $LOGNAME(_at_)$HOST"

Again, lose the quotes (except on the LOG line, where they're useful).

TXT_NO_HTML = /etc/procmail/reject-message.txt
REJECT = /etc/procmail/return.txt

Are these rules running in the /etc/procmailrc or are they running in your personal .procmailrc ? This would be useful to know. From the above LOG= line, I assume this is in the global procmailrc.

NL = "
            "

Should newline have so many trailing spaces?  To what purpose?

LOCKFILE

? This CLEARS an existing explicit lockfile. You haven't SET one yet. Unless you're clued into something that I'm not, lose this line.

## Make a backup & keep the last 32 emails
:0 c
backup

:0 ic
| cd backup && rm -f dummy `ls -t msg.* | sed -e 1,32d`
####

If you're using maildir (i.e. backup is a DIRECTORY, not a file), you should endeavour to note that above where you write to it.
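For anyone puzzling over the pruning command in the quoted backup recipe, here is a minimal sketch of what it does, run in a throwaway directory. The directory, the file count of 35 and the timestamps are all made up for the demonstration; only the `ls -t msg.* | sed -e 1,32d` idiom comes from the recipe.

```shell
# Scratch directory standing in for "backup" (hypothetical location).
dir=$(mktemp -d)
cd "$dir" || exit 1

# Create 35 dummy messages, each one minute newer than the last.
i=1
while [ "$i" -le 35 ]; do
    n=$(printf '%02d' "$i")
    touch -t "2001010100$n" "msg.$n"
    i=$((i + 1))
done

# ls -t lists newest first; sed deletes the first 32 names from that list,
# so only the names of the OLDER files are left over for rm.
old=$(ls -t msg.* | sed -e 1,32d)
rm -f dummy $old

remaining=$(ls | wc -l | tr -d ' ')
echo "files left: $remaining"
```

The `dummy` argument is only there so that `rm` still has an operand when the sed output is empty.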

[snip - big5 filtering]

:0BD
* -1^1 .
*  2^1 =[0-9A-F][0-9A-F]
* 20^1 [(128)-(255)]

(where (128) and (255) are not parenthetical and ARE the hardcoded character)

## spam reporting
## these are my bait addresses, and addresses that are contaminated
## and already receive too much spam

Suggestion: drop down to ONE bait address, and refuse the rest at the MTA.

Also, note that if you ever host other domains, you're asking for trouble by not including the domain of these addresses (but then, your HTML refusal rule below will cause problems for other users as well).

## reject HTML
## we don't want no steekin' HTML so reject it but explain why
:0fh

WHY is THIS ruleset a _filter_ ? You're DELIVERING the message to sendmail within this one (and that isn't performed with a COPY either). h flag is rather meaningless here without the filter context as well.

       # Make a temporary file of the message to be returned
        :0 wc:/tmp/lock

That's an awfully vague name for a lockfile, esp. in a public dir (which is to be discouraged anyway). This is all unnecessary anyway - your rule as written needs an intermediate file - but it is a waste to write it that way.

Worse yet, if a message arrives just as this rule is finishing, and before it has started executing the NEXT rule, the file has the possibility of being overwritten because the lock was cleared. If you're going to use a generic lock, lock it on the outer nesting level that encompasses ALL use of the intermediate file which you are locking for.
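To illustrate the point about where the lock belongs, here is a rough shell analogy. It is NOT procmail's own locking: it swaps in the atomic `mkdir` trick as a stand-in, and the paths and the `process_message` function are hypothetical. What matters is that the lock is taken once, before the intermediate file is written, and released only after the file has been read and removed.

```shell
work=$(mktemp -d)
lock="$work/reject.lock"      # hypothetical lock name
tmpfile="$work/return.txt"    # stands in for the intermediate file

process_message() {
    # Acquire: mkdir is atomic, so only one caller can create the directory.
    until mkdir "$lock" 2>/dev/null; do sleep 1; done

    # Step 1: write the intermediate file.
    printf '%s\n' "$1" > "$tmpfile"

    # Step 2: consume and remove it, still under the SAME lock.
    result=$(cat "$tmpfile")
    rm -f "$tmpfile"

    # Release only after every use of the intermediate file is finished.
    rmdir "$lock"
}

process_message 'first message'
first=$result
process_message 'second message'
second=$result
echo "$first / $second"
```

Releasing the lock between step 1 and step 2 is the shell equivalent of putting a separate lock on each of the two recipes: a second message could slip in and overwrite the file in between.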

       # Discard whitespaces, insert a leading blank
        | expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > $REJECT

That'd be discard _trailing_ whitespaces. Is there a particular reason you expand the tabs (besides wanting to make the prefixed space appear to do something when a line _starts_ with a tab?)

        | sed -e 's/^\(.*[^     ]\)[    ]*$/ \1/' > $REJECT

(brackets contain a space+tab). For each line, this would insert a leading space, and strip the trailing whitespace, if any exists. That should be ONE shell process, versus your three (not including the parent shell which procmail must invoke). The following would accomplish a more traditional quoting of the message, still keeping it to one shell process (AND not bloating the message in case it does contain lots of tabs):

        | sed -e 's/^/\>/' -e 's/[      ]*$//' > $REJECT

But, we'll revisit this just below, since the intermediate file isn't needed if you play your cards properly.
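If you want to see what the quoting variant does before trusting it with mail, something like the following can be run at a prompt. It is only a sketch: the sample text is invented, the tab is passed in through a variable so it survives copy and paste, and the replacement is written as a plain `>` on the assumption that it behaves the same as `\>` in your sed.

```shell
# A literal tab, so the bracket expression really is space+tab.
tab=$(printf '\t')

# Sample input: trailing spaces, a leading and a trailing tab, and a clean line.
quoted=$(printf 'hello  \n\tindented\t\nplain\n' |
    sed -e 's/^/>/' -e "s/[ $tab]*\$//")

printf '%s\n' "$quoted"
```

Each line gains a leading `>`, trailing blanks and tabs are dropped, and the leading tab on the second line is left alone, all in a single sed process.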

       :0:/tmp/lock
        | ($FORMAIL -r -I "Subject: Rejected Mail: HTML e-mail refusal"\
                -I "From: postmaster(_at_)pcguru(_dot_)com(_dot_)au" \
                -A "X-Mailer: Procmail Autoreply"   \
                -A "$XLOOP" ;                       \
                cat $TXT_NO_HTML ;                  \
                echo "--- begin rejected mail ---" ;\
                cat $REJECT ;                       \
                echo "--- end rejected mail ---" ;  \
                rm -f $REJECT                       \
        ) | $SENDMAIL
}

Bah, forget the intermediate file, and use filter (on the delivery portion).

Create two files: rej.begin and rej.end - put your separators in those files.

REJ_BEGIN=/etc/procmail/rej.begin
REJ_END=/etc/procmail/rej.end

(you could just put the rej.begin text at the tail end of the refusal text).

I extract multiple values in a central location in my .procmailrc, so they're available to all my filters later on without having to multiply invoke formail or whatever to extract them. Among these values is:

        :0h
        SENDER=|$FORMAIL -b -rtzxTo:

Put this above the following rewrite of your bounce rule.

# note that some HTML-ish crap arrives as type multipart/alternative, so you
# may eventually want to condition this for that as well - though you should
# keep in mind that many losers have little control over their email
# application (AOL'ers, MSN'ers, and many freemail service users chiefly
# among these losers).

:0
* ! ^FROM_DAEMON
*$ ! ^$XLOOP
* ^Content-Type: text/html
{
        LOG = "$NL --TRASH: HTML $NL"

        # filter the body to strip trailing whitespace and add a quoting level.
        # Also add our explanation text and the rejection separators.
        # note that this is being done to the WHOLE message - if you want to do
        # it to the BODY only, that's quite alright - just add the 'b' flag.
        # note that using the separator files saves us several shells.
        :0f
        | sed -e 's/^/\>/' -e 's/[      ]*$//' | \
                cat $TXT_NO_HTML $REJ_BEGIN - $REJ_END

        # This rule is written with the assumption that you've extracted the
        # appropriate reply-to address previously, and therefore DOES NOT
        # rely on the message headers being intact (since we very well may have
        # completely mangled them just above).
        # note no lockfile necessary since we're not diddling with intermediate
        # files, and we're also not having to remove the messagefile either.
        :0
        | ( $FORMAIL -f -I "Subject: Rejected Mail: HTML e-mail refusal" \
                -I "To: $SENDER" \
                -I "From: postmaster(_at_)pcguru(_dot_)com(_dot_)au" \
                -A "X-Mailer: PCGuru Autoreply"   \
                -A "$XLOOP" ) | $SENDMAIL -fpostmaster(_at_)pcguru(_dot_)com(_dot_)au
}
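The filter step in the recipe above leans on `cat` treating `-` as "standard input goes here", which is what lets the separator files wrap the quoted message without extra `echo` subshells. A small sketch to try outside of procmail follows; the scratch directory and the wording of the explanation file are invented for the test, and the separator text is borrowed from the echo lines in the original rule.

```shell
dir=$(mktemp -d)

# Stand-ins for $TXT_NO_HTML, $REJ_BEGIN and $REJ_END (contents are examples).
printf '%s\n' 'We do not accept HTML mail.'  > "$dir/reject-message.txt"
printf '%s\n' '--- begin rejected mail ---'  > "$dir/rej.begin"
printf '%s\n' '--- end rejected mail ---'    > "$dir/rej.end"

# The message arrives on stdin, gets quoted, and cat drops it in at the "-".
wrapped=$(printf 'line one\nline two\n' |
    sed -e 's/^/>/' |
    cat "$dir/reject-message.txt" "$dir/rej.begin" - "$dir/rej.end")

printf '%s\n' "$wrapped"
```

The result is the explanation, the begin separator, the quoted message and the end separator, in that order, which is the same layout the original rule built with an intermediate file.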


Note that I changed your "Procmail Autoreply" to something more you-specific. If someone deems your bounces to be spam, they may home in on the X-Mailer and suspect that "procmail" is a spam tool, and that wouldn't be good for the rest of us. I also tell formail to not generate a "From " header, and I tell sendmail to make this message envelope-from the postmaster (though you might rather use mailer-daemon).


---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

