procmail

Problem with LOGging, and assorted procmail questions

1997-06-10 10:38:00
Hello,

This is my first post to the list. I started using procmail a week ago, and have
managed to get most of what I wanted working. (Great fun to see the first
real-world junk mail automatically trashed :) I have read the manual pages and
several of the tutorials available on the Web, and I have modelled my
.procmailrc on the examples I've found. I'm fairly familiar with regular
expressions and Unix shells, but having been an MS-DOS/Windows person for most of
my computer years I do have gaps in my knowledge... Hence a few problems I ran
into while setting up the filters, which I haven't been able to resolve by
myself. I'd appreciate any help/hints, through the list or via private email,
whichever you prefer.

(Some brief system information: Linux shell account; procmail version 3.10)

1. Initially I was using very simple recipes with full (verbose) logging. The
mail identified as junk gets sent to a trash folder, declared as 
    TRASH=$HOME/trash
As I kept adding recipes, the verbose log file became pretty hard to follow; I
then tried setting verbose to off and instead inserting short "id" lines in the
log, as follows:

# (case a)
:0
* ^From:.*@domainx.com
LOG="Trash: junk from domainx "
$TRASH

This, however, appeared to confuse procmail. The recipes no longer matched (they
did originally, before I added the "LOG=" lines) and all mail ended up in
$DEFAULT. Next I tried:

# (case b)
:0
* ^From:.*@domainx.com
{
    LOG="Trash: junk from domainx "
    $TRASH
}


...with precisely the same results. The logfile contained messages from procmail
about skipping the filing commands and lockfile problems, such as:

    procmail: Skipped "$TRASH"
    procmail: Extraneous locallockfile ignored

My last try was

# (case c)
:0
* ^From:.*@domainx.com
{
    :0
    LOG="Trash: junk from domainx "
    $TRASH
}

...with similar or worse results: this time procmail actually started to drop
some messages (it did not file them in any of the predefined folders, nor in
$DEFAULT). The messages that did "survive" were not filed according to the
recipes.


This is where I'm at my wits' end... I am probably making some obvious/silly
mistake here, but I really can't spot it... How do I group commands that follow
a match rule, if not with curly brackets?


2. Short question about the lock files - in the example .rc files I've seen,
some recipes begin with
    :0
and others with
    :0:
without any pattern I can spot. When is one supposed to use which?


3. *Strange* problem with a regular expression, trying to match a Subject line
in all caps. What I thought should work was:
    * ^Subject:[^a-z]*$
Which I understand as meaning 'a line that begins with "Subject:" followed by
zero or more characters none of which are lowercase a-z letters, followed by
end-of-line.' I tested the regular expression with a text editor that supports
them, and it matched fine; yet the procmail log file contains:

    procmail: No match on "^Subject:[^a-z]*$"
    From eristic(_at_)lodz(_dot_)pdi(_dot_)net  Tue Jun 10 16:14:20 1997
    Subject: ALLCAPS
    Folder: /var/spool/mail/eristic                       937

(The message went to $DEFAULT instead of $TRASH, where it should have gone.)


I tried something more subtle (?) with the same negative effect:
    
    * ^Subject:[^a-z]*$
    * !^Subject:.$
    
(to ensure that messages with an EMPTY subject line will not be trashed; I have
some Real Newbie friends who tend to forget about subject lines... :)

Another try which also didn't work was:

    * ^Subject:[^a-z]+$
    
(to match only if there is at least 1 character following the colon)

All three of the above recipes produce NO MATCH for each of the following test
subject lines:
    Subject: ALLCAPS
    Subject: ALL CAPS
    Subject: SUBJECT IN ALL CAPS, IGNORE!

Again, I suspect something must be wrong with my regular expression...
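For comparison, here is a minimal Python sketch with Python's `re` module standing in for the text editor's regex engine (`PATTERN`, `SUBJECTS` and `matches` are hypothetical names). It runs the first pattern against the test subject lines twice: once case-sensitively, as the editor presumably did, and once ignoring case. The procmailrc man page documents case-insensitive matching as procmail's default unless a recipe carries the `D` flag, so the second run may be the closer model of what procmail saw.

```python
import re

# The first pattern tried above: "Subject:" followed only by
# characters outside a-z, up to end of line.
PATTERN = r"^Subject:[^a-z]*$"

SUBJECTS = [
    "Subject: ALLCAPS",
    "Subject: ALL CAPS",
    "Subject: SUBJECT IN ALL CAPS, IGNORE!",
    "Subject: Mixed case",  # control line that should never match
]

def matches(subject: str, ignore_case: bool) -> bool:
    """Return True if PATTERN matches the given subject line."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(PATTERN, subject, flags) is not None

# The three all-caps lines match only in the case-sensitive run:
# with IGNORECASE, [^a-z] also excludes A-Z, so nothing can follow
# "Subject: " except non-letters.
for s in SUBJECTS:
    print(f"{s!r}: case-sensitive={matches(s, False)}, "
          f"ignore-case={matches(s, True)}")
```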

An aside about empty subject lines: I assumed that when the sender does not
include any text in the subject line, the Subject: field would still exist and
match
    ^Subject:$
(colon followed by CRLF). However, when I inspected a test message with no
subject that I sent from my Forte Agent mailer, the Subject field was not there
at all! Never mind whether this conforms to RFC822; it'd be interesting to know
how other mailers behave. Do they preserve the empty Subject: field, or do they
drop it altogether?
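One way to tell which case a given test message falls into is a minimal Python sketch using the standard `email` package. The two sample messages and the `subject_state` helper are hypothetical. The sketch only inspects a message; it says nothing about how any particular mailer behaves.

```python
from email import message_from_string

# Hypothetical test messages: one with an empty Subject: field,
# one where the mailer left the field out entirely.
WITH_EMPTY_SUBJECT = "From: a@example.com\nSubject:\n\nbody text\n"
WITHOUT_SUBJECT = "From: a@example.com\n\nbody text\n"

def subject_state(raw: str) -> str:
    """Classify the Subject header as 'missing', 'empty' or 'present'."""
    msg = message_from_string(raw)
    value = msg.get("Subject")  # None when the header is absent
    if value is None:
        return "missing"
    if value.strip() == "":
        return "empty"
    return "present"

print(subject_state(WITH_EMPTY_SUBJECT))  # empty
print(subject_state(WITHOUT_SUBJECT))     # missing
```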


4. Last question (promise!) I have tried scanning the email body for trigger
words, and it worked just fine. What I'm wondering is whether also grepping the
body requires much more processing time (i.e. should I be doing this to my
provider's machine?), especially if a message contains, say, a 500 KB uuencoded
attachment (I often get such messages from my employer).

Alternatively, would it be possible to limit the body scan to the first n lines?
(I'd like to look for those phony "Dear friend," openings that spammers so
often use, I *hate* those! :) Some combination of fgrep with head comes to
mind, but how to weave the two together is black magic for me at this
point...
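The "first n lines of the body" idea (roughly what head piped into fgrep would do) can be sketched in Python as below, assuming the raw message arrives as a stream of lines, for instance on standard input. `TRIGGER`, `body_head_has_trigger` and the default of 10 lines are hypothetical choices. Because the line iterator is consumed lazily, the search stops after `max_lines` body lines, so a large uuencoded attachment further down is never searched.

```python
import io
import re
from itertools import islice
from typing import Iterable

# Hypothetical trigger phrase, matched case-insensitively.
TRIGGER = re.compile(r"dear friend", re.IGNORECASE)

def body_head_has_trigger(lines: Iterable[str], max_lines: int = 10) -> bool:
    """Search only the first max_lines lines of the message body for TRIGGER.

    `lines` can be any line iterator (an open file, sys.stdin, a list).
    The header is skipped up to the first blank line; after that at most
    max_lines body lines are consumed and searched.
    """
    it = iter(lines)
    # Skip the header block, which ends at the first empty line.
    for line in it:
        if line.strip("\r\n") == "":
            break
    # Look only at the top of the body.
    return any(TRIGGER.search(line) for line in islice(it, max_lines))

spam = io.StringIO("From: x@example.com\nSubject: hi\n\nDear Friend,\nMake money fast\n")
ham = io.StringIO("From: y@example.com\nSubject: report\n\nHello,\n" + "x\n" * 50 + "dear friend\n")
print(body_head_has_trigger(spam))  # True
print(body_head_has_trigger(ham))   # False (phrase is past line 10)
```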


There'll probably be more questions as I try more things; for now everything
else I've tried seems to be working quite well. I'd very much appreciate hints
on solutions to the above 4 problems...


Thanks for your patience, and I hope I haven't said anything excessively
stupid...

marek jedlinski

--
"If you're happy and you know it, clunk your chains."
email: eristic(_at_)gryzmak(_dot_)lodz(_dot_)pdi(_dot_)net (finger for public 
PGP key)
Largactil Café http://www.lodz.pdi.net/~eristic/index.html
Hail Eris. Hail Bopp. All Hail Discordia. Percolate.