procmail

Re: regexp - question 1

2000-09-27 14:25:04
At 11:06 2000-09-27 -0500, tomcat(_at_)visi(_dot_)com wrote:

First, could you do all of us a BIG favour and choose *ONE* subject line for your many questions on this one topic? Posting repetitive questions under different subjects is not going to help your cause.

Second, please READ previous replies and consider applying them before asking the same or vastly similar questions.

* ^From:+\/.*
        { REALSENDER = $MATCH }

Wrong syntax for +: it means one or more of the previous expression, which here is the colon. So your pattern is looking for:

        From:blah blah
        From::blah blah
        From:::blah blah

I doubt this is what you want. The pretty much accepted standard of extracting this info is:

* ^From:[       ]*\/[^  ].*

Both sets of brackets contain a space and a tab - when you see brackets like this in other messages, esp right after the header name, and where it appears to contain more than one whitespace, you can generally assume SPACE+TAB
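If you want to see the effect of that condition outside procmail, it can be approximated with plain POSIX tools. This is only a sketch under assumptions: sed has no \/ operator, so a capture group stands in for it, [[:space:]] stands in for the bracketed SPACE+TAB, and the sample header and the realsender variable are made up for illustration.

```shell
#!/bin/sh
# Sample header line (made up); note the run of spaces after the colon.
header='From:   bozo@example.com'

# Skip any whitespace after "From:", then capture from the first
# non-whitespace character to end of line. The capture group plays the
# role that \/ and $MATCH play in the procmail condition.
realsender=$(printf '%s\n' "$header" |
    sed -n 's/^From:[[:space:]]*\([^[:space:]].*\)$/\1/p')

# Brackets make any stray leading or trailing whitespace visible.
printf '[%s]\n' "$realsender"
```

The captured value starts at the first non-blank character, so no leading space comes along for the ride.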

which is " bozo(_at_)bozo(_dot_)com" If correct, do I want the
leading space there? Aren't I getting a space between

No, since other processing you might end up doing might care about the space. It is easier to add a space where you need it than to try to remove one. Additionally, if there is NO leading space and you ASSUME there was one when you create new headers, and don't include one yourself, there won't be one.

formail -cz -I "From: $REALSENDER"

Here, you presumably would NOT want a space in $REALSENDER, because you're obviously adding one. Not that it would affect the validity of the message.

I think + here instead of * before \/ because
if there is no line starting out "From:" the

If there is no line starting with From:, then the whole expression won't be matched. If you're thinking the + or * modifier applies to the whole text "From:", you're mistaken. If you _really_ wanted that, you need parentheses:
        ^(From:)+

(but this is patently wrong, since you're not looking for things like "From:From:")
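The difference between the two forms is easy to check at the shell. A rough sketch, assuming grep -E treats + and parentheses the way procmail's egrep-style expressions do for this simple case; the three sample lines are made up:

```shell
#!/bin/sh
# Three made-up lines: normal header, doubled colon, doubled "From:".
lines='From:bozo@example.com
From::bozo@example.com
From:From:bozo@example.com'

# + binds to the colon alone: "From", one or more colons, then "bozo".
colon_hits=$(printf '%s\n' "$lines" | grep -E '^From:+bozo')

# + binds to the parenthesised group: one or more "From:", then "bozo".
group_hits=$(printf '%s\n' "$lines" | grep -E '^(From:)+bozo')

printf 'From:+ matched:\n%s\n\n' "$colon_hits"
printf '(From:)+ matched:\n%s\n' "$group_hits"
```

The first pattern picks up the first two lines, the second picks up the first and third. Neither extra case is something you'd see in real mail.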

email will fail anyway (for my purposes)
In that case, should I check for "Reply-To" ??

Messages without From: should generally be considered bad. More likely than not, they're spam, and if not, then they're being generated by a braindead application. HOWEVER, if you're using this extracted address as where you're going to send a reply to, you really should check for Reply-To. There's an easier way to do this:

# for instances where you want the address it was sent from
:0
* ^From:[       ]*\/[^  ].*
{
        FROM=$MATCH
}

# and what address a reply would properly be addressed to
:0 h
SENDER=|$FORMAIL -b -rtzxTo:


Assuming you've defined $FORMAIL to point to your formail executable - or you could have formail in your path, and replace $FORMAIL with formail.

See the formail man page 'man formail' before asking questions on the above options to formail. In fact, now might be a good time to take a break and check the various procmail FAQs.


Once you have these (I pre-emptively fetch subject and TO as well), you have stuff you can use in your filter(s) at will.

As defined here, $SENDER is the proper address to mail the sender of the message - their From: or reply-to, etc, as defined by the RFC-822 ruleset.
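To see which address formail settles on, you can feed it a hand-made message at the shell. This is a sketch only: it assumes formail is on your PATH, and the message, the addresses and the reply_addr variable are made up for illustration.

```shell
#!/bin/sh
# Made-up message whose Reply-To: differs from its From:.
msg='From: bozo@example.com
Reply-To: replies@example.com
Subject: test

body'

# Assumes formail is on your PATH.
# -r builds an auto-reply header, -t trusts the sender's header fields,
# -z trims whitespace on extracted fields, -x To: prints only the To:
# field of the generated reply. -b is carried over from the recipe above.
reply_addr=$(printf '%s\n' "$msg" | formail -b -rtzxTo:)

printf 'reply goes to: %s\n' "$reply_addr"
```

With a Reply-To: present I would expect that address to win over the From: one, but check the result against your own formail rather than taking my word for it.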


Would

* ^From:.+\/.*
        { REALSENDER = $MATCH }

match the whole "From: bozo(_at_)bozo(_dot_)com"
and so "" would be put into REALSENDER ??

Perhaps it is time for you to experiment with manually-invoked procmail scripts. You should really have started there anyway - experimenting with filters on your live mailspool would be a fool thing to do, and if you were using a test filter, the answers to your questions would be painfully obvious.

Say you have a message file, or a mailbox (in either case, a file into which you have stored one or more messages, complete with headers):

formail -s procmail -m testing.rc < your_message_file

This will send your message file into formail, which will SPLIT it up into its individual messages, handing each one in turn to procmail, which will run them against the testing.rc ruleset. If it were just a SINGLE message, you could skip the 'formail -s' at the beginning, but it's just as well that you do it this way, because it simplifies things for when the message file does contain more than one message.

This has *NOTHING* to do with the mail coming into your inbox, so as long as testing.rc isn't referred to by your .procmailrc (or any INCLUDERCs in it), and as long as you're not dumping output into a directory and overwriting your _actual_ mailboxes, you can hack it to your heart's content and not mess up your regular mail filtering.

Make a testing subdir, and put this stuff in there.
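A minimal sketch of that setup, if you have no saved message handy. The file name matches the command shown earlier; the addresses, date and subject are placeholders, and the directory name is just the one suggested above.

```shell
#!/bin/sh
# Scratch directory, kept apart from your real mail folders.
mkdir -p testing
cd testing || exit 1

# A one-message mbox file. The first line is the mbox "From " separator
# that formail -s uses to split messages; everything in it is a placeholder.
cat > your_message_file <<'EOF'
From bozo@example.com Wed Sep 27 11:06:00 2000
From: bozo@example.com
To: you@example.com
Subject: Re: MailWeb: Test A

Test body.
EOF
```

Edit the headers in that file to whatever case you want to try, and keep testing.rc and test_filter.rc in the same directory.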

Now, in testing.rc, set up a nice basic .procmailrc type framework:

# -- start testing.rc example
# called from untwit script.
#
# This will take whatever messages in the twits file and re-send them into
# the mailstream for the current user to be processed again, presumably
# under modified rules.

COMSAT=no
# logging, good stuff...
LOGFILE=./testing.log
# LOTS of logging, better stuff.
VERBOSE=on

# Define paths to individual apps we use.  At the shell, you can use
# 'which app' or 'type app' to locate the path to the app.
FORMAIL=/usr/bin/formail
FGREP=/usr/bin/fgrep

# default mail delivery mailbox - for my testing purposes, anything NOT
# specifically filtered, goes to the ether (remember, we're piping into
# this ruleset from a saved file).  For your purposes, you might want to
# set this to ./default.mbox or something.
DEFAULT=/dev/null

# get the sender info
:0h
SENDER=|$FORMAIL -b -rtzxTo:

# may include any other common setup rules, as you'd have them in your
# .procmailrc

# include your test filter.
INCLUDERC=test_filter.rc

# -- end testing.rc example

I use something vaguely similar to post-process my spam file: it extracts individual messages from people, adds a spam-filtering-bypass header of sorts, then re-injects them into my regular procmail rules, so they get stored into the appropriate mailbox and tossed into my mail spool for retrieval by my client software. (This is for those infrequent occasions when a message gets mis-identified as spam, and is part of the reason I don't simply /dev/null my spam.) I can also extract various individual messages from mailboxes as well. But I'm getting OT here.


Now, put the filter rules you want to test into test_filter.rc (or rename and change the above as appropriate).

An example test_filter.rc - the rule you inquired about above:

# begin test_filter.rc
:0
* ^From:.+\/.*
{
        REALSENDER=$MATCH
}
# end test_filter.rc

Now, run the filter, and examine the testing.log file.

Experiment. You'll answer a LOT of your own questions this way. When you want, you can edit the test message to be precisely what you want it to be, and feed that into the test script. Between runs, you'll probably want to delete the testing.log file.

You might even make a shell script to run the procmail process, then show the log, and delete the log:

#!/bin/sh
# delete the log from previous run
rm -f testing.log
# run the test filter
formail -s procmail -m testing.rc < my_message_file
# view the log
less testing.log
# edit the test filter
vi test_filter.rc

Set the script file to have +x attrib (so you can run it).

When you run the script, the previous log is deleted, the filters are processed, and the log is shown so you can see how things worked. When you exit the pager (less), the editor is invoked on the test filter so you can make tweaks; then run the script again.

[snip - a LOT of these "would this match THIS" questions that would be answered with simple tests]

"Subject: Re: MailWeb: Test A xxxxxxxxxxxxxxxxxx 99bozo(_at_)bozo(_dot_)com"

* ^Subject:.[99]*\/.*
 { ORIGINATOR = $MATCH }

would this match
"Subject: Re: MailWeb: Test A xxxxxxxxxxxxxxxxxx 99"

NO. I get the feeling that you did NOT read my previous post about [99] defining a character class, rather than a literal. You'll find it under one of the OTHER subject lines you've used for this discussion.

Simple testing of this via the above described method would confirm this.
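The character class point can also be checked without procmail at all. A small sketch using grep -E, on the assumption that bracket expressions behave the same way there; the variable names are made up:

```shell
#!/bin/sh
# [99] is a character class that lists "9" twice, so it is the same as [9]
# and stands for exactly one character.
one=$(printf '9\n' | grep -E -c '^[99]$')     # a line holding a single 9
two=$(printf '99\n' | grep -E -c '^[99]$')    # a line holding the string 99

printf 'single 9 matches: %s, literal 99 matches: %s\n' "$one" "$two"
```

The single 9 matches and the two-character 99 does not. Followed by *, the class also matches zero 9s, so it does nothing to anchor the match at the "99" in your subject.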

 Is there a limit to how long the subject line can be and
 not be chopped off??

LINEBUF chars.

You're unlikely to run into this limit on a header. Bodies are a different matter. OTOH, different _mail clients_ are all too likely to cut long subjects down in size. If there is any one header you should NOT let get too big, this would be the one.

---
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

 Sean B. Straw / Professional Software Engineering
 Post Box 2395 / San Rafael, CA  94912-2395


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
