Re: regexp - question 1

At 11:06 2000-09-27 -0500, tomcat(_at_)visi(_dot_)com wrote:

First, could you do all of us a BIG favour and choose *ONE* subject to post your many questions on this one topic? Posting repetitive questions under different topics is not going to assist your cause.

Second, please READ previous replies and consider applying them before asking the same or vastly similar questions.

* ^From:+\/.*
        { REALSENDER = $MATCH }

Wrong syntax for +: it means one or more of the previous expression, which here is a colon. So, your syntax is looking for:


        From:blah blah
        From::blah blah
        From:::blah blah

I doubt this is what you want. The pretty much accepted standard for extracting this info is:


* ^From:[       ]*\/[^  ].*

Both sets of brackets contain a space and a tab. When you see brackets like this in other messages, esp. right after the header name, and where they appear to contain more than one whitespace character, you can generally assume SPACE+TAB.
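
If you want to see what that condition would capture without involving procmail at all, here is a rough sed emulation. It is only a sketch: the \( \) group stands in for what procmail would put in $MATCH, extract_from is a made-up helper name, the address is a placeholder, and sed is case-sensitive where procmail by default is not.

```shell
#!/bin/sh
# Rough sed emulation of the procmail condition
#     * ^From:[ <tab>]*\/[^ <tab>].*
# The \( ... \) group plays the role of $MATCH (everything after \/).
TAB=$(printf '\t')

# extract_from (hypothetical helper): reads headers on stdin and prints
# the From: value with any leading spaces/tabs skipped.
extract_from() {
    sed -n "s/^From:[ ${TAB}]*\([^ ${TAB}].*\)\$/\1/p"
}

# Sample header block; the From: line has a space, a tab and another
# space before the (placeholder) address.
printf 'To: someone@example.com\nFrom: \t bozo@example.com\nSubject: test\n' \
    | extract_from
```

The leading whitespace is consumed by the bracket expression before the capture starts, so the extracted value begins at the first non-blank character.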

which is " bozo(_at_)bozo(_dot_)com" If correct, do I want the
leading space there? Aren't I getting a space between


No, since other processing you might end up doing might care about the space. It is easier to add it where you need it than to try to remove it. Additionally, if there is NO leading space and you ASSUME there was one when you create new headers, and so don't include one yourself, there won't be one.

formail -cz -I "From: $REALSENDER"

Here, you presumably would NOT want a space in $REALSENDER, because you're obviously adding one. Not that it would affect the validity of the message.

I think + here instead of * before \/ because
if there is no line starting out "From:" the

If there is no line starting with From:, then the whole expression won't be matched. If you're thinking the + or * modifier applies to the whole text "From:", you're mistaken. If you _really_ wanted that, you'd need parentheses:

        ^(From:)+

(but this is patently wrong, since you're not looking for things like "From:From:")
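
The difference in what the + applies to is easy to check at the shell with grep -E, whose extended regex syntax treats + and parentheses the same way for this purpose. A small sketch (count_matches and sample are made-up helper names, and the sample lines are invented):

```shell
#!/bin/sh
# count_matches PATTERN: count the stdin lines matching an extended regex.
count_matches() {
    grep -E -c -- "$1"
}

# sample: a few invented test lines, one per line.
sample() {
    printf '%s\n' 'From:blah' 'From::blah' 'From:::blah' 'From:From:blah' 'Fromblah'
}

# + applies to the colon only: one or more colons after "From".
# Matches From:blah, From::blah and From:::blah.
sample | count_matches '^From:+blah'      # prints 3

# With parentheses, + applies to the whole text "From:".
# Matches From:blah and From:From:blah.
sample | count_matches '^(From:)+blah'    # prints 2
```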

email will fail anyway (for my purposes)
In that case, should I check for "Reply-To" ??

Messages without From: should generally be considered bad. More likely than not, they're spam, and if not, then they're being generated by a braindead application. HOWEVER, if you're using this extracted address as the one you're going to send a reply to, you really should check for Reply-To. There's an easier way to do this:


# for instances where you want the address it was sent from
:0
* ^From:[       ]*\/[^  ].*
{
        FROM=$MATCH
}

# and what address a reply would properly be addressed to
:0 h
SENDER=|$FORMAIL -b -rtzxTo:

Assuming you've defined $FORMAIL to point to your formail executable - or you could have formail in your path, and replace $FORMAIL with formail.

See the formail man page ('man formail') before asking questions on the above options to formail. In fact, now might be a good time to take a break and check the various procmail FAQs.

Once you have these (I pre-emptively fetch subject and TO as well), you have stuff you can use in your filter(s) at will.

As defined here, $SENDER is the proper address to use when mailing the sender of the message - their From: or Reply-To, etc., as defined by the RFC-822 ruleset.

Would

* ^From:.+\/.*
        { REALSENDER = $MATCH }

match the whole "From: bozo(_at_)bozo(_dot_)com"
and so "" would be put into REALSENDER ??

Perhaps it is time for you to experiment with manually-invoked procmail scripts. You should really have started there anyway - experimenting with filters on your live mailspool would be a foolish thing to do, and if you were using a test filter, the answers to your questions would be painfully obvious.

Say you have a message file, or a mailbox (in either case, a file into which you have stored one or more messages, complete with headers):


formail -s procmail -m testing.rc < your_message_file

This will send your message file into formail, which will SPLIT it up into its individual messages, handing each one in turn to procmail, which will run them against the testing.rc ruleset. If it were just a SINGLE message, you could skip the 'formail -s' at the beginning, but it's just as well that you do it this way, because it simplifies things for when the message file does contain more than one message.

This has *NOTHING* to do with the mail coming into your inbox, so as long as testing.rc isn't referred to by your .procmailrc (or any INCLUDERC's in it), and as long as you're not dumping output into a directory and overwriting your _actual_ mailboxes, you can hack it to your heart's content, and not mess up your regular mail filtering.


Make a testing subdir, and put this stuff in there.
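
If you have no saved message handy, something along these lines would set up the subdir with a minimal test message in it. This is only a sketch: make_test_message is a made-up helper, procmail-test is an arbitrary directory name, and the addresses and date in the message are placeholders.

```shell
#!/bin/sh
# make_test_message DIR (hypothetical helper): create DIR and write a
# minimal mbox-style message into DIR/your_message_file.
make_test_message() {
    mkdir -p "$1" || return 1
    # The first line is the mbox "From " separator that formail -s
    # splits on; the blank line separates headers from body.
    cat > "$1/your_message_file" <<'EOF'
From bozo@example.com Wed Sep 27 11:06:00 2000
From: bozo@example.com
To: me@example.com
Subject: Re: MailWeb: Test A

Test body.
EOF
}

# The directory name here is an arbitrary choice.
make_test_message procmail-test
```

You can then edit the headers of that file to be whatever case you want to test.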

Now, in testing.rc, set up a nice basic .procmailrc type framework:

# -- start testing.rc example
# called from untwit script.
#
# This will take whatever messages in the twits file and re-send them into
# the mailstream for the current user to be processed again, presumably
# under modified rules.

COMSAT=no
# logging, good stuff...
LOGFILE=./testing.log
# LOTS of logging, better stuff.
VERBOSE=on

# Define paths to individual apps we use.  At the shell, you can use
# 'which app' or 'type app' to locate the path to the app.
FORMAIL=/usr/bin/formail
FGREP=/usr/bin/fgrep

# default mail delivery mailbox - for my testing purposes, anything NOT
# specifically filtered goes to the ether (remember, we're piping into
# this ruleset from a saved file).  For your purposes, you might want to
# set this to ./default.mbox or something.
DEFAULT=/dev/null

# get the sender info
:0h
SENDER=|$FORMAIL -b -rtzxTo:

# may include any other common setup rules, as you'd have them in your
# .procmailrc

# include your test filter.
INCLUDERC=test_filter.rc

# -- end testing.rc example

I use something vaguely similar to post-process my spam file: it extracts individual messages from people, adds a spam-filtering-bypass header of sorts, then re-injects them into my regular procmail rules, so they get stored into the appropriate mailbox and tossed into my mail spool for retrieval by my client software. (This is for those infrequent occasions when a message gets mis-identified as spam, and is part of the reason I don't simply /dev/null my spam.) I can also extract various individual messages from mailboxes as well. But I'm getting OT here.

Now, put the filter rules you want to test into test_filter.rc (or rename and change the above as appropriate).


An example test_filter.rc - the rule you inquired about above:

# begin test_filter.rc
:0
* ^From:.+\/.*
{
        REALSENDER=$MATCH
}
# end test_filter.rc

Now, run the filter, and examine the testing.log file.

Experiment. You'll answer a LOT of your own questions this way. When you want, you can edit the test message to be precisely what you want it to be, and feed that into the test script. Between runs, you'll probably want to delete the testing.log file.

You might even make a shell script to run the procmail process, show the log, and then open the filter for editing (deleting the old log at the start of each run):


#!/bin/sh
# delete the log from previous run
rm -f testing.log
# run the test filter
formail -s procmail -m testing.rc < my_message_file
# view the log
less testing.log
# edit the test filter
vi test_filter.rc

Set the script file to have +x attrib (so you can run it).

You'd run the script: the previously existing log would be deleted, the filters would be processed, and the log would be shown so you could see how the output worked. Once you exit the pager (less), the editor would be invoked on the test filter so you could make tweaks, and run again.

[snip - a LOT of these "would this match THIS" questions that would be answered with simple tests]

"Subject: Re: MailWeb: Test A xxxxxxxxxxxxxxxxxx 99bozo(_at_)bozo(_dot_)com"

* ^Subject:.[99]*\/.*
 { ORIGINATOR = $MATCH }

would this match
"Subject: Re: MailWeb: Test A xxxxxxxxxxxxxxxxxx 99"

NO. I get the feeling that you did NOT read my previous post about [99] defining a character class, rather than a literal. You'll find it under one of the OTHER subject lines you've used for this discussion.


Simple testing of this via the above described method would confirm this.
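
For the impatient, the character class behaviour can also be seen with sed alone. Again a sketch rather than procmail itself: the \( \) group stands in for $MATCH, after_class and after_literal are made-up helper names, the address is a placeholder, and the second pattern merely assumes the goal was to grab whatever follows a literal 99.

```shell
#!/bin/sh
# Sample subject line (placeholder address).
SUBJECT='Subject: Re: MailWeb: Test A xxxxxxxxxxxxxxxxxx 99bozo@example.com'

# after_class: [99] is a character class containing just "9", so [99]*
# means "zero or more 9s" directly after "Subject:" plus one character.
# There are none at that position, so the capture starts at "Re:".
after_class() {
    sed -n 's/^Subject:.[99]*\(.*\)$/\1/p'
}

# after_literal: a literal 99 somewhere in the line, capturing what
# follows it (an assumption about what was intended).
after_literal() {
    sed -n 's/^Subject:.*99\(.*\)$/\1/p'
}

printf '%s\n' "$SUBJECT" | after_class
# prints: Re: MailWeb: Test A xxxxxxxxxxxxxxxxxx 99bozo@example.com
printf '%s\n' "$SUBJECT" | after_literal
# prints: bozo@example.com
```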

 Is there a limit to how long the subject line can be and
 not be chopped off??


LINEBUF chars.

You're unlikely to run into this limit on a header. Bodies are a different matter. OTOH, different _mail clients_ are all too likely to cut long subjects down in size. If there is any one header NOT to let get too big, this would be the one.


---
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

 Sean B. Straw / Professional Software Engineering
 Post Box 2395 / San Rafael, CA  94912-2395

