Re: Matching questions

At 15:35 2000-09-26 -0500, tomcat(_at_)visi(_dot_)com wrote:

X-Mailto-Comment:    the text to search for (including ":" ??)


Yes, including the colon, because, well, you specified a colon there.

[character list]     any single character in character list
                        (why are these all blanks - why 7??)

Unlikely to be 7 spaces (there is no such character as 'blank' anyway) - commonly what people put here are a TAB *AND* a SPACE. If it appears to be 7 spaces in someone's post, your (or their) mail client probably expands the tab to the next tab stop using spaces. Bad mojo. Find the offending MUA and toss it.

There is no reason to list any single character more than once in a character class - the class matches exactly ONE character drawn from a CLASS of characters.
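A quick way to see this is with Python's re module, used here only as a stand-in (it is not procmail's regexp engine, but bracket expressions behave the same way for this purpose). The pattern names are just for illustration:

```python
import re

# A bracket expression matches exactly ONE character from the set.
# "[ \t]" means "a space or a tab", never "a space followed by a tab".
blank = re.compile(r"[ \t]")

print(bool(blank.fullmatch(" ")))    # True: one space
print(bool(blank.fullmatch("\t")))   # True: one tab
print(bool(blank.fullmatch(" \t")))  # False: two characters, one class

# Listing a member twice adds nothing: [99] is the same class as [9].
samples = ["9", "1", " ", "99"]
print([bool(re.fullmatch(r"[99]", s)) == bool(re.fullmatch(r"[9]", s))
       for s in samples])            # [True, True, True, True]
```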

              +     match at least one occurrence

.. of the previous regexp element. Note that MOST people use * here - match zero or more. Thus "header:content" (no space/tab after the colon) will be matched, which you probably want to make sure of.
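To illustrate the difference, here is a small sketch in Python's re (a stand-in for procmail's engine; + and * mean the same thing in both). The three test headers are made up and differ only in what follows the colon:

```python
import re

# Hypothetical headers: space, tab, and nothing at all after the colon.
headers = ["X-Mailto-Comment: bozo@bozo.com",
           "X-Mailto-Comment:\tbozo@bozo.com",
           "X-Mailto-Comment:bozo@bozo.com"]

plus = re.compile(r"^X-Mailto-Comment:[ \t]+")  # one or more blanks required
star = re.compile(r"^X-Mailto-Comment:[ \t]*")  # zero or more blanks allowed

for h in headers:
    print(repr(h), bool(plus.match(h)), bool(star.match(h)))
```

Only the third header separates the two: + rejects it, * accepts it.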

               \/   Begin extraction; if there is a match after the
                    extraction operator \/, put it into a variable named MATCH

Well, more precisely, put whatever is matched into the MATCH variable. If nothing is matched, then the MATCH variable is empty. This is distinctly different from how you describe it, because if MATCH had been set (say, from a previous operation), it wouldn't retain that former value even if the current regexp doesn't match anything.

               .     match any character

Indeed. This, BTW, is something to keep in mind when matching email addresses and domains, etc. (I realize that isn't what you're doing here - your syntax for this part is just fine). An example:


        bozo@bozo\.com

You should escape the dot wherever you really expect a DOT. Without the escape operator, the example regexp would match "bozo@bozoscom.ru.there" just as well. In practice, most people who forget to escape the dot don't run into trouble, but it SHOULD be escaped when you want a DOT.
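The false positive is easy to reproduce. This sketch uses Python's re as a stand-in for procmail's engine; "." and "\." have the same meaning in both:

```python
import re

text = "bozo@bozoscom.ru.there"

unescaped = re.compile(r"bozo@bozo.com")  # "." matches ANY character, here the "s"
escaped = re.compile(r"bozo@bozo\.com")   # "\." matches only a literal dot

print(bool(unescaped.search(text)))           # True: false positive
print(bool(escaped.search(text)))             # False
print(bool(escaped.search("bozo@bozo.com")))  # True
```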

                       (hey - what happened to "*" as
                        "Begin a condition" ?

At the beginning of the line, that's what it is - this syntax is specific to procmail, not to regexps. A few other characters appearing at the beginning of the line (! and $ are examples, but there are others - see 'man procmailrc') are flags telling procmail how to handle the condition line. After that, it's a regexp. Regexps vary from application to application - procmail, sed, awk, grep, perl, etc. all have their nuances.

so for an email header that has X-Mailto-Comment: bozo@bozo.com
this recipe sees "X-Mailto-Comment:" and puts everything after it
into ORIG, which would be " bozo@bozo.com", including the space?

Not including the space(s), because you specified those BEFORE the match extraction operator.
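Python has no \/ operator. As a rough analogue, a capture group can stand in for "everything to the right of \/". Following the description of MATCH above, this sketch returns an empty string when nothing matches. One caveat: Python's + is greedy, and procmail's exact rules for how much the left-hand side consumes may differ, so check 'man procmailrc' before relying on this for multiple blanks. The function name is hypothetical:

```python
import re

def extract_comment(header: str) -> str:
    """Return what follows the blanks after the colon, or "" if no match."""
    # "[ \t]+" consumes the blank(s) before the group starts,
    # so the captured text does not include them.
    m = re.match(r"^X-Mailto-Comment:[ \t]+(.*)", header)
    return m.group(1) if m else ""

print(repr(extract_comment("X-Mailto-Comment: bozo@bozo.com")))  # 'bozo@bozo.com'
print(repr(extract_comment("Subject: hello")))                   # ''
```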

The nuances of this all just confound me.

Drink some more caffeine, go a week without sleep, and try again. It'll eventually make sense.

out of the body of the email - it also writes the date and subject
to a log file:

Heh, unless you need the date/subject in a specific format for further processing, you should take a look at the standard procmail logging (LOGFILE and LOGABSTRACT in 'man procmailrc') - you're producing redundant functionality.

:0
* ^Subject:.*Re: WebSite

Uh, considering the number of funky MUAs out there and all the different ways they can say "Re:", you might reconsider that syntax. I don't have a specific recommendation, though I recall it has been hashed out on this list many times in the past.

Judging by the things you match below, perhaps you're dealing with a web response form, where the form processor might be producing the subject itself. In that case, you might disregard the above.
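If the subject does come from real mail clients, one possible approach is to accept a run of reply prefixes instead of one literal "Re: ". The sketch below uses Python's re for testing. The prefix list (Re, RE, Re[2], Re^2, AW, SV) is an assumption, not an exhaustive catalogue. Unlike the original `.*Re: WebSite`, it also rejects anything else in front of the prefixes (e.g. "Fwd: Re:"), which is a trade-off:

```python
import re

# Assumed reply prefixes: "Re:", "RE:", "Re[2]:", "Re^2:", plus German "AW:"
# and Scandinavian "SV:". Real MUAs produce more variants than this.
REPLY_PREFIX = r"(?:(?:re|aw|sv)(?:\[\d+\]|\^\d+)?[ \t]*:[ \t]*)+"

# At least one prefix, then the subject text the recipe cares about.
subject_re = re.compile(r"^Subject:[ \t]*" + REPLY_PREFIX + r"WebSite",
                        re.IGNORECASE)

lines = ["Subject: Re: WebSite",
         "Subject: RE: Re[2]: WebSite",
         "Subject: AW: WebSite",
         "Subject: WebSite",
         "Subject: Re: Something else"]
results = [bool(subject_re.match(line)) for line in lines]
print(results)  # [True, True, True, False, False]
```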

     :0c
      |  formail -cz \
      | $SENDMAIL -oi $ORIGMAIL

What exactly are you expecting to send them? You don't have loop checking (see any of a number of recent posts to the procmail list about X-Loop and autoreplies). This is even more important given that your whole recipe triggers on the subject, which is likely to contain a Re: in any reply the user makes. What about bounces (nobody EVER misenters an email address into a webform)?
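To make the two guards concrete, here is a sketch of the decision logic in Python, not procmail. In procmail the usual pattern is for the reply to carry an X-Loop header (e.g. added with formail -A), and for the recipe to refuse mail that already has it, plus a ^FROM_DAEMON check. The marker address below is hypothetical, and the bounce test is only a crude stand-in for ^FROM_DAEMON:

```python
from email import message_from_string

LOOP_MARKER = "autoreply@example.com"  # hypothetical X-Loop value for this setup

def should_autoreply(raw: str) -> bool:
    """Rough sketch of the guards to pass before sending any automatic reply."""
    msg = message_from_string(raw)
    # Loop check: refuse anything that already carries our own X-Loop header,
    # i.e. a message that has passed through this autoresponder before.
    loops = [v.strip() for v in (msg.get_all("X-Loop") or [])]
    if LOOP_MARKER in loops:
        return False
    # Bounce check: a crude stand-in for procmail's ^FROM_DAEMON macro.
    sender = (msg.get("From") or "").lower()
    if "mailer-daemon" in sender or "postmaster" in sender:
        return False
    # Nothing to reply to without some sender address.
    return bool(msg.get("Reply-To") or msg.get("From"))
```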

What does [^  ].* mean as opposed to .*  ??

[^  ] matches a single character that is NOT a space, so [^  ].* matches anything NOT starting with a space (though you have two spaces in that class; perhaps one is a tab).
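In Python's re (used here as a stand-in), a leading ^ inside brackets negates the class in the same way. This sketch assumes the two "spaces" were really a space and a tab:

```python
import re

first_nonblank = re.compile(r"[^ \t].*")  # first char must be neither space nor tab
anything = re.compile(r".*")              # may start with (or consist only of) blanks

samples = ["bozo@bozo.com", "  bozo@bozo.com", "\t", ""]
for s in samples:
    print(repr(s), bool(first_nonblank.fullmatch(s)), bool(anything.fullmatch(s)))
```

Only the first sample satisfies [^ \t].*, while .* accepts all four.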

Remember, I am trying to avoid using the body to keep the
originators email address - I am going to try to append it


Make it an extra header.  Next problem?

Ultimately, what is your goal here?

then i figure they are special characters and might be
hard to match - would you do \*\- to match them?

Yes, exactly. But why bother? There's no reason (as far as I can see) you couldn't cram the email address into a totally new header using formail.
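For what it's worth, backslash-escaping works the same way in Python's re, which offers re.escape to do it for arbitrary text. This is a small sketch with made-up sample strings:

```python
import re

# Backslash-escape "*" and "-" so they are matched literally.
literal = re.compile(r"\*\-")
print(bool(literal.search("rating: *- ")))  # True
print(bool(literal.search("rating: -* ")))  # False: wrong order

# re.escape builds an equivalent escaped pattern from any literal text.
needle = "*-*"
print(bool(re.search(re.escape(needle), "a *-* b")))  # True
```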

Assuming whatever condition has been matched at an outer brace level (such as your subject), and that you've extracted the address however you like, the following would cram the matched address into a new header:


:0f
| $FORMAIL -I"X-VISI-From: $SOMEADDR"

Now the header is part of the email being passed around by procmail - if you resend it, forward it, or whatever, it'll be there.
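For comparison, the same operation expressed with Python's email module is sketched below. It is only an analogue: formail -I removes any existing fields of that name and then inserts the new one, and the helper below does the same via del and assignment. The header name and addresses are the hypothetical ones from the recipe:

```python
from email import message_from_string

def set_header(raw: str, name: str, value: str) -> str:
    """Replace (or add) a header field, roughly like formail -I "Name: value"."""
    msg = message_from_string(raw)
    del msg[name]       # drop every existing field of this name (no error if absent)
    msg[name] = value   # append the new field to the header block
    return msg.as_string()

raw = "From: form@example.com\nSubject: Re: WebSite\n\nbody\n"
print(set_header(raw, "X-VISI-From", "bozo@bozo.com"))
```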

       #capture who is sending
        * ^From:.[       ]*\/.*

Okay, why is this different from your earlier example within the same post? Are these in the SAME procmailrc?

If you plan to reply to the address, formail has a feature (-r) that extracts the appropriate REPLY address (which may not necessarily be the FROM). See 'man formail'.
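The core idea, that the reply address is not always From:, can be sketched in Python. This is a deliberate simplification that just prefers Reply-To: over From:. formail's actual rules are more involved (see the -r and -t options in 'man formail'), so don't treat this as a description of formail:

```python
from email import message_from_string
from email.utils import parseaddr

def reply_address(raw: str) -> str:
    """Simplified sketch: prefer Reply-To, fall back to From (NOT formail's full rules)."""
    msg = message_from_string(raw)
    for field in ("Reply-To", "From"):
        value = msg.get(field)
        if value:
            return parseaddr(value)[1]  # keep only the bare address
    return ""

print(reply_address("From: Bozo <bozo@bozo.com>\nReply-To: help@bozo.com\n\nhi\n"))
```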

       # now strip out email address from subject line
        * ^Subject:.[99]*\/.*

Once again, [] is character class matching - the second 9 is redundant. If you want to match "99", say so:

                * ^Subject:.99\/.*

Using your syntax (ignoring the redundancy of specifying doubled characters in a class construct), you'd be screwed if the user address is "911@help.com" or some such, because this will extract it as 11@help.com.

Also, the dot preceding that will match only ONE character of whatever, not all the preceding whitespace or any existing subject material which may have been on the line.
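Both pitfalls can be reproduced with a capture group standing in for \/. Python's re is greedy throughout, and procmail's treatment of the left side of \/ may differ, so read this as an illustration of what [99]* and a lone "." can consume, not as an exact procmail trace. The helper name is hypothetical:

```python
import re

def extract(pattern: str, line: str) -> str:
    """Return the text captured by group 1, or "" when the pattern doesn't match."""
    m = re.match(pattern, line)
    return m.group(1) if m else ""

# "[99]*" is just "[9]*": it can eat a leading 9 that belongs to the address.
print(extract(r"^Subject:.[99]*(.*)", "Subject: 911@help.com"))          # 11@help.com

# A single "." consumes exactly one character, so a second blank breaks the match...
print(repr(extract(r"^Subject:.99(.*)", "Subject:  99bozo@bozo.com")))   # ''
# ...while "[ \t]*" absorbs any run of blanks.
print(extract(r"^Subject:[ \t]*99(.*)", "Subject:  99bozo@bozo.com"))    # bozo@bozo.com
```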



Instead, using the header insertion specified as above:
        :0
        * ^X-VISI-From: \/.*
        {
                ORIGINATOR=$MATCH
        }

This will extract the email address without any concern for other crap on the subject line (or whether the address is "9911@doofus.com", which might otherwise run into problems).
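The same extraction can be sketched in Python against a made-up header block. With re.MULTILINE, "^" anchors at the start of every line, so the search can try each header line in one pass. Because only the dedicated header is read, the digits in the address don't interact with any other pattern. The header values are hypothetical:

```python
import re

def extract_originator(header_block: str) -> str:
    """Return the value of the (hypothetical) X-VISI-From header, or "" if absent."""
    # re.MULTILINE makes "^" and "$" match at each line boundary.
    m = re.search(r"^X-VISI-From: (.*)$", header_block, re.MULTILINE)
    return m.group(1) if m else ""

headers = (
    "From: webform@example.com\n"
    "Subject: Re: WebSite 9911@doofus.com\n"
    "X-VISI-From: 9911@doofus.com\n"
)
print(extract_originator(headers))  # 9911@doofus.com
```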

I'm still wondering WHEN you actually crammed the address into the subject line - none of what you've provided here demonstrates that yet.

:0c


Do you really want this copied? Why?

      |  formail -cz -I "From: $REALSENDER" \
       -I "To: $ORIGINATOR"  \
       | $SENDMAIL -oi $ORIGINATOR


Okay, you're rewriting this.  So why the previous rules in the first block?

The c option to formail is unnecessary here.

Do you think this will work, or have I committed egregious procmail
sins?

Again, what is it you're really trying to accomplish, in English? Where is the ORIGINAL message coming from, and what is the final recipient intended to do with it? I suspect it can be done a lot more gracefully than by sending messages and reparsing them.


---
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

 Sean B. Straw / Professional Software Engineering
 Post Box 2395 / San Rafael, CA  94912-2395


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
