regexp thoughts for future procmail version

I use plussed addresses for some things. Recently, I added a BCC detector to my spam ruleset. From recent discussions about "spammishness", some members should already be aware that I simply allow some characteristics to contribute some amount towards the likelihood that a message is spam.

Basically, in my mainline ~/.procmailrc, I extract various commonly manipulated or compared headers: Subject/To/From/Sender/Envelope To and From. This beats the heck out of extracting them multiple times throughout the whole procmailrc each time they may be needed.


So, let's say that the following is in .procmailrc:

        :0
        * ^X-Envelope-To: *<\/[^>]*
        {
                ENVTO=$MATCH
        }

(In case you're wondering, there is also a spammishness test for the absence of this header, which indicates multiple local recipients, but that works in conjunction with another test and is not the point of this post.)

Elsewhere in the mainline, there's an includerc of a file which extracts a simple listname component. Suffice it to say, LISTNAME is either NULL, or contains a string identifying the root name of a discussion list.


Now, in my spam file (or, at the moment, the sandbox):

# No cleartexted recipients matching the X-Envelope-To
# (AND not a list, where that would be very normal).
# NOTE: Basically, this means we were BCC'd, which itself is perfectly
# valid, but also very commonly used in spam.
:0
* LISTNAME ?? ^^^^
* $! ^(To|Cc):.*${ENVTO}
{
        SPAMNOTES="${SPAMNOTES}SPAM: Advisory - no non-list cleartext recipient matching X-Envelope-To${NL}"

        SPAMMISHNESS="${SPAMMISHNESS}+45"
}

All is good and well so long as ENVTO isn't a plussed address. And really, we're letting it slide that the address doesn't have its dots escaped either. If it's plussed, it'll never match, because the + is read as a regexp operator rather than as a literal character.
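To make the failure concrete, here is a small sketch outside procmail. It assumes grep -E is a close enough stand-in for procmail's egrep-style matching, and the address is a made-up example, not one from my setup:

```shell
# Hypothetical plussed address, standing in for the extracted ENVTO value.
ENVTO='user+list@example.com'
HEADER="To: <$ENVTO>"

# Used as a regexp, "r+" means "one or more r", so the literal "+" in the
# header is never matched and the address fails to match itself.
if printf '%s\n' "$HEADER" | grep -E -q "^(To|Cc):.*$ENVTO"; then
    RAW=match
else
    RAW=nomatch
fi

# With the "+" and "." escaped by hand, the same header matches.
ESCAPED='user\+list@example\.com'
if printf '%s\n' "$HEADER" | grep -E -q "^(To|Cc):.*$ESCAPED"; then
    FIXED=match
else
    FIXED=nomatch
fi

echo "raw=$RAW escaped=$FIXED"
```

The unescaped dots are the lesser problem: they still match the literal dots, they just also match any other character in that position.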


I can fudge it, in an ugly way, by:

ENVTO=`echo "$ENVTO" | sed -e "s/\+/\\\+/g" -e "s/\./\\\./g"`

Though this invocation is problematic when the shell is BASH due to how it manages pipelining. It is also a serious waste of CPU. Unnecessary calls can be mitigated somewhat by checking the variable for the presence of some characters (on the LHS, it isn't expanded as a regexp):


:0
* ENVTO ?? (\.|\+)
{
        ENVTO=`echo "$ENVTO" | sed -e "s/\+/\\\+/g" -e "s/\./\\\./g"`
}

(Obviously, if one were looking to escape other regexp operators, support for those would be added. These are the only ones I envision having to deal with in a valid address, though.)
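If the full set of operators ever did need covering, the two sed expressions could be folded into one bracket expression. This is only a sketch: escape_regex is a name I made up for illustration, and the list of characters is my guess at what procmail treats as special, not something taken from its documentation.

```shell
# escape_regex: hypothetical helper, not part of procmail.
# Prefixes each regexp metacharacter in $1 with a backslash.
# Inside the bracket expression, "]" comes first and "^" is kept away from
# the front, so every listed character is taken literally.
escape_regex() {
    printf '%s\n' "$1" | sed 's/[].[\\^$*+?(){}|]/\\&/g'
}

# Example with a made-up address:
escape_regex 'user+list@example.com'
```

It still costs a shell and a sed per message, so the guard condition on ENVTO remains worth keeping in front of it.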

My idea: in a future procmail rev, wouldn't it be useful to have a built-in variable expansion syntax which auto-escapes the variable content? "Start regexp" and "stop regexp" tokens are not feasible, because the stop regexp token might actually be a token within the expanded variable itself. What if:


* To:.*${{SOMEVARIABLE}}

were to escape that variable so that it would match as a literal?
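The behavior I'm picturing can be sketched in plain shell, with no sed involved. Everything here is hypothetical: literal_regex is an invented name, the character list is an assumption, and this says nothing about how procmail would actually implement such an expansion.

```shell
# literal_regex: hypothetical stand-in for what ${{VAR}} would produce.
# Walks the string one character at a time and backslash-escapes anything
# that a regexp would treat as an operator.
literal_regex() {
    lr_rest=$1
    lr_out=
    while [ -n "$lr_rest" ]; do
        lr_tail=${lr_rest#?}              # everything after the first char
        lr_ch=${lr_rest%"$lr_tail"}       # the first char itself
        case $lr_ch in
            '\'|'.'|'+'|'*'|'?'|'['|']'|'('|')'|'{'|'}'|'^'|'$'|'|')
                lr_out="$lr_out\\$lr_ch" ;;
            *)
                lr_out="$lr_out$lr_ch" ;;
        esac
        lr_rest=$lr_tail
    done
    printf '%s\n' "$lr_out"
}

# Example with a made-up address; the result is safe to drop into a pattern.
literal_regex 'user+list@example.com'
```

The point of the double-brace syntax would be to get this result at expansion time, so the recipe never has to fork anything.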

If you carefully inspect your procmailrc files, you'll probably find one or two of those special recipes which extract a literal value and reuse it in a condition, where it'll be interpreted as a regexp.

In the meantime, does anyone have any pointers on escaping variables containing regexp operators (especially bash-friendly syntax, or wholly internal-to-procmail solutions not involving shells)?

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

