I use plussed addresses for some things. Recently, I added a BCC detector
to my spam ruleset. From recent discussions about "spammishness", some
members should already be aware that I simply allow some characteristics to
contribute some amount towards the likelihood that a message is spam.
Basically, in my mainline ~/.procmailrc, I extract the headers that are
most commonly manipulated or compared: Subject, To, From, Sender, and the
envelope To and From. This beats the heck out of extracting them again
throughout the procmailrc each time they're needed.
So, let's say that the following is in .procmailrc:

:0
* ^X-Envelope-To: *<\/[^>]*
{
        ENVTO=$MATCH
}

(In case you're wondering, there is also a spammishness test for the
absence of this header, which indicates multiple local recipients, but that
works in conjunction with another test and isn't the point of this post.)
Elsewhere in the mainline, there's an includerc of a file which extracts a
simple listname component. Suffice it to say, LISTNAME is either NULL, or
contains a string identifying the root name of a discussion list.
Now, in my spam file (or, at the moment, the sandbox):

# No cleartexted recipients matching the X-Envelope-To
# (AND not a list, where that would be very normal).
# NOTE: Basically, this means we were BCC'd, which itself is perfectly
# valid, but also very commonly used in spam.
:0
* LISTNAME ?? ^^^^
* $! ^(To|Cc):.*${ENVTO}
{
        SPAMNOTES="${SPAMNOTES}SPAM: Advisory - no non-list cleartext recipient matching X-Envelope-To${NL}"
        SPAMMISHNESS="${SPAMMISHNESS}+45"
}

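All is well and good so long as ENVTO isn't a plussed address. If it's
plussed, it'll never match, because the "+" is interpreted as a regexp
operator rather than as a literal character. And really, we're also letting
it slide that the dots in the address aren't escaped, so each one matches
any character.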
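
To make the failure concrete, here is a small sketch run outside procmail, using grep -E as a stand-in. This is an assumption for illustration: procmail's regexps are egrep-like but not identical, and the address and the `matches_header` helper are hypothetical.

```shell
#!/bin/sh
# Hypothetical header line, for illustration only.
header='To: user+tag@example.com'

# matches_header: succeeds when the extended regexp in $1 matches $header.
matches_header() {
    printf '%s\n' "$header" | grep -Eq "$1"
}

# Unescaped: "r+" means "one or more r", so the literal "+" is never matched.
if matches_header '^(To|Cc):.*user+tag@example.com'; then
    unescaped=match
else
    unescaped=nomatch
fi

# Escaped: "\+" and "\." stand for the literal characters.
if matches_header '^(To|Cc):.*user\+tag@example\.com'; then
    escaped=match
else
    escaped=nomatch
fi

echo "unescaped: $unescaped, escaped: $escaped"
```

The unescaped pattern would happily match "usertag@example.com", but not the plussed address it was built from.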
I can fudge it, in an ugly way, by:
ENVTO=`echo "$ENVTO" | sed -e "s/\+/\\\+/g" -e "s/\./\\\./g"`
Though this invocation is problematic when the shell is BASH due to how it
manages pipelining. It is also a serious waste of CPU. Unnecessary calls
can be mitigated somewhat by checking the variable for the presence of some
characters (on the LHS, it isn't expanded as a regexp):

:0
* ENVTO ?? (\.|\+)
{
        ENVTO=`echo "$ENVTO" | sed -e "s/\+/\\\+/g" -e "s/\./\\\./g"`
}

(obviously, if one were looking to escape other regexp operators, support
for those would be added - these are the only ones I envision having to
deal with in a valid address though)
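
For experimenting with the escaping outside of procmail, the following is a rough sketch of the same idea as a plain sh function. It uses a single bracket expression, which avoids depending on how a given sed treats "\+" (some implementations take it as a repetition operator). The name `escape_regex` and the exact set of characters escaped are assumptions for illustration, not anything procmail provides.

```shell
#!/bin/sh
# escape_regex: hypothetical helper, not part of procmail.
# Prefixes each character that is commonly special in egrep-style regexps
# with a backslash, so the result can be matched as a literal.
# Inside the bracket expression, "]" comes first so it is taken literally,
# and every other character listed there is literal as well.
escape_regex() {
    printf '%s\n' "$1" | sed -e 's/[][\.*^$+?(){}|]/\\&/g'
}

escape_regex 'user+tag@example.com'
```

For the sample address this prints user\+tag@example\.com. Quoting may need adjusting if the command is placed inside procmail backquotes, since the shell and procmail each process backslashes in their own way.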
My idea: in a future procmail rev, wouldn't it be useful to have a built-in
variable expansion syntax which auto-escapes the variable content?
"Start regexp" and "stop regexp" tokens aren't feasible, because the stop
token might itself appear within the expanded variable. What if:
* To:.*${{SOMEVARIABLE}}
were to escape that variable so that it would match as a literal?
If you carefully inspect your procmailrc files, you'll probably find one
or two of those special recipes which extract a literal value and reuse it
in a condition, where it'll be interpreted as a regexp.
In the meantime, does anyone have any pointers on escaping variables
containing regexp operators (esp. bash-friendly syntax or wholly
internal-to-procmail solutions not involving shells)?
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.