procmail

Re: Help with .procmailrc 2

1999-09-06 04:36:55
On Mon, 6 Sep 1999 03:56:27 -0700 (PDT), Dallman Ross
<dman(_at_)netcom(_dot_)com> wrote:
> (my eyes roll up into my brain and I can't think further) as soon
> as I see the "\/" of a quoted slash and the one-to-ten ^ anchors
> that you guys all have been interminably using for the last 3-4

There's an old posting by Philip Guenther about the \/ operator which
I think explains it pretty well. I have an HTMLized version on a web
page at <http://www.iki.fi/era/procmail/matching.html>. If you'd
rather see the original, the page also links to it in the Garching
archive.

Briefly, the motivation for the \/ is that you sometimes want to refer
back to whatever matched at one point, and use that in further
processing. In sed you can use \1 to refer back to the first
parenthesized subexpression of a match, \2 for the second, etc. These
are called "back references". In Procmail, you get only one backref,
and instead of its being bracketed by parentheses, it's bracketed by
that \/ token and (implicitly) the end of the expression.

Yes, that sucks, but it's better than no backrefs at all.

So in concrete terms, if you have an expression like

    * ^Subject:.*\<(floor|walls|pipe)

and you would like later on to know whether you got a match on "floor"
or on "walls" or on "pipe", you can say

    * ^Subject:.*\<\/(floor|walls|pipe)
    #              ^^ grab answer into MATCH starting here

and $MATCH will magically contain the answer to that question (the
word that matched, if there was a match). The \/ token does not change
the outcome of the condition: it is true if there was a match and
false otherwise, exactly as it would be without the \/.
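
To put $MATCH to work, a complete recipe might look something like
the sketch below. It is only an illustration: the TOPIC variable and
the repairs-* folder names are made up for the example. Copying $MATCH
into a variable of your own right away is a good habit, because the
next successful \/ condition overwrites it. Also remember that
Procmail matches case-insensitively unless you give the D flag, so
$MATCH holds the word as it appeared in the header (e.g. "Floor").

    :0
    * ^Subject:.*\<\/(floor|walls|pipe)
    {
      # TOPIC is a hypothetical name; save MATCH before a later \/ clobbers it
      TOPIC=$MATCH

      # Deliver to a (hypothetical) folder named after the matched word
      :0:
      repairs-$TOPIC
    }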

There are some minor complications, but that's hopefully enough to get
you back on track.

The scoring stuff is explained in the procmailsc(5) manual page, but I
suspect you already knew that. In a few words, scoring gets you a
little way towards the "maybe" in between "yes, this is a match" and
"no, this is not a match". You get to set limits on what constitutes
"too many" of something, so you can, for example, accept messages
that mention Elvis every once in a while, but not once the mentions
make up more than 1% of the article's entire text. In other words,
you can decide how many instances of a match it takes before it
"really" counts as a match, or offset a match that isn't "really" a
match with negative scores on other criteria.
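
As a rough sketch of the "too many" idea (using a fixed count rather
than a percentage, since that is simpler to express), the recipe below
starts the score at -3 and adds 1 for every occurrence of Elvis in the
body. A weighted recipe only fires when its total score is positive,
so it takes four or more mentions to file the message. The folder name
elvis-heavy is hypothetical.

    # Score starts at -3: the empty regexp always matches, and with an
    # exponent of 0 only that first match contributes.
    # Then each occurrence of Elvis in the body (B flag) adds 1.
    :0 B:
    * -3^0
    * 1^1 Elvis
    elvis-heavy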

As a sort of bonus feature, scoring can be used for its side effects,
i.e. you learn not only +whether+ there was "enough" of whatever in
the message, but also the exact total score (via the $= variable),
which with a weight of 1 per hit is simply the number of matches.
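
For example, this sketch (the ELVIS variable name is made up) uses an
empty nested block, so the recipe does nothing except evaluate its
condition, and then copies the resulting count out of $= so later
recipes can use it:

    # Count occurrences of Elvis in the body; the empty block means
    # nothing is delivered, we only want the side effect on $=
    :0 B
    * 1^1 Elvis
    { }

    ELVIS=$=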

I'm afraid I don't have any good pointers to other articles about
scoring besides the procmailsc manual page. Perhaps you can look at
other programs which use scoring; Gnus comes to mind. There's a Gnus
scoring reference at <http://www.gnus.org/manual/gnus_7.html#SEC190>
(but it's not very much use unless you understand Gnus in general,
sorry).

Hope this helps,

/* era */

-- 
 Too much to say to fit into this .signature anyway: <http://www.iki.fi/era/>
  Fight spam in Europe: <http://www.euro.cauce.org/> * Sign the EU petition