At 23:35 2002-10-08 +0221, Udi Mottelo wrote:
> I'm just wondering: suppose one wants to score three words and
> wants to be sure that all three words exist in the text.
> The only way that I can see is:
> :0 B
> * word1
> * word2
> * word3
> {
>     :0 Bfb
>     * 1^1 word1
>     * 1^1 word2
>     * 1^1 word3
>     | /do/something/with $=
> }
Seems reasonable.
> Now we can be sure that $= >= 3 and that every word appears
> at least once, i.e. if $= == 5 then word1 could not
> appear 3 times.
Actually, with $= == 5 under the conditions you provide, one of the words
could appear three times:
1 wordX
1 wordY
3 wordZ
=5
> There is a big deficiency in this recipe - procmail passes over
> the data twice. Any ideas?
Sucks when you're scanning the BODY, but you're missing something - your
conditions, in sum total, scan the BODY *SIX* times - there are *SIX* conditions!
Note that you can independently scan for each word, and store the score:
WORD1=0
WORD2=0
WORD3=0
:0B
* 1^1 word1
{
WORD1=$=
}
:0B
* 1^1 word2
{
WORD2=$=
}
:0B
* 1^1 word3
{
WORD3=$=
}
# you've scanned the body THREE times, but technically, in your original
# conditions, you did at least as much.
# Now, act upon the SAVED SCORES ONLY, with a precheck that each of the
# variables can't be ZERO.
:0
* ! WORD1 ?? ^0$
* ! WORD2 ?? ^0$
* ! WORD3 ?? ^0$
* $ ${WORD1}^0
* $ ${WORD2}^0
* $ ${WORD3}^0
| /do/something/with $=
All untested here, so there's bound to be a simple typo or omission on my
part, but this all seems a LOT more efficient than what you're presenting.
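Since the recipes above are untested, one way to check them is to write the saved counts to the log before acting on them. A minimal sketch, assuming LOGFILE is already set in your rcfile and that WORD1, WORD2 and WORD3 hold the counts saved above; the message text is only an illustration:

```procmail
# Debugging aid (assumes LOGFILE is set elsewhere in the rcfile):
# append the three saved counts to the procmail log.
# The closing quote sits on its own line so the log entry ends in a newline.
LOG="word counts: word1=$WORD1 word2=$WORD2 word3=$WORD3
"
```

Drop the assignment in just before the final recipe and compare the logged numbers against a test message with known word counts.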
Alternatively, for MORE efficiency, if the /do/something bit should only
occur when all three keywords have been found, nest each successive
operation - the second keyword matches within the action braces of the
first, the third within the second, and the do something within the third
(and without needing to check for zero values, and without needing to SET
zero values):
:0B
* 1^1 word1
{
    WORD1=$=
    :0B
    * 1^1 word2
    {
        WORD2=$=
        :0B
        * 1^1 word3
        {
            WORD3=$=
            # you've scanned the body THREE times, but
            # technically, in your original conditions, you
            # did at least as much.
            # Now, act upon the SAVED SCORES ONLY
            :0
            * $ ${WORD1}^0
            * $ ${WORD2}^0
            * $ ${WORD3}^0
            | /do/something/with $=
        }
    }
}
The hit: three body scans - and ONLY if each successive scan results in a
positive (which is also the case with your original - it bails early when
there's a failure to match). Compared with yours, this shaves THREE (short)
body scans off, and has independent scoring for each matched word.
OTOH, a drawback to this approach is that the initial body scans are
COMPLETE body scans, not bail-on-first-match scans, so if you have a
match-match-nomatch condition, you've scanned the
WHOLEBODY-WHOLEBODY-WHOLEBODY, instead of
JUSTTOTHEFIRSTMATCH-JUSTTOTHEFIRSTMATCH-WHOLEBODY. I'm not sure how
significant an impact this will have on your average search, but when
there IS a match on all three, the result will be faster, and when those
matches are towards the end of the document anyway, there should be
negligible difference in the failed cases.
The simplest solution is:
:0B
* word1
* word2
* word3
* 1^1 (word1|word2|word3)
| /do/something/with $=
If any of the three "first match" body conditions fails, it bails right
there; if they're ALL true, then the required portion of the conditions is
fulfilled - and the score being > 0 is a GIVEN. However, this
involves four scans of the body in a match condition.
It all depends on what you're trying to accomplish, and whether you want
intermediate match counts (say, because you want individual variables > 3
or something).
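For instance, if you wanted to require more than three occurrences of word1 specifically, the saved count could be tested by seeding the score with a negative offset. This is an untested sketch: the threshold of 3 is only an example, and it assumes WORD1 was set from $= by one of the earlier scoring recipes.

```procmail
# The score starts at -3, then WORD1 is added to it; the recipe only
# fires if the total is positive, i.e. if WORD1 > 3.
:0
* -3^0
* $ ${WORD1}^0
| /do/something/with $WORD1
```

The same offset trick should work for any of the saved variables, since a scoring recipe only succeeds when its final score is greater than zero.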
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.