procmail

RE: ^^3^^

2002-10-28 17:22:20
Professional Software Engineering (Sean Straw) wrote:

At 20:08 2002-10-28 +0100, Dallman Ross did say:
58 of my most recent 100 spam messages were ID'd by this recipe.

Well, dangit, share the details!

Hmm.  Well, okay.  But further down below, after some
discussion.

By the way, in my copy of my original sent email, the line prior to 
the one you quoted above ended in a colon; and I had no empty
line between it and the one you quoted.  In my copy of my mail 
that I received through the list, however, an extra blank line was 
inserted.  I wonder why.  As a test, I will use a colon here:
And there shouldn't be a blank line between this line and the colon.

Okay, that aside aside, let's return to the main topic here. :)


    :0  # 021026 () score should equal three
      * $ ! $= ?? ^^3^^

Why are you using the $ flag?  in fact, is it even necessary 
to use $= when checking a score variable - syntactically, 
shouldn't you just use:

         * ! = ?? ^^3^^

Lots more testing on this end confirmed that neither one works.  :-)


If you place the recipe into a sandbox and pass the same 
message at it, does it act the same?  If so, that's good 

Yes.  It's all perfectly reproducible.  And I erred seriously
in believing my recipe was working all these weeks!  In fact,
everything that got as far as the nested braces in that
recipe set was being tagged.  I never noticed, because most
things getting that far are, indeed, spam!  Wow, that blew
my mind, since it is far and away the anti-spam recipe of mine
with the best penetration.

Anyway, having realized that the syntax simply doesn't work
with `=' or `$=' -- it makes no difference to the failure
if I use the dollar sign (with var-expand-token prepended) or
not :-) -- I fixed the recipe.

(or, simply code your outer recipe to add three and use 
negative scoring on each element, then you don't need a 
nesting at all, unless you're looking to use the score 
value for something)

No, that won't work, because scores above zero but less
than three are usually spam.
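
For illustration only, here is one reading of the flat approach
suggested above, as a sketch rather than anyone's actual recipe.
It assumes the same three header tests used later in this message;
the LOG line is purely hypothetical and is there only to make the
outcome visible in a sandbox log.

    :0  # sketch: start at three, subtract one per header found
      *  3^0
      * -1^0  ^X-MSMail-Priority:
      * -1^0  ^X-MimeOLE:
      * -1^0  ^X-Mailer:(.*\<)?Microsoft
      { LOG = "flat test matched
    " }

    # headers found:  0   1   2   3
    # final score:    3   2   1   0
    # recipe matches: yes yes yes no

A recipe with scored conditions matches whenever the final score
is positive, so this version also fires on mail carrying none of
the three headers.  Under that reading, an outer count is still
needed to tell "one or two headers" apart from "none at all".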


FTR, I was able to reproduce your problem with:
[snip]

That was helpful, Sean, thanks; it led to my realization
that the syntax I was relying on was bogus.

Okay, here is the current version of this recipe set.
Let me explain some things about it.  Microsoft clients
typically throw in three X-headers.  One or more bulk-y
clients used frequently by spammers put in one or
two of these headers, but not all three.

Explaining further, to minimize false positives, I don't bother 
looking at mail that has passed certain pre-tests of mine 
earlier on and reached a "TRUST" rating of $HIGHEST or $HIGH.
(I have five TRUST stages in all.  Some of my spam
recipes exclude various of these levels of TRUST from
consideration if that helps me avoid false positives.)
Finally, I don't bother checking mail whose headers imply
SquirrelMail was the client used, because it, too, inserts
some of these Microsoft-like headers (but not all three).
My rationale is that not too many spammers will use
SquirrelMail, but a non-trivial number of techie-types who
might write me could use it.

Readers should understand further that I have already taken
out disposition notifications by this point in the rc.  That 
caveat is necessary, because one of the headers I look at here
is often not present for disposition notifications, but
otherwise is for normal messages coming from the exact same
(Microsoft) client.  Nevertheless, I've put a test in now
that will give putative MS mail clients a chance to score
on a disposition-notification header to avoid my spam tag.

Finally, let me add that I'm trying out some new formatting
for recipes.  The point is to offset, visually, exclusions 
in the recipe that don't have much to do with the basic
heuristic.  I find it harder to read recipes that have a
bunch of conditions in them, where it quickly becomes muddy 
as to which conditions are critical to the algorithm and 
which have a secondary, or even extraneous, meaning.  The
visual offset is supposed to indicate that those conditions
are not part of the main algorithm, but are a side-issue.
Let me know what you think of the visual concept.

 :0  # 021003
     # exceptions ..............................................
                       * $ !  TRUST ?? ^^$HIGHEST^^
                       * $ !  TRUST ?? ^^$HIGH^^
                       *   ! ^X-Mailer:(.*\<)?SquirrelMail
  # ............................................................
  # 021003 () look for badly cloned Microsoft MUAs (oft used for bulk mail)
  * 1^0  ^X-MSMail-Priority:
  * 1^0  ^X-MimeOLE:
  * 1^0  ^X-Mailer:(.*\<)?Microsoft
  {
     TOT = $=

     :0  # 021029 () if score equals three, do nothing
      * TOT ?? ^^3^^
      { }

     :0 E  # 021029 () else, disposition notifications get another chance
      *       3^0
      * $ -$TOT^0
      *      -1^0  ^Content-Type: multipart/report
      { RX = "${RX:+$RX, }UBE.OH.BLEM-MUA" }
  }
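
As a worked check of the arithmetic in the recipe set above, here
are the possible outcomes once the outer recipe has matched (so
TOT is at least 1).  This assumes each 1^0 condition contributes
exactly 1 to the score, and that a scored recipe matches only when
its final score is positive.

    # TOT   multipart/report?   3 - TOT - report   outcome
    #  3    (not evaluated)     n/a                ^^3^^ matches; { } does nothing
    #  2    no                  3 - 2     = 1      tagged
    #  2    yes                 3 - 2 - 1 = 0      not tagged
    #  1    no                  3 - 1     = 2      tagged
    #  1    yes                 3 - 1 - 1 = 1      tagged

So the disposition-notification condition rescues only a message
that already carries two of the three headers; a message with a
single header is tagged either way.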

-- 
Dallman Ross

"If you find a path with no obstacles, it probably does not lead to
anywhere."
        Thoughts of Rev. Sunnan Kubose, from _Zen in the Markets_ 


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
