Re: Base64 Spam.

On Thu, Jun 12, 2003 at 03:58:13PM -0700, multimedia-fan(_at_)myrealbox(_dot_)com wrote:

Sorry, maybe I wasn't clear about that.

The reason I asked that, is very few persistent spamemrs are encoding
their spam in base64 and it is getting through, I was kind of thinking


Btw, I don't really want to harp on your writing or turn this into an
English lesson (though I do teach college English), but there is a
vast difference between

        "very few . . . [spammers] are encoding their spam . . . ."
and
        "a very few . . . [spammers] are encoding their spam . . . ."

The one little indefinite article makes all the difference in the world
to a human trying to parse what you are saying.  :)

. . . what was puzzling me is some
legitimate emails got caught through some of the filters.

I used a sandbox to test the following (not all mine, some are, and
some from the discussions in the archives):


Since you are using a sandbox and have a log, here's a suggestion:
turn on verbose logging and see what exactly caught it and what
didn't.
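
Something like the following near the top of the sandbox rcfile
should do it. This is only a sketch: the log path is a made-up
example, so point LOGFILE wherever your sandbox already logs.

```procmail
# Example settings only; the LOGFILE path is an assumption.
LOGFILE=$HOME/procmail-sandbox.log

# Extended diagnostics: procmail records each condition it tries
# and whether it matched.
VERBOSE=on
```

With that in place, the log should show which condition of which
recipe matched the message that got held.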

I think you will find the first recipe set (which is the one I
presume you got from the archives) works as advertised, while the
second (which is the one I presume you wrote) is giving you the
false pozzes.

## Base64 encoded html spam in message headers
:0
* ^Content-Type:(.*\<)?text/(html|plain)
* ^Content-Transfer-Encoding:(.*\<)?base64
{
  LOG="Base64 Encoded SPAM Headers $NL"
  LOGABSTRACT=ALL
  :0:
  $base64spam-headers
  LOGABSTRACT=NO
}



While the recipe set looks okay to me, I bet you will discover
that you get the exact same log output if you leave those
LOGABSTRACT lines out altogether.  Certainly you don't need
to turn on "all" to quit logging after one recipe.  The logging
of the last recipe is the default, anyway.  And as soon as
your spam gets past your first LOGABSTRACT invocation here,
it is saved, and procmail ends, so the second invocation won't
ever even happen.  Ditto below.

## Base64 encoded html spam in message body
:0
* B ?? (Content-Type:.*text/html;)
* B ?? (Content-Transfer-Encoding: base64)


There is no reason for the parentheses in either condition.
You can just lose them.
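
That is, the two conditions could be written as below. This is
purely a notational cleanup: wrapping a whole expression in one
group changes nothing about what it matches, so these behave
exactly like the originals (flaws included).

```procmail
# Same two body conditions, minus the redundant grouping.
:0
* B ?? Content-Type:.*text/html;
* B ?? Content-Transfer-Encoding: base64
```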

{
  LOG="Base64 Encoded SPAM in Body $NL"
  LOGABSTRACT=ALL
  :0:
  $base64spam-body
  LOGABSTRACT=NO
}



Since your shmancy logging directions aren't doing anything
useful, there really is no reason for the nested braces at
all.  Just run your recipe, with a lock, and with the filename
to save to on the action line.
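
For the header recipe, that boils down to something like the
sketch below. It keeps your two conditions and your
$base64spam-headers destination untouched. I am assuming you can
live without the custom LOG line, since the default abstract
already records which folder the message landed in.

```procmail
## Base64 encoded text in message headers, no nesting block
# The trailing colon on :0: asks procmail for a local lockfile
# while it writes to the folder.
:0:
* ^Content-Type:(.*\<)?text/(html|plain)
* ^Content-Transfer-Encoding:(.*\<)?base64
$base64spam-headers
```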

Works on all messages but I tested a message that had some pictures
attached

The message body had the following mess:


------=_NextPart_000_0067_01C330EC.F46990C0
Content-Type: multipart/alternative;
      boundary="----=_NextPart_001_0068_01C330EC.F46990C0"


------=_NextPart_001_0068_01C330EC.F46990C0
Content-Type: text/plain;
      charset="iso-8859-1"
Content-Transfer-Encoding: 7bit


------=_NextPart_001_0068_01C330EC.F46990C0
Content-Type: text/html;
      charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable


------=_NextPart_001_0068_01C330EC.F46990C0--

------=_NextPart_000_0067_01C330EC.F46990C0
Content-Type: image/jpeg;
      name="test.jpg"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
      filename="test.jpg"





The message got held as spam.


Well, of course it did!  Methinks you might want to diagram out a
flow-chart of the logic behind what your recipes are asking.  You
say, "Look inside the body for 'Content-Type: text/html;'" (yet
you leave off the perhaps-still-more-scurrilous "text/plain").
Then you say, "Now look in the body for base64 stuff."  But while
those two things appear in the body, they are not related!  You
need to test for the lines being close to each other.

And you want to put ^ at the start of the expression, anyway, to
save procmail lots and lots of work looking rightward of the start
of each line when it doesn't need to.

Okay, so you have a recipe that catches too much, but you didn't
analyze the (verbose) logs to see if you could fix it.  Instead
you want, right away, to run an external base64 decoder on all this
mail so you can further grep bodies for spammish content, all for
the "very few spammers" who are messing with you this way.
That's a helluva capitulation, process-wise, to have to make.
It isn't necessary.  Let's fix your broken recipe and allay the
false pozzes right there.


        :0:
        * B ?? ^Content-Type:(.*\<)?text/.*\
               ^Content-Transfer-Encoding:(.*\<)?base64
        $base64spam-body


Try that in your sandbox.

-- 
dman

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail