Re: Base64 Spam.

On Fri, 13 Jun 2003 00:25:21 +0200, Dallman Ross <dman(_at_)nomotek(_dot_)com>
wrote:

On Thu, Jun 12, 2003 at 02:45:50PM -0700,
multimedia-fan(_at_)myrealbox(_dot_)com wrote:

On Tue, 10 Jun 2003 17:21:29 -0700, multimedia-fan(_at_)myrealbox(_dot_)com
wrote:

I know that this has probably been discussed in the past, but can
someone give me a pointer on how to decode base64 spam messages
BEFORE recipes?

Very few of the messages are getting through like that, and I am
interested in reading about this.


Did I ask a stupid question, a hard question or a boring one?

Either way, I see a small discussion in the archives about a similar
topic, but none of the posts had this specific question.


Well, first of all, the "BEFORE" kind of threw me.  What do you mean,
"BEFORE recipes"?  You certainly can't do anything in procmail *after*
recipes; the program has ended.  And where you do things in procmail
is *in* recipes.  So, I didn't get it, and especially didn't get why
the all-caps shout that the distinction (which I didn't get) is
important.


Sorry, maybe I wasn't clear about that.

The reason I asked is that a few persistent spammers are encoding
their spam in base64 and it is getting through.  I was thinking the
filtering could run after decoding the suspicious emails, something
along the lines of nested recipes: the first part decodes the message,
and the second part runs the checks.  I'm not sure if this is a good
idea.

But hey, I didn't mean to shout.  When I wrote BEFORE I was trying to
make sure that I was asking a clear question; I didn't mean to offend
anyone.

Second of all, why do you *want* to decode base64?  Do you get
some legit mail that is base-64 encoded?  (Why not whitelist it
and trash the rest?)


I am considering doing that, but what puzzles me is that some
legitimate emails got caught by some of the filters.


I used a sandbox to test the following (some are mine, and some are
from the discussions in the archives):


## Base64 encoded html spam in message headers
:0
* ^Content-Type:(.*\<)?text/(html|plain)
* ^Content-Transfer-Encoding:(.*\<)?base64
{
  LOG="Base64 Encoded SPAM Headers $NL"
  LOGABSTRACT=ALL
  :0:
  $base64spam-headers
  LOGABSTRACT=NO
}

## Base64 encoded html spam in message body
:0
* B ?? (Content-Type:.*text/html;)
* B ?? (Content-Transfer-Encoding: base64)
{
  LOG="Base64 Encoded SPAM in Body $NL"
  LOGABSTRACT=ALL
  :0:
  $base64spam-body
  LOGABSTRACT=NO
}



It works on all messages, but then I tested a message that had some
pictures attached.

The message body had the following mess:


------=_NextPart_000_0067_01C330EC.F46990C0
Content-Type: multipart/alternative;
        boundary="----=_NextPart_001_0068_01C330EC.F46990C0"


------=_NextPart_001_0068_01C330EC.F46990C0
Content-Type: text/plain;
        charset="iso-8859-1"
Content-Transfer-Encoding: 7bit


------=_NextPart_001_0068_01C330EC.F46990C0
Content-Type: text/html;
        charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable


------=_NextPart_001_0068_01C330EC.F46990C0--

------=_NextPart_000_0067_01C330EC.F46990C0
Content-Type: image/jpeg;
        name="test.jpg"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
        filename="test.jpg"





The message got held as spam.
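
A likely reason for the false positive: the two "B ??" conditions in
the body recipe are each tested against the whole body, so the
text/html header of one MIME part and the base64 header of the JPEG
attachment can satisfy them together.  Below is a minimal shell sketch
of the difference between that whole-body match and a per-part check.
The function names, the shortened boundary strings, and both sample
messages are made up for illustration, and the awk parser is
deliberately naive (it treats any line starting with "--" as a
boundary).

```shell
#!/bin/sh
# Sketch only: compares a whole-body match with a per-part check.

# Hypothetical legitimate message body: text parts plus a base64 JPEG.
legit_message() {
cat <<'EOF'
------=_NextPart_000
Content-Type: multipart/alternative;
        boundary="----=_NextPart_001"

------=_NextPart_001
Content-Type: text/html;
        charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<html>hello</html>

------=_NextPart_001--

------=_NextPart_000
Content-Type: image/jpeg;
        name="test.jpg"
Content-Transfer-Encoding: base64

/9j/4AAQ
------=_NextPart_000--
EOF
}

# Hypothetical spam body: a single text/html part that is base64-encoded.
spam_message() {
cat <<'EOF'
--b1
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: base64

PGh0bWw+
--b1--
EOF
}

# Whole-body match: both patterns may hit anywhere in the body,
# even in different MIME parts (this mirrors the body recipe).
naive_match() {
    body=$(cat)
    printf '%s\n' "$body" | grep -q 'Content-Type:.*text/html;' &&
    printf '%s\n' "$body" | grep -q 'Content-Transfer-Encoding: base64'
}

# Per-part check: succeed only if one and the same part header block
# declares both a text/* type and base64 encoding.
text_part_is_base64() {
    awk '
        /^--/ { in_hdr = 1; is_text = 0; is_b64 = 0; next }  # new part
        in_hdr && /^$/ {                                     # headers end
            if (is_text && is_b64) found = 1
            in_hdr = 0; next
        }
        in_hdr && tolower($0) ~ /^content-type:[ \t]*text\// { is_text = 1 }
        in_hdr && tolower($0) ~ /^content-transfer-encoding:[ \t]*base64/ { is_b64 = 1 }
        END {
            if (in_hdr && is_text && is_b64) found = 1
            exit (found ? 0 : 1)
        }
    '
}

# report LABEL: read a message body on stdin and print both verdicts.
report() {
    msg=$(cat)
    if printf '%s\n' "$msg" | naive_match; then n=hit; else n=miss; fi
    if printf '%s\n' "$msg" | text_part_is_base64; then p=hit; else p=miss; fi
    echo "$1: whole-body=$n per-part=$p"
}

legit_message | report legit   # legit: whole-body=hit per-part=miss
spam_message  | report spam    # spam: whole-body=hit per-part=hit
```

The per-part check lets the attachment through because the base64
encoding belongs to the image part, not to a text part.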

Here are two recipes I use, though the second rarely ever gets used,
because it's a body-grepping recipe and I only fall back on those
in worst-case scenarios when I can't ID the spam by the headers alone.
(Which I almost always can.)

    # concept: Paul Chvostek <paul(_at_)it(_dot_)ca>
:0  # 030214 () base-64-encoded html head is shrouding more than charset
 *             ^Content-Transfer-Encoding:(.*\<)?base64
 * $    $GO^0  CTYPE  ??  ^^text/html
 * $  $STOP^0  TRUST  ??  ^^$HIGHEST^^
 * $    $GO^0  CTYPE  ??  ^^text/plain
 { RX = "${RX:+$RX, }UBE.CT.HTML+BASE64" }

(That only triggered twice in the last 100 spams, but it doesn't give me
false positives.)

And


     :0  # 030504 () where's the "multipart"?  There's just one encoded part
      *                  CTYPE  ??  ^^multipart/mixed
      *          2^0  B         ??  ^Content-Transfer-Encoding:(.*\<)?\
                                     (base64|7bit)
      * $ -$MAXINT^0
      *         -1^0  B         ??  ^Content-Type:(.*\<)?text/plain
      *         -1^1  B         ??  ^Content-Type:
      * $  $MAXINT^0
      { RX = "${RX:+$RX, }UBE.B+CT.MISMATCH:1" }


(This one is using my "Infinity Hop" algorithm to limit the body
grep to what's needed.  MAXINT is the exact value for "infinity"
from `man procmailsc'.)


Thank you.

I will definitely try this.

If you really want to decode base64, you can use a program on your
system.  Typing "man -k base64" gives me one installed on my system:
it's called, oddly enough, "base64".

12:21am [~/Mail] 482[0]> echo foo | base64 | base64 -d
foo


I didn't even read the man page; I simply tried it like that and it
worked.  It encoded and then decoded.
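
To connect that tool to the original question, here is a small shell
sketch of "decode first, then check".  It assumes a single-part message
whose entire body is one base64 block, and a base64 program that
accepts "-d" as above; the function names, the sample message, and the
search phrase are invented for the example.

```shell
#!/bin/sh
# Sketch only: decode a single-part base64 body so it can be searched.

# decode_body: read a message on stdin, drop the headers (everything up
# to and including the first blank line), and base64-decode the rest.
decode_body() {
    sed '1,/^$/d' | base64 -d
}

# Hypothetical sample message; the body is produced by base64 itself
# so that no hand-encoded string has to be trusted.
sample_message() {
    printf 'From: spammer@example.com\n'
    printf 'Content-Type: text/html\n'
    printf 'Content-Transfer-Encoding: base64\n'
    printf '\n'
    printf 'Buy cheap pills now\n' | base64
}

decoded=$(sample_message | decode_body)
echo "$decoded"

# The decoded text can now be searched like any plain body.
if printf '%s\n' "$decoded" | grep -q 'cheap pills'; then
    echo "phrase found after decoding"
fi
```

Inside procmail the same idea would presumably be a filtering recipe
that pipes only the body through the decoder, placed ahead of the
body-checking recipes; that part is not shown here and would need
testing in a sandbox.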



I still don't think decoding base64 is worth the bother.  It's sort
of like strip-searching people you find have broken into your house
at 3 a.m. to see if they have any burglary tools on 'em.  Hell, if
they're in your house uninvited at 3 a.m., that's damning enough.
And if someone sends text-only or HTML mail base64-encoded, that's
damning enough.



As always, thank you for your advice and your help.



_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail