Re: Again on spam with targeted meaningful text

[I'm responding with findings on-list per Marco's request]

I ran the sample message through a few checks. The message consists of a lot of extraneous text relating to African issues, embedded in HTML as near-invisible (tiny, light-on-white) text, with an embedded porn ad image and a framing href as the only spammy features. The spam had apparently been carefully targeted at members of this list, so it contained "good bayes words" in abundance.


Spamassassin, in my configuration, didn't hit much:

--- spamassassin scoring ---
 0.1 HTML_MESSAGE           BODY: HTML included in message
 0.1 BIZ_TLD                URI: Contains a URL in the BIZ top-level domain
 1.5 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
                            [Blocked - see <http://www.spamcop.net/bl.shtml?68.126.135.169>]
 0.3 DNS_FROM_RFCI_DSN      RBL: From: sender listed in dsn.rfc-ignorant.org


Here's the URL:

<a href="http://1q-2w3e4r-4r5t6jab1y.biz/qa12ws-3ed4rf/b2900-12tj-02sec2/index.htm?butyl">

--- spamassassin scoring ---

bogofilter actually flagged it as spam, again based on my training. Unfortunately, this was because of the similarity of the "good" text to the so-called "Nigerian scams" that are so prevalent.


--- bogofilter scoring ---
Bogofilter Report:
X-Spam-Bogosity: Yes, tests=bogofilter, spamicity=0.504700, version=0.17.2

   int  cnt   prob  spamicity histogram
[...]
"bigger"                             2  0.000000  0.007812  0.997090 +
"chief"                              2  0.000000  0.007812  0.997090 +
"directors"                          2  0.000000  0.007812  0.997090 +
"documentary"                        2  0.000000  0.007812  0.997090 +
"film"                               2  0.000000  0.007812  0.997090 +
"foster"                             2  0.000000  0.007812  0.997090 +
"officials"                          2  0.000000  0.007812  0.997090 +
"president"                          2  0.000000  0.007812  0.997090 +
"Africa"                             3  0.000000  0.011719  0.998056 +
"African"                            3  0.000000  0.011719  0.998056 +
"moving"                             3  0.000000  0.011719  0.998056 +
[...]
--- bogofilter scoring ---

Keep in mind, this is based on MY training of bayes. Yours presumably would not score such words as spam. Ifile (also bayes) seems to have tagged it as well, but again, since the only spammy content in the body is the href and the image, that was presumably because of content in my message store that indicates spam.


--- ifile scoring ---
ifile Report:
/tmp/spamreport-msg.vDCSWU spam
spam -5462.30123663
ham -5764.29786015
--- ifile scoring ---

ditto for spamprobe:

--- spamprobe scoring ---
Spamprobe Report:
GOOD 0.0000000 2498472742d6ecd69aa1fe3518790d05
[...]
       Spam Prob   Count    Good    Spam  Word
[...]
       0.0000100       1      94       0  the session
       0.0000112       4      28       0  the au
       0.0000112       3      28       0  vice president
       0.0000101       2      31       0  how they
       0.0000121       1      26       0  space in
       0.0000131       1      24       0  to host
       0.0000149       1      21       0  society for
       0.0000196       1      16       0  a location
[...]
       0.9999492       1       0       5  csseditor
--- spamprobe scoring ---

So it successfully defeated bayes, or at least caused mine to score it for the wrong reasons. And it didn't have a LOT of spammy characteristics.

As this URL was embedded in the body, based on what I've been told here, those checks wouldn't gain much if re-implemented in procmail. If this were to become a persistent problem, I'd probably lean towards working up some additional spamassassin rules. For one thing, that URL is hardly typical, and existing "random letter" detection rules could be adapted. The message did have the "bayes-beating text" as embedded HTML, with tiny, near-invisible text. I don't run spamassassin rules to detect those, but presumably they'd help. Domain length checks might help. The domain itself seems to have been randomly generated, so a fixed list of domains wouldn't be overly useful.
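
To make the "random letter" and domain length ideas concrete, here is a minimal Python sketch. It is not an existing spamassassin rule; the function name url_suspicion, the reason labels, and all three thresholds are assumptions picked for illustration, and they would need tuning against real mail before being trusted.

```python
from urllib.parse import urlparse


def url_suspicion(url):
    """Return a list of heuristic reasons why a URL's domain looks generated.

    All thresholds are illustrative guesses, not values from any real ruleset.
    """
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # Naive choice of the label just before the TLD; this ignores
    # multi-part suffixes such as co.uk.
    name = labels[-2] if len(labels) >= 2 else host

    reasons = []

    # Domain length check (assumed cutoff: 15 characters).
    if len(name) >= 15:
        reasons.append("long_domain_label")

    # Count places where a letter sits next to a digit.
    switches = sum(
        1
        for a, b in zip(name, name[1:])
        if a.isalnum() and b.isalnum() and a.isdigit() != b.isdigit()
    )
    if switches >= 4:
        reasons.append("digit_letter_mix")

    # Several hyphens in one label (assumed cutoff: 2).
    if name.count("-") >= 2:
        reasons.append("many_hyphens")

    return reasons
```

Run against the URL quoted earlier, all three checks fire, while an ordinary name such as www.example.com triggers none. Each check is weak by itself, so it would make more sense as a small contribution to a cumulative score than as a standalone block.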

So in short: I don't think bayes will be good at stopping these (or rather any "targeted spam" like this), but spamassassin cumulative scoring based on matching header AND body indicators would.
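
The arithmetic behind that can be sketched in a few lines of Python. The four observed scores are the ones from the spamassassin report above. The two extra rule names and their weights are hypothetical, and the threshold of 5.0 is spamassassin's stock default, which may not match the configuration used here.

```python
# Scores taken from the spamassassin report quoted in this message.
observed = {
    "HTML_MESSAGE": 0.1,
    "BIZ_TLD": 0.1,
    "RCVD_IN_BL_SPAMCOP_NET": 1.5,
    "DNS_FROM_RFCI_DSN": 0.3,
}

# Hypothetical body rules and weights, invented for illustration only.
extra = {
    "TINY_INVISIBLE_TEXT": 2.0,
    "RANDOM_LOOKING_DOMAIN": 1.5,
}

THRESHOLD = 5.0  # assumed: the stock default, not necessarily this setup's


def total(hits):
    """Sum rule scores, rounded to one decimal place like the report."""
    return round(sum(hits.values()), 1)


def is_spam(hits, threshold=THRESHOLD):
    """True when the cumulative score reaches the threshold."""
    return total(hits) >= threshold


combined = {**observed, **extra}
```

With only the observed hits the total is 2.0, well short of 5.0. Adding the two hypothetical body rules brings it to 5.5, which is the point of cumulative scoring: no single indicator has to be decisive.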

Now, if it would be useful, simply stripping the offending (mime-encoded) content would keep the offensive ads out, though it wouldn't stop the spam in any way. That might be one of the layers of defense applied to "unknowns".
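
A minimal sketch of that stripping step, using Python's standard email package. The function name strip_risky_parts and the placeholder text are assumptions, and so is the choice to treat every text/html and image/* part as "offending"; a real filter would likely be more selective and would only run on mail from unknown senders.

```python
from email import message_from_string

PLACEHOLDER = "[content removed by filter]\n"  # assumed wording


def strip_risky_parts(msg):
    """Replace HTML and image parts with a plain-text placeholder, in place.

    The message still gets delivered; only the risky payloads are dropped.
    """
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no payload of their own
        is_html = part.get_content_type() == "text/html"
        is_image = part.get_content_maintype() == "image"
        if is_html or is_image:
            # Drop the old encoding and disposition so the placeholder is
            # not treated as base64 data or offered as an attachment.
            del part["Content-Transfer-Encoding"]
            del part["Content-Disposition"]
            part.set_payload(PLACEHOLDER)
            part.set_type("text/plain")
    return msg


def strip_raw_message(raw):
    """Parse a raw message string, strip it, and return the new text."""
    return strip_risky_parts(message_from_string(raw)).as_string()
```

From procmail this kind of script could be run as a filter action on messages that fail a whitelist test, but that wiring is left out here since it depends on the local recipes.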


- Bob


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail