[I'm responding with findings on-list per Marco's request]
I ran the sample message through a few checks. The message consists of a
lot of extraneous text embedded in HTML as near-invisible type (tiny,
light-on-white) relating to African issues. The only spammy features are
an embedded porn ad image and the href framing it. The spam had
apparently been carefully targeted at members of this list, so it
contained "good bayes words" in abundance.
Spamassassin, in my configuration, didn't hit much:
--- spamassassin scoring ---
0.1 HTML_MESSAGE BODY: HTML included in message
0.1 BIZ_TLD URI: Contains a URL in the BIZ top-level domain
1.5 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
[Blocked - see
<http://www.spamcop.net/bl.shtml?68.126.135.169>]
0.3 DNS_FROM_RFCI_DSN RBL: From: sender listed in
dsn.rfc-ignorant.org
Here's the URL:
<a
href="http://1q-2w3e4r-4r5t6jab1y.biz/qa12ws-3ed4rf/b2900-12tj-02sec2/index.htm?butyl">
--- spamassassin scoring ---
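For comparison against the threshold: spamassassin adds up the scores of
every rule that fires and calls the message spam once the total reaches
the required score, which is 5.0 in a stock install. My local threshold
may differ, so treat that number as an assumption. A quick tally of the
hits above:

```python
# Rule scores copied from the spamassassin report above.
hits = {
    "HTML_MESSAGE": 0.1,
    "BIZ_TLD": 0.1,
    "RCVD_IN_BL_SPAMCOP_NET": 1.5,
    "DNS_FROM_RFCI_DSN": 0.3,
}

# Assumption: the stock default threshold; a local setup may use another value.
THRESHOLD = 5.0

total = round(sum(hits.values()), 1)  # round away floating-point noise
print(total, total >= THRESHOLD)  # prints: 2.0 False
```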
bogofilter actually flagged it as spam, again based on my training.
Unfortunately, this was because of the similarity of the "good" text to
the so-called "Nigerian scams" that are so prevalent.
--- bogofilter scoring ---
Bogofilter Report:
X-Spam-Bogosity: Yes, tests=bogofilter, spamicity=0.504700, version=0.17.2
int cnt prob spamicity histogram
[...]
"bigger" 2 0.000000 0.007812 0.997090 +
"chief" 2 0.000000 0.007812 0.997090 +
"directors" 2 0.000000 0.007812 0.997090 +
"documentary" 2 0.000000 0.007812 0.997090 +
"film" 2 0.000000 0.007812 0.997090 +
"foster" 2 0.000000 0.007812 0.997090 +
"officials" 2 0.000000 0.007812 0.997090 +
"president" 2 0.000000 0.007812 0.997090 +
"Africa" 3 0.000000 0.011719 0.998056 +
"African" 3 0.000000 0.011719 0.998056 +
"moving" 3 0.000000 0.011719 0.998056 +
[...]
--- bogofilter scoring ---
Keep in mind, this is based on MY training of bayes. Yours presumably
would not score such words as spam. Ifile (also bayes) seems to have
tagged it too, but again, since the only spammy parts of the body are the
href and the image, that was presumably because the text resembles
content that indicates spam in my message store.
--- ifile scoring ---
ifile Report:
/tmp/spamreport-msg.vDCSWU spam
spam -5462.30123663
ham -5764.29786015
--- ifile scoring ---
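Reading the ifile numbers: assuming each figure is a per-folder
log-likelihood and ifile files the message under the folder with the
highest (least negative) score, which is what the output layout
suggests, the decision works out like this:

```python
# Per-folder scores copied from the ifile report above.
scores = {"spam": -5462.30123663, "ham": -5764.29786015}

# Assumption: higher (less negative) score = better match, as with
# naive Bayes log-likelihoods; the best-scoring folder wins.
best = max(scores, key=scores.get)
margin = scores[best] - min(scores.values())
print(best, round(margin, 2))  # prints: spam 302.0
```

If those really are log-probabilities, a gap of about 302 means the call
wasn't close.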
The same caveat about my training applies to spamprobe, which rated it GOOD:
--- spamprobe scoring ---
Spamprobe Report:
GOOD 0.0000000 2498472742d6ecd69aa1fe3518790d05
[...]
Spam Prob Count Good Spam Word
[...]
0.0000100 1 94 0 the session
0.0000112 4 28 0 the au
0.0000112 3 28 0 vice president
0.0000101 2 31 0 how they
0.0000121 1 26 0 space in
0.0000131 1 24 0 to host
0.0000149 1 21 0 society for
0.0000196 1 16 0 a location
[...]
0.9999492 1 0 5 csseditor
--- spamprobe scoring ---
So it successfully defeated bayes, or at least caused mine to score it
for the wrong reasons. And it didn't have a LOT of spammy characteristics.
As this URL was embedded in the body, based on what I've been told here,
those checks wouldn't gain much if re-implemented in procmail. If this
were to become a persistent problem, I'd probably lean towards working
up some additional spamassassin rules. For one thing, that URL is hardly
typical, and existing "random letter" detection rules could be adapted.
The message also carried the "bayes-beating text" as embedded HTML in
tiny, near-invisible type. I don't run spamassassin rules to detect
that, but presumably they'd help. Domain length checks might help too.
The domain itself seems to have been randomly generated, so a fixed list
of domains wouldn't be very useful.
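A rough sketch of the kind of domain check I mean, in Python rather than
as a spamassassin rule. The function name and the thresholds (labels
over 15 characters, more than two hyphens, four or more digit/letter
switches) are made up for illustration and would need tuning against
real mail:

```python
from urllib.parse import urlparse


def domain_looks_random(url: str,
                        max_label_len: int = 15,
                        max_hyphens: int = 2) -> list[str]:
    """Return reasons (possibly none) why the URL's host looks machine-made.

    Hypothetical heuristic; all thresholds are illustrative guesses.
    """
    host = urlparse(url).hostname or ""  # hostname is already lowercased
    labels = host.split(".")
    reasons = []
    # Skip the TLD; check the registered name and any subdomains.
    for label in labels[:-1]:
        if len(label) > max_label_len:
            reasons.append(f"long label {label!r} ({len(label)} chars)")
        if label.count("-") > max_hyphens:
            reasons.append(f"many hyphens in {label!r}")
        # Count switches between digits and letters, ignoring hyphens;
        # keyboard-walk names like "2w3e4r" switch on nearly every char.
        switches = sum(
            1 for a, b in zip(label, label[1:])
            if "-" not in (a, b) and a.isdigit() != b.isdigit()
        )
        if switches >= 4:
            reasons.append(f"digit/letter churn in {label!r}")
    return reasons
```

Against the URL above it reports both the long label and the
digit/letter churn, while ordinary hosts like www.spamcop.net or
MailMan.RWTH-Aachen.DE come back clean. Because it keys on the shape of
the name rather than on a list, it doesn't matter that the domain is
regenerated.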
So in short: I don't think bayes will be good at stopping these (or
rather any "targeted spam" like this), but spamassassin's cumulative
scoring, based on matching header AND body indicators, would.
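And a sketch of the cumulative idea: keep the 2.0 points the existing
rules already produced (mostly the RBL hit) and add body rules for the
tiny, near-white text. The rule names, regexes, and 1.5-point scores are
all hypothetical, not stock spamassassin rules, and the 5.0 threshold is
the stock default rather than my configuration:

```python
import re

# Points already scored in the report above: 0.1 + 0.1 + 1.5 + 0.3.
EXISTING_SCORE = 2.0
# Assumption: stock default threshold, not necessarily a local setting.
THRESHOLD = 5.0

# Hypothetical body rules; names and scores are illustrative only.
BODY_RULE_SCORES = {"TINY_FONT": 1.5, "NEAR_WHITE_TEXT": 1.5}

# <font size="1"> or CSS font sizes of 0-3px.
TINY_FONT_RE = re.compile(
    r'<font\b[^>]*\bsize\s*=\s*["\']?1\b|font-size\s*:\s*[0-3](?:\.\d+)?px',
    re.IGNORECASE,
)
# Foreground colors only: the lookbehind skips bgcolor= and background-color:.
# Six-digit hex only; #fff-style shorthand is not handled in this sketch.
FG_COLOR_RE = re.compile(
    r'(?<![\w-])color\s*[:=]\s*["\']?#([0-9a-f]{6})\b', re.IGNORECASE
)


def body_hits(html: str) -> dict:
    """Return the hypothetical body rules that fire on this HTML."""
    hits = {}
    if TINY_FONT_RE.search(html):
        hits["TINY_FONT"] = BODY_RULE_SCORES["TINY_FONT"]
    for hexcode in FG_COLOR_RE.findall(html):
        r, g, b = (int(hexcode[i:i + 2], 16) for i in (0, 2, 4))
        if min(r, g, b) >= 0xE0:  # every channel near 255: hard to see on white
            hits["NEAR_WHITE_TEXT"] = BODY_RULE_SCORES["NEAR_WHITE_TEXT"]
            break
    return hits


def total_score(html: str) -> float:
    """Existing points plus body points, rounded to one decimal place."""
    return round(EXISTING_SCORE + sum(body_hits(html).values()), 1)


sample = '<font size="1" color="#f0f0f0">documentary film about Africa</font>'
print(total_score(sample), total_score(sample) >= THRESHOLD)  # prints: 5.0 True
```

With these made-up scores neither side reaches 5.0 alone (2.0 from the
existing hits, 3.0 from the body rules); only the combined total does,
which is the point of cumulative scoring.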
Now, if it would be useful, simply stripping the offending (MIME-encoded)
content would keep the offensive ads out, though it wouldn't stop the
spam in any way. That might be one of the layers of defense applied to
"unknowns".
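For the stripping idea, a minimal sketch using Python's standard email
package, assuming the ad arrives as an image/* MIME part (a remote <img>
link would need HTML rewriting instead). It replaces each image part
with a short text/plain notice and leaves the rest of the message alone;
the function name and notice wording are my own placeholders:

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def strip_images(raw: bytes) -> bytes:
    """Replace every image/* part with a short text/plain placeholder."""
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.get_content_maintype() == "image":
            # set_content() drops the part's old Content-* headers and payload.
            part.set_content("[image removed by filter]\n")
    return msg.as_bytes()


# Usage: build a small HTML message with an attached GIF, then strip it.
demo = EmailMessage()
demo["Subject"] = "demo"
demo.set_content("<p>documentary film</p>", subtype="html")
demo.add_attachment(b"GIF89a", maintype="image", subtype="gif", filename="ad.gif")
cleaned = message_from_bytes(strip_images(demo.as_bytes()), policy=policy.default)
print([p.get_content_type() for p in cleaned.walk()])
```

In a procmail setup this would sit behind a filter recipe for "unknown"
senders; the MIME structure survives, so the text parts still reach the
reader.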
- Bob
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail