On Thu, 11 Sep 2003 23:15:58 -0600, John Fenley wrote:
From: "Terry Sullivan" <terry(_at_)pantos(_dot_)org>
1) There were four distinct "types" of spam. Variation within each
spam-type was much smaller than the variation between spam-types.
2) Only one of the four spam-types was even remotely close to "ham."
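The finding that within-type variation was much smaller than between-type variation can be made concrete with the usual sum-of-squares decomposition. This is only an illustrative sketch. It assumes each message has already been turned into a numeric feature vector. The feature extraction, the toy data, and the helper names `centroid`, `sq_dist` and `within_between` are all hypothetical and do not come from Terry's analysis.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a list of (hypothetical) feature vectors."""
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dims))

def within_between(groups):
    """Split total scatter into within-group and between-group sums of squares."""
    all_points = [p for g in groups for p in g]
    grand = centroid(all_points)
    # Scatter of each message around the centroid of its own spam-type.
    within = sum(sq_dist(p, centroid(g)) for g in groups for p in g)
    # Scatter of the spam-type centroids around the grand centroid, weighted by size.
    between = sum(len(g) * sq_dist(centroid(g), grand) for g in groups)
    return within, between

# Toy data: two tight, well-separated "spam-types" in a made-up 2-D feature space.
type_a = [(0, 0), (0, 1)]
type_b = [(10, 0), (10, 1)]
print(within_between([type_a, type_b]))
```

For these toy groups the within-group sum is 1.0 and the between-group sum is 100.0. That large ratio is the pattern described in point 1.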
This reminds me of something I heard about a few years ago while
attending a lecture on multidimensional math.
[snip]
When plotted in N dimensions, vertebrae from different
species of dinosaurs formed distinct clouds that could be
easily distinguished.
Perhaps a multidimensional Bayesian classifier could find these
spam/ham groups on its own. Each unusual method of bypassing
filters might then be easily discernible as a separate cloud.
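Letting a classifier "find the groups on its own" is what is now usually called unsupervised clustering. The sketch below swaps in plain k-means, which is *not* a Bayesian method, purely to keep the illustration short. A probabilistic alternative would fit a mixture model, for example a Gaussian mixture via EM. The sketch assumes messages are already numeric feature vectors and that the caller supplies the number of groups and their starting centers. All names and data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dims))

def kmeans(points, centers, max_iter=100):
    """Plain k-means: assign each point to its nearest center, then move the centers.

    Returns (centers, labels). The labels come from the last assignment step.
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest current center for each point.
        labels = [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
                  for p in points]
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Keep an empty cluster's center where it was.
            new_centers.append(centroid(members) if members else old)
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, labels

# Toy "messages" in a made-up 2-D feature space, seeded with one center per group.
messages = [(0, 0), (0, 1), (10, 0), (10, 1)]
centers, labels = kmeans(messages, [messages[0], messages[2]])
print(labels)
```

Each recovered cluster would correspond to one of the "clouds." Real message features would of course be far noisier than this toy data.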
You read my mind, John. The "distinct clouds" effect was more or
less exactly what I was hoping to find. (Unfortunately, the
distinctions I found were not even remotely "cloud-like"--each one
was more like "amorphous blob, well-anchored by a couple of
outrageous outliers." Which was still an interesting analytical
result, but not at all what I had hoped to find.)
The bad news: while "free" multidimensional methods are great for
analysis, they are utterly impractical for classification, because
they do not scale well at all. (CPU requirements for these methods
grow *at least* quadratically with the number of items being
analyzed, and higher-dimensional solutions are even more demanding.)
Still, your *core* point--that multidimensional characterization
methods ought to work better than unidimensional methods--is exactly
right. (Exactly the opposite is true when one is trying to make
fine-grained distinctions among already highly similar documents. In
that case, unidimensional methods are generally superior.)
Ultimately, multidimensional characterization "works better" because,
as Andrew recently reminded us, spam differs from regular email in
lots of different ways (i.e., across multiple dimensions), while
regular email is much more homogeneous.
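The closing point can be shown with a toy example. It rests on loudly simplifying assumptions. Ham forms one tight cluster, and each spam type departs from it along a different hypothetical feature. "Unidimensional" is caricatured here as a check on a single feature. The data and function names are made up and do not come from Terry's analysis.

```python
import math

def centroid(points):
    """Component-wise mean of a list of feature vectors."""
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dims))

def flag_one_feature(msg, ham_center, feature, threshold):
    """Unidimensional caricature: deviation from typical ham on one feature only."""
    return abs(msg[feature] - ham_center[feature]) > threshold

def flag_all_features(msg, ham_center, threshold):
    """Multidimensional check: Euclidean distance from typical ham."""
    return math.dist(msg, ham_center) > threshold

# Toy data: homogeneous ham, plus two spam types that are unusual in different ways.
ham = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
spam_a = (5, 0)  # unusual on feature 0 only
spam_b = (0, 5)  # unusual on feature 1 only
center = centroid(ham)

print(flag_one_feature(spam_a, center, 0, 3), flag_one_feature(spam_b, center, 0, 3))
print(flag_all_features(spam_a, center, 3), flag_all_features(spam_b, center, 3))
```

In this example the single-feature check flags `spam_a` but misses `spam_b`. Distance from the ham centroid flags both, and it leaves ordinary ham such as `(1, 0)` unflagged.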
- Terry
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg