From: "Terry Sullivan" <terry(_at_)pantos(_dot_)org>
I recently had occasion to try to do some (decidedly non-Bayesian)
statistical characterization of ham/spam differences. I ended up with two
interesting results:
1) There were four distinct "types" of spam. Variation within each
spam-type was much smaller than the variation between spam-types.
2) Only one of the four spam-types was even remotely close to "ham."
This reminds me of something I heard about a few years ago while attending a
lecture on multidimensional math. One example use of extra dimensions was
classification of dinosaur vertebrae from various species. On each specimen a
simple measurement was taken and used as a baseline, then N other
measurements were taken and compared to the baseline (to correct for
different ages). When plotted in N dimensions, vertebrae from different
species of dinosaurs formed distinct clouds that could be distinguished
easily.
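
A minimal sketch of that baseline correction, in Python. It assumes each
specimen is a list whose first entry is the baseline measurement; the function
name and the numbers are invented for illustration and do not come from the
lecture.

```python
def normalize_by_baseline(measurements):
    """Divide every other measurement by the baseline (first entry).

    The ratios do not depend on overall size, so a small (young) and a
    large (old) specimen with the same proportions land on the same
    point in N dimensions.
    """
    baseline, *others = measurements
    if baseline <= 0:
        raise ValueError("baseline measurement must be positive")
    return [m / baseline for m in others]


# Hypothetical specimens: same proportions, different overall size.
juvenile = [2.0, 1.0, 3.0, 0.5]
adult = [8.0, 4.0, 12.0, 2.0]

print(normalize_by_baseline(juvenile))  # [0.5, 1.5, 0.25]
print(normalize_by_baseline(adult))     # [0.5, 1.5, 0.25]
```

Using ratios is only one way to correct for size; it is chosen here because it
matches the "compared to the baseline" description.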
Perhaps a multidimensional Bayesian classifier could find these spam/ham
groups on its own. Each method for bypassing filters in a strange way might
be easily discernible as a different cloud.
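
To make the "distinct clouds" idea concrete, here is a rough sketch in Python.
Note that it swaps in plain k-means clustering, not a Bayesian classifier, just
to show groups being found without labels. The two features per message and
all the values are made up, and the number of clusters is assumed to be known
in advance.

```python
import math


def nearest(point, centers):
    """Index of the center closest to the point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point seeding."""
    # Seed with the first point, then keep adding the point that is
    # farthest from every center chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )

    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]
        # Move each center to the mean of the points assigned to it.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centers[i] = tuple(sum(xs) / len(members) for xs in zip(*members))

    # Final assignment against the final centers.
    return [nearest(p, centers) for p in points], centers


# Hypothetical 2-D feature vectors, three messages per "cloud".
points = [
    (0.10, 0.05), (0.12, 0.07), (0.09, 0.04),
    (0.80, 0.10), (0.82, 0.12), (0.79, 0.09),
    (0.15, 0.90), (0.13, 0.88), (0.16, 0.92),
]

labels, centers = kmeans(points, k=3)
print(labels)
```

With clouds this well separated, messages from the same cloud end up sharing a
label. Real message features would be noisier and higher-dimensional, and the
right number of clusters would have to be estimated rather than assumed.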
John Fenley
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg