From: "Terry Sullivan" <terry(_at_)pantos(_dot_)org>
I recently had occasion to try to do some (decidedly non-Bayesian) 
statistical characterization of ham/spam differences.  I ended up with two 
interesting results:
1) There were four distinct "types" of spam. Variation within each 
spam-type was much smaller than the variation between spam-types.
2) Only one of the four spam-types was even remotely close to "ham."
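Result 1 compares spread within each spam-type against spread between spam-types. The sketch below shows one plain way to put numbers on that comparison. It assumes each message has already been reduced to a numeric feature vector and labelled with its type. The function names and the squared-Euclidean measure are illustrative choices. They are not the method used in the study quoted above.

```python
from statistics import mean

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*points)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def within_between(groups):
    """Return (mean within-group spread, mean between-centroid spread).

    groups: dict mapping a (hypothetical) type label to a list of feature
    vectors; needs at least two groups.
    Within: average squared distance of each point to its own group centroid.
    Between: average squared distance between each pair of group centroids.
    """
    cents = {label: centroid(pts) for label, pts in groups.items()}
    within = mean(sq_dist(p, cents[label])
                  for label, pts in groups.items() for p in pts)
    labels = list(cents)
    between = mean(sq_dist(cents[a], cents[b])
                   for i, a in enumerate(labels) for b in labels[i + 1:])
    return within, between

# Two tight, well-separated toy groups: small "within", large "between".
print(within_between({"A": [[0, 0], [0, 1]], "B": [[10, 10], [10, 11]]}))
```

A "within" figure that is much smaller than the "between" figure is what "distinct types" would look like numerically.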
This reminds me of something I heard about a few years ago while attending a 
lecture on multidimensional math. One example use of extra dimensions was 
classification of dinosaur vertebrae from various species. On each specimen a 
simple measurement was taken and used as a baseline, then N other 
measurements were taken and compared to the baseline (to correct for 
different ages). When plotted in N dimensions, vertebrae from different 
species of dinosaurs formed distinct clouds that could be distinguished 
easily.
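The baseline trick can be illustrated with a short sketch. It assumes that "compared to the baseline" means dividing each measurement by it. The lecture may have used a different comparison. Under that assumption, a young specimen and an adult with the same proportions land on the same point, so age stops separating them and species shape is what remains.

```python
def normalize(baseline, measurements):
    """Express each of the N measurements as a ratio of the baseline.

    Assumption: ratio normalization. This removes overall size (e.g. age)
    and keeps only proportions.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return [m / baseline for m in measurements]

# Hypothetical numbers: same proportions, different overall size.
young = normalize(2.0, [4.0, 1.0, 3.0])
adult = normalize(4.0, [8.0, 2.0, 6.0])
print(young, adult)  # both are the point [2.0, 0.5, 1.5] in 3 dimensions
```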
Perhaps a multidimensional Bayesian classifier could find these spam/ham 
groups on its own. Each unusual method of bypassing filters might then 
show up as its own distinct, easily discernible cloud.
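To make "finding the groups on its own" concrete, here is a sketch of plain k-means clustering over per-message feature vectors. This swaps in a simple non-Bayesian technique. The probabilistic analogue would be a Gaussian mixture model fitted with EM. The features, the choice of k, and the first-k-points seeding are all assumptions made for illustration.

```python
from math import dist
from statistics import mean

def kmeans(points, k, iterations=20):
    """Plain k-means: group feature vectors into k clusters ("clouds").

    Centroids are seeded with the first k points so results are
    deterministic; real use would want better seeding (e.g. k-means++)
    and several restarts.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels, centroids

# Hypothetical 2-feature vectors forming two clouds.
msgs = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
labels, cents = kmeans(msgs, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

k must be chosen up front here (e.g. four, to match the four spam-types above). Methods that estimate the number of clouds themselves would be closer to the idea of the classifier discovering the groups unaided.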
John Fenley
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg