ietf-asrg

Re: [Asrg] 2a. Analysis - Spam filled with words

2003-09-11 22:17:29
From: "Terry Sullivan" <terry(_at_)pantos(_dot_)org>
I recently had occasion to try to do some (decidedly non-Bayesian) statistical characterization of ham/spam differences. I ended up with two interesting results:

1) There were four distinct "types" of spam. Variation within each spam-type was much smaller than the variation between spam-types.

2) Only one of the four spam-types was even remotely close to "ham."


This reminds me of something I heard a few years ago at a lecture on multidimensional math. One example use of extra dimensions was the classification of dinosaur vertebrae from various species. On each specimen, one simple measurement was taken as a baseline, and then N other measurements were taken and expressed relative to that baseline (to correct for the different ages of the animals). When plotted in N dimensions, vertebrae from different species of dinosaurs formed distinct clouds that could be distinguished easily.
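To make the baseline idea concrete, here is a rough sketch in Python. I am assuming "compared to the baseline" means a simple ratio. The function name and all the numbers are made up for illustration and are not from the lecture.

```python
def normalize(baseline, measurements):
    """Express each measurement as a ratio to the specimen's baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return [m / baseline for m in measurements]

# Hypothetical specimens: a juvenile and an adult with the same proportions.
juvenile = normalize(10.0, [5.0, 20.0, 7.5])
adult = normalize(40.0, [20.0, 80.0, 30.0])

# Both land on the same point in N-dimensional space despite the size difference.
print(juvenile, adult)
```

With ratios like these, overall size drops out and only the shape of the specimen decides where it sits in the plot.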

Perhaps a multidimensional Bayesian classifier could find these spam/ham groups on its own. Oddly enough, each method for bypassing filters might be easily discernible as a different cloud.
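As a rough sketch of what "finding the groups on its own" could look like, here is plain k-means clustering in Python. Note that k-means is not a Bayesian method; I am swapping it in only because it is the simplest way to show clouds being picked out without labels. The two features and every number below are invented for illustration.

```python
import math

def farthest_first(points, k):
    """Pick k starting centres that are spread far apart (deterministic)."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        )
    return [list(c) for c in centres]

def kmeans(points, k, iterations=20):
    """Plain k-means: group points into k clusters without any labels."""
    centres = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centre.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points
        ]
        # Move each centre to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

# Hypothetical per-message features, both scaled to 0..1:
# (fraction of filler words, fraction of text that is links).
messages = [
    (0.05, 0.10), (0.07, 0.05), (0.06, 0.12),  # one cloud
    (0.80, 0.10), (0.85, 0.15), (0.78, 0.05),  # a second cloud
    (0.10, 0.90), (0.12, 0.95), (0.08, 0.85),  # a third cloud
]

labels, centres = kmeans(messages, k=3)
print(labels)
```

The catch is that k has to be chosen up front, so something would still have to decide how many clouds to look for.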

John Fenley



_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg