ietf-asrg

RE: [Asrg] Trust relationships etc.

2005-07-20 23:46:58
On Wed, 20 Jul 2005, "Brian Azzopardi" <briana(_at_)gfi(_dot_)com> wrote:

>> I simply believe it makes a LOT more sense to identify most spam by
>> observing its variance from accepted and agreed form. E-mail coming
>> from a given correspondent which DOES NOT LOOK LIKE the mail you
>> expect to get from that correspondent can, and probably should, be
>> quarantined or even t-canned until a different treatment is indicated.
>>
>> If I get a 170K-byte PIF file attachment from my dear old Aunt
>> Mildred, it's a pretty safe bet that it's a virus or worm... she would
>> simply never legitimately send me anything like that (nor, in fact,
>> would probably anybody else).

> You just described statistical filtering, a well-known variant being
> Bayesian filtering. Later-generation Bayesian filters can be very
> effective, and if Aunt Mildred gets zombied the filter will allow her
> email through while still keeping the spam sent from her machine out.

No, not really.

Those approaches basically look at the words used, etc., and that's not really 
what I am talking about (not for MY purposes, anyway).

My approach involves primarily things like the presence or absence of HTML
(and, more finely, what TYPES of HTML tags are present), the presence or
absence of attachments (and, more specifically, what TYPE of files are
attached), message size, and so forth. Those would be checked against a
fine-grained "permissions list" established by each recipient, based upon
who the stated sender of the message was. This could either be managed
directly, or indirectly via some sort of "allow this sender to send this
type of material in the future" dialog, which would open the restrictions
for that sender to allow the specific (perhaps even unidentified) features
which caused the mail to be questioned.
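A minimal Python sketch of a per-sender "permissions list" and the "allow this sender to send this type of material in the future" step. The record names (`Permissions`, `Message`), the use of file extensions as attachment "types", and the 50,000-byte default cap are illustrative assumptions, not part of the proposal; extracting HTML tags and attachments from real MIME messages is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Permissions:
    # Hypothetical per-sender permission record; field names are illustrative.
    allowed_html_tags: set[str] = field(default_factory=set)         # empty set = no HTML
    allowed_attachment_types: set[str] = field(default_factory=set)  # file extensions, e.g. {"jpg"}
    max_size: int = 50_000                                            # bytes (assumed cap)


@dataclass
class Message:
    # Features assumed to have been extracted from the message beforehand.
    sender: str
    size: int
    html_tags: set[str] = field(default_factory=set)
    attachment_types: set[str] = field(default_factory=set)


def violations(msg: Message, perms: Permissions) -> list[str]:
    """List every feature of msg that this sender is not permitted to send."""
    problems = []
    for tag in sorted(msg.html_tags - perms.allowed_html_tags):
        problems.append(f"html:{tag}")
    for kind in sorted(msg.attachment_types - perms.allowed_attachment_types):
        problems.append(f"attachment:{kind}")
    if msg.size > perms.max_size:
        problems.append(f"size:{msg.size}")
    return problems


def allow_in_future(perms: Permissions, msg: Message) -> None:
    """Widen the sender's permissions to cover exactly the features of msg."""
    perms.allowed_html_tags |= msg.html_tags
    perms.allowed_attachment_types |= msg.attachment_types
    perms.max_size = max(perms.max_size, msg.size)
```

For example, with `aunt = Permissions(allowed_html_tags={"b", "i"})`, a 170,000-byte message from Aunt Mildred carrying a `pif` attachment yields `["attachment:pif", "size:170000"]`; only if the recipient confirms it does `allow_in_future` open those specific restrictions.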

The DEFAULT (for unrecognized/previously unknown senders) would be NO HTML,
NO attachments of any kind, and limited message size (25K, 50K, 100K, or
whatever, but probably on that order). NOTE SPECIFICALLY that these defaults
would block essentially all worms and viruses and other e-mail-borne malware
exploits coming from unrecognized senders... and the narrow established
permissions would probably block most or all such stuff coming from
RECOGNIZED senders too.
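A minimal sketch of the default-deny policy for unrecognized senders, assuming a recipient-side table keyed by lower-cased sender address. `Policy`, `policy_for`, and `accept` are hypothetical names, and the 50K cap is just one of the example values above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Illustrative policy record; not a standard or existing API.
    allow_html: bool
    allowed_attachment_exts: frozenset[str]
    max_size: int  # bytes


# Default for unrecognized senders: no HTML, no attachments, ~50K size cap.
# The 50K figure is an assumed, tunable value.
DEFAULT_POLICY = Policy(allow_html=False, allowed_attachment_exts=frozenset(), max_size=50_000)


def policy_for(sender: str, known: dict[str, Policy]) -> Policy:
    """Return the recipient's policy for this sender, or the default if unknown.

    Keys in `known` are assumed to be lower-cased addresses.
    """
    return known.get(sender.lower(), DEFAULT_POLICY)


def accept(sender: str, has_html: bool, attachment_exts: list[str], size: int,
           known: dict[str, Policy]) -> bool:
    """True if the message fits the sender's policy; False means quarantine it."""
    policy = policy_for(sender, known)
    if has_html and not policy.allow_html:
        return False
    if not set(attachment_exts) <= policy.allowed_attachment_exts:
        return False
    return size <= policy.max_size
```

Because an unknown sender falls through to `DEFAULT_POLICY`, any attachment at all from a stranger is refused without inspecting its content.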

A nice extension to my approach would be to also record content expected to
be found in a message from the given sender. For example, a message body from
a familiar newsletter would be expected to contain the masthead or copyright
notice found in every legitimate copy of that newsletter. A message coming from
a particular correspondent might be expected to contain their characteristic
signature file. A message from a Yahoogroups mailing list might be expected to
contain a Yahoogroups-type ad, or perhaps the group name in square brackets as
part of the subject. One might even include the characteristic mailer software
(as identified in the message headers) that the sender is known to use. A
message claiming to be from that sender and NOT containing their characteristic
content and style would be immediately treated as suspect.
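A minimal sketch of checking a message against a sender's expected "content and style". `ExpectedStyle` and `matches_style` are hypothetical names; headers are assumed to be already decoded into a dict with canonical names, and the mailer is read from the `X-Mailer` or `User-Agent` header.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExpectedStyle:
    # Hypothetical per-sender profile of characteristic content.
    body_markers: list[str] = field(default_factory=list)  # masthead, copyright line, signature
    subject_pattern: Optional[str] = None  # regex, e.g. r"^\[mygroup\]" for a list tag
    mailer: Optional[str] = None           # substring expected in X-Mailer / User-Agent


def matches_style(headers: dict[str, str], body: str, style: ExpectedStyle) -> bool:
    """Return False (treat as suspect) if any expected trait is missing."""
    if any(marker not in body for marker in style.body_markers):
        return False
    if style.subject_pattern and not re.search(style.subject_pattern,
                                               headers.get("Subject", "")):
        return False
    if style.mailer:
        agent = headers.get("X-Mailer") or headers.get("User-Agent", "")
        if style.mailer not in agent:
            return False
    return True
```

A message that passes the permission checks but fails `matches_style` would be flagged for the recipient rather than silently delivered.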

You're right in that a suitable Bayesian filter MIGHT recognize the difference 
between Aunt Mildred's vocabulary and that of a spammer or other abuser, but 
spammers for the last year or more have been targeting Bayesian filters by 
adding large amounts of gobbledygook to their spams to confuse their signature 
vocabulary.
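To make the gobbledygook point concrete, here is a minimal sketch of the token-combining step used by Graham-style naive Bayes filters (the formulation popularized by Paul Graham's "A Plan for Spam", not something proposed in this thread). Per-token probabilities are assumed to come from an already-trained filter; the [0.01, 0.99] clamp and the 15 "most interesting" tokens follow Graham's description.

```python
from math import prod


def clamp(p: float) -> float:
    # Keep any single token from forcing a verdict of exactly 0 or 1.
    return min(0.99, max(0.01, p))


def combined_spam_probability(token_probs: list[float], n: int = 15) -> float:
    """Combine per-token P(spam | token) values naive-Bayes style.

    Only the n tokens furthest from the neutral 0.5 contribute.
    """
    probs = sorted((clamp(p) for p in token_probs),
                   key=lambda p: abs(p - 0.5), reverse=True)[:n]
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)
```

Two strongly spammy tokens (0.99 each) give a score of about 0.9999, and padding with neutral words near 0.5 changes nothing because they are never selected. Appending three words the filter has learned as strongly innocent (0.01 each) drops the score to about 0.01, which is the effect that padding with "hammy" text aims for.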

> SPF, reputation, et al. can't do that.

Of course.  And I consider it a (perhaps-)fatal and fundamental flaw in those 
approaches that they are based more on HOW the message was sent than WHAT was 
sent.

It's much harder for a spam or virus or worm to spoof (in a general and 
universal way) the CONTENT STYLE of the owner of the machine they've 
commandeered than it is to simply spoof a sender ID.  Even more, a typical 
recipient will likely have **NOBODY AT ALL** authorized to send them executable
content... which would essentially make it impossible to spoof ANYBODY and get
viruses, worms, or other zombie-spambot software into their machine via an
E-mail vector.
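A minimal sketch of the "nobody is authorized to send executables" rule. The extension list is illustrative and not exhaustive; checking for the `MZ` signature that begins DOS/Windows executables also catches executables renamed to innocent-looking names. `reject_for_executables` and its `authorized` parameter are hypothetical.

```python
from pathlib import PurePath

# Illustrative (not exhaustive) set of extensions Windows will run directly.
EXECUTABLE_EXTENSIONS = {".exe", ".com", ".pif", ".scr", ".bat", ".cmd", ".vbs", ".js"}


def is_executable(filename: str, data: bytes) -> bool:
    """Flag an attachment as executable by extension or by its leading bytes."""
    return PurePath(filename).suffix.lower() in EXECUTABLE_EXTENSIONS or data[:2] == b"MZ"


def reject_for_executables(sender: str, attachments: list[tuple[str, bytes]],
                           authorized: frozenset[str] = frozenset()) -> bool:
    """True if the message carries executable content this sender may not send.

    By default the authorized set is empty: nobody may send executables.
    """
    if sender.lower() in authorized:
        return False
    return any(is_executable(name, data) for name, data in attachments)
```

With the empty default, even a perfectly spoofed "Aunt Mildred" cannot deliver a `.pif` or a renamed executable.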

> The "problem" with Bayesian filtering and other content-checking methods
> is that you need to work on at least a non-trivial part, say 4K, of the
> message body. This is fine for most organisations and individuals, but
> may be too resource-intensive for ISPs.

I'm not proposing that this filtering be done by ISPs, at least not at the
final levels. There is simply (in aggregate) far more computing power
available (and more FREELY available) at the recipient machines than there
is anywhere else en route.

At least as a first cut, I think it's fine to do all the filtering at the
recipient level (especially if the downloading and filtering can be done in
a nonblocking way, hopefully mostly transparently, which is of particular
concern for dialup users).
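The "say 4K" cost bound in the quoted objection amounts to examining only a fixed prefix of each body. A minimal sketch, assuming a simple regex tokenizer and UTF-8 decoding (both illustrative choices, not taken from any particular filter):

```python
import re

SCAN_LIMIT = 4 * 1024  # bytes; the "say 4K" figure from the discussion

# Illustrative tokenizer: runs of 3+ letters, digits, '$', apostrophes, hyphens.
TOKEN_RE = re.compile(r"[a-z0-9$'-]{3,}")


def prefix_tokens(body: bytes, limit: int = SCAN_LIMIT) -> set[str]:
    """Tokenize only the first `limit` bytes of a message body.

    Work per message is bounded by `limit` regardless of message size.
    Invalid UTF-8 (including a character cut at the limit) is replaced.
    """
    text = body[:limit].decode("utf-8", errors="replace").lower()
    return set(TOKEN_RE.findall(text))
```

The bound keeps per-message cost fixed, at the price of never seeing anything placed after the first `limit` bytes.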

> Keeping the statistical data for each recipient is expensive. In
> organisations it might be possible to keep statistical data on a
> per-department basis, with a possible loss of accuracy, but for ISPs
> this can't be done.

Straw man. I don't think we need to limit our discussion to only those
techniques which are suitable for ISP-level implementation, especially if
that seems to eliminate consideration of the most promising and useful
approaches.

Gordon Peterson                  http://personal.terabites.com/
1977-2002  Twenty-fifth anniversary year of Local Area Networking!
Support free and fair US elections!  http://stickers.defend-democracy.org
12/19/98: Partisan Republicans scornfully ignore the voters they "represent".
12/09/00: the date the Republican Party took down democracy in America.



_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg