[Asrg] solution space (was Re: Textual Analysis is not the solution)

2003-03-03 10:15:05
okay, basically I agree with your thesis.  I'd state it a different way:
spam is in the eye of the beholder.  the same message can be legitimate
to one recipient (especially if it was solicited - maybe the guy really
does think he can enlarge a part of his anatomy :) and illegitimate to
another recipient.

so trying to distinguish spam from non-spam by analysis of text, without
knowing the recipient's preferences, is inherently of limited
applicability.

(you might argue that there are some messages that can be determined to
be unwanted no matter who they are sent to - say viruses - but as far as
I can tell this is a very narrow class.  maybe I actually want to get
that .EXE file in email.  maybe I want to receive mis-labelled content
to see how my mail user agent will handle it.  etc.)

but I think it's too early to start arguing about *the* solution. 
for now we should try to understand the solution space - see
what general techniques we have at our disposal.

offhand I can think of several categories of tools:

- message analysis techniques
  (looking at characteristics of messages)

- recipient-specified preferences
  (recipient says "I don't want spam" or "I can't read Korean")

- originator identification/authentication/tracing
  (e.g. being able to reliably tell whether a message is coming from a
  known spammer, or merely someone of unknown reputation, or someone
  you know.  this happens at various levels of granularity -
  you might know the ISP that is originating the mail, or you
  might know what business has that IP address block, or you
  might know the specific MTA, or you might be able to identify
  the sender via some kind of authentication, either in the
  message or out-of-band (say SMTP authentication).)

- message filtering (e.g. removal of viruses, annoying HTML, whatever)

- differential message handling by relays
  (e.g. messages from known senders are allowed to be larger,
   contain a wider variety of content, and be accepted for delivery
   more quickly; messages from unknown parties might be delayed
   and/or restricted in size or content; messages from known
   spammers or that are known to have viruses are bounced or
   discarded)

- recipient feedback mechanisms (e.g. blacklists, whitelists)

these should not be considered mutually exclusive - clearly they
can be combined, and often are.
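
to make the "combined" point concrete, here is a rough sketch of how three
of the categories above (recipient-specified preferences, originator
reputation, and differential handling) might feed a single decision.  every
name in it (Message, RecipientPrefs, decide, the 100 kB limit) is made up
for illustration and is not taken from any existing MTA or filter; the
per-recipient whitelist/blacklist stands in for real originator
identification, which is a much harder problem.

```python
from dataclasses import dataclass, field
from enum import Enum


class Reputation(Enum):
    KNOWN_SPAMMER = "known_spammer"
    UNKNOWN = "unknown"
    KNOWN_SENDER = "known_sender"


class Action(Enum):
    DELIVER = "deliver"
    DELAY = "delay"
    REJECT = "reject"


@dataclass
class Message:
    sender: str
    size_bytes: int
    language: str
    has_executable: bool = False


@dataclass
class RecipientPrefs:
    # Recipient feedback: senders this recipient has marked good or bad.
    whitelist: set = field(default_factory=set)
    blacklist: set = field(default_factory=set)
    # Recipient-specified preferences.
    readable_languages: set = field(default_factory=lambda: {"en"})
    accepts_executables: bool = False


# Hypothetical size limit for senders of unknown reputation.
UNKNOWN_SENDER_MAX_BYTES = 100_000


def reputation_of(sender: str, prefs: RecipientPrefs) -> Reputation:
    """Stand-in for originator identification: consult the recipient's lists."""
    if sender in prefs.blacklist:
        return Reputation.KNOWN_SPAMMER
    if sender in prefs.whitelist:
        return Reputation.KNOWN_SENDER
    return Reputation.UNKNOWN


def decide(msg: Message, prefs: RecipientPrefs) -> Action:
    """Combine reputation and recipient preferences into one handling decision."""
    rep = reputation_of(msg.sender, prefs)
    if rep is Reputation.KNOWN_SPAMMER:
        return Action.REJECT
    # Preferences apply regardless of who the sender is.
    if msg.language not in prefs.readable_languages:
        return Action.REJECT
    if msg.has_executable and not prefs.accepts_executables:
        return Action.REJECT
    if rep is Reputation.KNOWN_SENDER:
        return Action.DELIVER
    # Unknown reputation: restrict size, and delay whatever is accepted.
    if msg.size_bytes > UNKNOWN_SENDER_MAX_BYTES:
        return Action.REJECT
    return Action.DELAY
```

note that the same message gets a different answer depending on whose
preferences are consulted: a stranger's .EXE attachment is rejected for a
recipient with the default preferences but merely delayed for one who has
set accepts_executables, which is the "eye of the beholder" point again.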

which ones am I missing?
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg


