ietf-mxcomp

Re: Multiple classes of mail

2004-04-08 17:59:07

On Wed, Apr 07, 2004 at 04:43:53PM -0700, Edwin Aoki wrote:
| 
| I think that the notion that there are different levels of confidence 
| for mail is a correct one and one which end-users typically make today.  

Yes, confidence levels are at the core of what I was trying to say.  The
concept of "first" vs "second" class is just vocabulary --- that can
change.

| But the analogy to "first-class," "second-class," etc. is an imperfect 
| one, since in the traditional paper mail system, class is generally 
| specified by the sender based on his or her determination of importance, 
| cost, etc.  It's not generally made by the recipient (or by the post 
| office).  I believe this distinction is a significant one, especially if 
| we intend it as a simplifying assumption for end-users.

The important thing for us, I think, is to unambiguously specify a
mapping between message attributes and confidence level.

The attributes can come from RFC2821 (the SMTP envelope), from RFC2822
(the message headers), or from a mix of both.

The range of confidence levels contains at least one major divide: I
believe it is important to draw a line between "SMTP acceptable mail"
and "SMTP rejectable mail".  Other schools of thought may prefer to draw
the line between "display to user in inbox" and "file to spambox".  If
your focus is on the user experience, the latter line is a primary
concern.  If your focus is on MTA configuration, the former is the
primary concern.  I believe there is room to address both concerns, and
to draw various sets of lines without stepping on toes in the process.

Perhaps we can satisfy more people if we aim to:

1) draw a line between SMTP acceptable and rejectable mail on the basis
   of RFC2821 information only, before DATA.

2) draw a line between SMTP acceptable and rejectable mail on the basis
   of RFC2822 information as well, after DATA.

3) draw several more lines denoting different levels of confidence
   that a message is not forged, based on both RFC2821 and RFC2822
   information.

4) One line would separate known good user-to-user mail from mailing
   list mail.

5) Another line would further subdivide mailing list mail into
   - user-acknowledged subscription
   - user does not acknowledge subscription

   To make this possible, end-users might need to tell the mail system
   two things:
      - what mailing lists they're subscribed to
      - what forwarding addresses send mail to them

   Some end-users are already used to telling the mail system what
   sender addresses they want to whitelist; adding information about
   mailing lists and forwarders shouldn't be a major problem.
   Acknowledgement of mailing lists also helps the lazy unsubscribe
   scenario.

6) One important line would classify messages into "Box Z" --- messages
   which did not come from an acknowledged forwarder or mailing list,
   and which explicitly failed SPF/Caller-ID authentication tests, and
   therefore look like forgeries.
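
As a rough illustration of lines 1 and 2 in the list above, here is a
minimal Python sketch of the two SMTP-time decision points. The names
(Auth, smtp_verdict) are hypothetical, and rejecting on an explicit
failure is only an assumed policy; a site could equally accept the
message and file it to another box.

```python
from enum import Enum
from typing import Optional


class Auth(Enum):
    """Outcome of an authentication test (hypothetical three-way result)."""
    PASS = "pass"
    FAIL = "fail"   # explicit failure, not merely "no information"
    NONE = "none"   # sender domain publishes nothing to check against


def smtp_verdict(envelope_auth: Auth, header_auth: Optional[Auth] = None) -> str:
    """Return the SMTP-time decision for one message.

    envelope_auth: result of a check on RFC2821 information (line 1).
    header_auth:   result of a check on RFC2822 information (line 2),
                   or None while the server is still before DATA.
    """
    if envelope_auth is Auth.FAIL:
        return "reject before DATA"
    if header_auth is None:
        return "continue to DATA"
    if header_auth is Auth.FAIL:
        return "reject after DATA"
    return "accept"
```

Only an explicit FAIL crosses either line; a NONE result is accepted
and left for the later confidence levels to sort out.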

I use the term "Box Z" and not "spambox" on purpose.  I don't want to
make the value judgement about spam vs ham; making that judgement opens
the door to the kind of inaccuracy that you see with content filtering
today.

I want to move, instead, toward an email classification system which is
based on unambiguous attributes.  If a user asks me "why did mail from
my mother end up in Box Z?" I want to be able to answer: "because it
failed to meet the requirements of RFC3823 or whatever" --- I do *not*
want to say "because the system decided it looked like spam".
Anyone who's in the business of deciding that something is spam is
fallible.
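
To make the "why did it end up in Box Z?" answer concrete, here is a
small Python sketch of a rule table in which every verdict carries the
identifier of the rule that produced it. The rule ids, the message keys
and the ordering of the rules are all assumptions made for
illustration; nothing here is taken from an existing specification.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]  # hypothetical bag of unambiguous attributes

# (rule id, box, predicate); the first matching rule wins.
RULES: List[Tuple[str, str, Callable[[Message], bool]]] = [
    ("acknowledged-forwarder", "forwarded",
     lambda m: m.get("via_acknowledged_forwarder") is True),
    ("acknowledged-list", "list (acknowledged)",
     lambda m: m.get("is_list") is True and m.get("list_acknowledged") is True),
    ("auth-explicit-fail", "Box Z",
     lambda m: m.get("auth") == "fail"),
    ("unacknowledged-list", "list (unacknowledged)",
     lambda m: m.get("is_list") is True),
    ("auth-pass", "user-to-user",
     lambda m: m.get("auth") == "pass"),
]


def classify(msg: Message) -> Tuple[str, str]:
    """Return (box, rule id) so the verdict can always be explained."""
    for rule_id, box, predicate in RULES:
        if predicate(msg):
            return box, rule_id
    return "unclassified", "no-rule-matched"
```

With this shape, classify({"auth": "fail"}) yields the pair
("Box Z", "auth-explicit-fail"), so the answer to the user is the name
of a rule rather than a guess about content.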

If we simply call it Box Z, value-neutrally, we can say "well, if your
mother doesn't want to keep getting her mail filed to Box Z, her ISP can
easily fix that by following the prescription in RFC3823."  The ISP
doesn't say that Box Z is spam; the user makes that association.
Similarly, "if the spammer doesn't want their mail filed to Box Z, they
can follow the same procedure."  It's up to the procedure to make things
hard for the spammer and easy for the legit folk.

SPF has drawn line #1.  Caller-ID and DomainKeys each draw line #2 in
different, compatible ways.  PGP and S/MIME also draw a #2 line.  If we
could agree on the rest of those lines, I think everything else would
follow very naturally.

