Re: Apology Re: Principles of Spam-abatement

2004-03-16 22:58:36
On Tue, 16 Mar 2004, Ed Gerck wrote:

Trust on the sender cannot be proven by the sender (self-assertions cannot 
induce trust -- e.g., "trust me" doesn't work), but must be calculated using 
sources independent of the sender. The sender may hint to a specific trust 
service used, and even provide it and its values, but we should be able to get
that information from the service directly and/or choose our own trust services
independently. In doing so, trust on the sender is what the receiver 
determines at a specific time based on a behavior model for the sender.
If the sender cooperates, the process can be faster and easier. But the
sender cannot determine the process.

The problem is, thus, not how do you determine trust, especially with all 
the different definitions of spam possible, but how do you want to do it.

I wrote one whole response earlier but deleted it (fortunately, as Dean
went through my points far more tersely than I was about to).  Here I
just can't stand it.

Ed, are you not paying attention?

It is fundamentally, intrinsically, eternally IMPOSSIBLE TO IDENTIFY
INDIVIDUAL HUMANS on the internet.  I can sit at my laptop and create a
hundred entirely real accounts with no humans behind them, with real
humans behind them, with me behind them, with alien invaders who will
eat your head behind them.  From the other side of my network connection
YOU CANNOT TELL which of these are real and which are fake.  You will
never be able to tell without violating so many of my civil liberties
that I (and everybody else on the planet) would be out in the streets
rioting to get them back.

Mail sent out by my perfectly functional MTA (any of them that I might
choose to install or one that I might custom-write to serve a particular
purpose) is for all practical trust-based purposes ANONYMOUS.  Mail has
always been designed to be anonymous (paper mail too).  There are
individually authenticated services and there are anonymous services,
and mail transport is an anonymous service because it crosses
authentication boundaries.

Mail (paper or otherwise) has an envelope, sure, but the only thing on
it that you can trust even a little bit is the set of postmarks it
develops along its route to your mailbox (and even here, you can really
only trust the LAST postmark in the chain, the one that is one hop upstream).
Your MTA cannot fill in the envelope.  That can only be done by my (the
sender's) MTA unless you've developed that psychic mail transport
mechanism.

This is no different from paper mail.  YOU have to fill in the address
information on a paper envelope.  You control the pen as surely as you
control your sending MTA -- every byte or stroke can be truth or lie.
You can lie about your return address.  You can fill the envelope with
ricin and anthrax or with money and praise (I'd prefer the latter,
naturally).  I cannot tell if the envelope tells the truth before
opening and reading the message.  I cannot even tell with CERTAINTY that
the envelope tells the truth AFTER opening it except by an out of band
communication with the sender.

If you want to argue that all mail has to be sent the electronic
equivalent of "certified mail" in the paper world, forget it and think
through the metaphor.  First of all nobody EVER sends certified mail in
the paper world except when money is on the line because a) it COSTS
money to have it certified; and b) it is a pain in the ass to have it
certified (it costs time).  Finally, even in the paper world, "certified
mail" generally means that you send it TO a positively identified
receiver with a guarantee that they will receive it.  You are generally
NOT required to show some sort of id proving that the return address is
valid and that you are the person corresponding to the return address
and indemnity information.  Maybe you are.  Maybe you aren't.  Maybe
you're just a messenger boy.  Maybe you're sending well-certified
anthrax and lie about everything on the return/sender forms you fill
out.  In any event, you likely own, literally, the certifying machine
(the sender).

Spam and paper mail abuse is not a problem that can be solved by
addressing trust of identities.  It is fundamentally a problem WITH real
identification.  In the HUMAN world, it is remarkably difficult, and
remarkably uncommon, to validate that a human is who they say they are;
most glib examples that have been cited to show that it can easily be
done show the opposite -- that it is NOT easy and it IS expensive and a
PITA.  My kids have to bring birth certificates and photo id's to
certain things (SAT tests, school registrations).  These
documents/tokens are not easy to file, to find, or to keep straight and
available and are easily lost or stolen.  

I have to show certain forms of legally certified id in order to
validate certain transactions, mostly involving money, and I have to
jealously guard them as they are easily lost or stolen.  Rituals
involving them (such as getting a loan or cashing a check) are time
consuming and inconvenient.

As a general rule we do NOT walk into a party and strike up a
conversation with somebody and say "Hi, I'm Rob, here's my driver's
license, a couple of credit cards, my Duke ID, and do you need to see my
birth certificate or passport or will you talk to me now?"

All of which can be and are forged, by the way...

To extend the analogy to the Internet, make the party a masked ball where
many of the guests are strangers.  I might wear a different costume every time
we meet and all you know of me is -- my card, presented at the door and
impeccably printed on the very best paper, that contains one valid piece
of information, the house from which I LAST came to arrive at the party.

In the human world, there is a degree of legal constraint that limits
forgery, and a degree of legal REstraint that protects privacy.

On the Internet, there are no laws preventing me from creating a new
"identity" for each and every transaction.  I could send one message to
this list as rgb, the next as brg, the next as hngryalien.  I could write
an agent that sends mail to a million people from a million different
(all transient) user ids that might or might not even correspond to real
accounts.  As long as the mail isn't in violation of a very few, mostly
very permissive, laws (no kiddie porn, no spam without an opt-out) it is
right as rain and totally legal.

You CANNOT trust assertions of personal identity on the Internet.  You
will NEVER be able to trust assertions of personal identity without
draconian laws (that I, the ACLU, and many others will oppose, as they
are a much greater threat to my personal liberty than mere spam) and a
certification mechanism that would be reminiscent of the Third Reich's
-- "identity cards" for all humans, carried at all times, required for
all transactions, with massive penalties for lying.  Otherwise, I'm
sending mail from MY laptop and I'll make new accounts up when I want to
and you can neither tell (you're on the wrong side of my security
boundary) nor stop me if you could tell.  Maybe they are for my kids.
Maybe they are for me, so I can use an alias email address when I'm
forced to give an email address to sign up for some service (one always
suspects that these addresses can be grazed or sold for spam mailing
lists).  Maybe the accounts I make are for fun and practice.  Maybe they
aren't even there, as I could be using telnet as an MTA and just making
them up as I go.  How ELSE can I arrange for my kids to get email from
santa(_at_)the(_dot_)north(_dot_)pole at suitable times of year?

Let's examine the LOGIC of a mail transaction from the receiver's point
of view, at least under current network protocols.  A system connects.
All you know about that system is its IP number.  You CANNOT know what
is behind the IP number.  You CANNOT know the route that the packet took
(if any) before it arrived at the IP number immediately upstream because
even if a protocol supposedly encodes it, the sender has complete
control over the assembly of packets it transmits and can lie through
its digital teeth if it suits it to do so.  All you get is the packet,
and the ONLY part of the packet you can trust is some of its NETWORK
(not mail) header.  As in maybe the ethernet (or whatever physical layer
you are using) header, maybe part of the IP header.

You are therefore forced to "trust" the CONTENT of the packet to tell
you the rest of what you need to know, or the transaction will not work.
If the packet is part of a mail transfer exchange, it will contain
information like the destination, the source, the return address, the
content.

YOUR MTA CANNOT VALIDATE ANY OF THIS OUTSIDE OF WHAT IS IN DNS or other
network registries. The destination might not exist.  The source might
not exist.  The return address might not exist.  The content might be
true, false, legal, or illegal as hell (but encrypted).  The most the
MTA program can do is look up an IP number and see if the
header/envelope data is valid and well-formed and consistent.  It CANNOT
validate whether the alleged sender EXISTS, let alone whether or not it
(not "he" or "she", as it may or may not be human) should be trusted.
As far as the MTA is concerned, the mail is being sent by an MTA, not a
human.  What the MTA sends is obviously out of the control of the
receiver, and barring a psychic receiver or universal identity papers
and the Gestapo, the MTA can lie.
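
To make the limit concrete, here is a minimal sketch of the kind of check a receiving MTA can actually perform on envelope data: it can test whether addresses are well-formed, nothing more. The function name `envelope_is_well_formed` and the loose pattern are assumptions made for illustration, not part of any real MTA, and the pattern is far simpler than the real address grammar.

```python
import re

# Deliberately loose pattern (an assumption for this sketch, much simpler
# than the real mail address grammar): local-part, "@", dotted domain.
ADDRESS = re.compile(r"^[^@\s]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")


def envelope_is_well_formed(mail_from: str, rcpt_to: str) -> bool:
    """Return True if both envelope addresses look syntactically valid.

    A True result says nothing about whether either mailbox exists,
    or whether any human stands behind it.
    """
    return bool(ADDRESS.match(mail_from)) and bool(ADDRESS.match(rcpt_to))


# A made-up sender passes the check just as easily as a real one.
print(envelope_is_well_formed("santa@the.north.pole", "kid@example.org"))
```

A well-formed envelope is all this establishes; the sender chose every byte of it.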

So please, please, please, stop trying to push a system that is supposed
to solve the spam problem based on "trust" of individual identity tokens
on the internet.  Individual identity tokens are intrinsically
impossible to arrange without literally completely restructuring human
society in very disagreeable ways; regulating spam by passing laws that
just take spammers out and shoot them would be easier, cheaper, and more
effective (and less likely to provoke a revolution).

If you want to focus energy on trust, worry about establishing trust of
an entity that is NOT trivial to alter and control without breaking the
network itself -- the network identity of the sending MACHINE(s).  I am
not able to know a damn thing about what is behind a sending MTA's IP
number, but I MUST know the sending MTA's IP number itself or we cannot
set up a TCP-based transaction.  The sender cannot (generally) forge
this because my return packets in the mandatory handshakes would not
reach them because they'd be ROUTED somewhere else by agents outside of
the sender's control. The hierarchical, routed, externally controlled
nature of network traffic and the fairly intelligently regulated
apportionment mechanisms for IP numbers make it relatively difficult,
and even somewhat expensive, to change IP numbers promiscuously, and the
network transaction just plain breaks if you stray from the protocols
and egregiously forge things.
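
The distinction between what the transport reveals and what the sender merely claims can be sketched with a loopback connection. This is an illustration only, not an SMTP implementation: the helper names `accept_one` and `demo` are hypothetical, and the "HELO" line is just a payload the client chooses to send.

```python
import socket
from typing import Tuple


def accept_one(server: socket.socket) -> Tuple[str, str]:
    """Accept one connection; return (peer IP from TCP, text claimed in payload)."""
    conn, (peer_ip, _peer_port) = server.accept()
    with conn:
        claimed = conn.recv(1024).decode("ascii", errors="replace").strip()
    return peer_ip, claimed


def demo() -> Tuple[str, str]:
    # Listen on an ephemeral loopback port so the sketch needs no real network.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    # The "sender" may claim any name it likes in the data it transmits...
    client = socket.create_connection(("127.0.0.1", port))
    client.sendall(b"HELO santa.north.pole\r\n")

    # ...but the peer address comes from the completed TCP handshake.
    peer_ip, claimed = accept_one(server)
    client.close()
    server.close()
    return peer_ip, claimed


if __name__ == "__main__":
    print(demo())
```

The claimed name is whatever the client typed; the peer address is the one the return packets were actually routed to.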

You (or rather, your MTA) can, therefore, decide whether or not you
"trust" the sending IP number.  You will generally only know for SURE
the IP number one step upstream on a message that might have taken
several hops before delivery, but here there is a very reasonable chance
of building a chain of trust back to the sending MACHINE (not human,
machine!).  The delivery information in a mail message's envelope/header
is likely one of the MOST reliable parts of that header, as there is at
least one part of it that cannot be forged by the sender.  I might
indeed make a thousand accounts to fool you as to who I am, none of
which you can IN PRINCIPLE validate, trust-wise (since they may not
exist and are known to an MTA by electronic bits that can be set by my
whim) but you can observe over time that whenever you get mail from my
MACHINE, it tends to be garbage.  (You cannot tell even THAT without
looking -- psychic processes not allowed -- so you'd have to build an ex
post facto discrimination process and accumulate statistics to be able
to identify my machine as a problem).
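
The after-the-fact bookkeeping described above might look something like the following sketch. The class name `HostReputation`, the thresholds, and the idea that some other process has already judged each message as spam or not are all assumptions for illustration; the addresses are from the documentation ranges.

```python
from collections import defaultdict


class HostReputation:
    """Accumulate per-IP statistics from messages already judged after delivery."""

    def __init__(self, min_messages: int = 20, max_spam_ratio: float = 0.5):
        # Both thresholds are arbitrary illustrative defaults.
        self.min_messages = min_messages
        self.max_spam_ratio = max_spam_ratio
        self._seen = defaultdict(int)
        self._spam = defaultdict(int)

    def record(self, ip: str, was_spam: bool) -> None:
        """Record one message from the sending IP and its ex post facto verdict."""
        self._seen[ip] += 1
        if was_spam:
            self._spam[ip] += 1

    def spam_ratio(self, ip: str) -> float:
        seen = self._seen.get(ip, 0)
        return self._spam.get(ip, 0) / seen if seen else 0.0

    def is_problem(self, ip: str) -> bool:
        """Flag a host only once enough mail has been seen to judge it."""
        return (self._seen.get(ip, 0) >= self.min_messages
                and self.spam_ratio(ip) > self.max_spam_ratio)


rep = HostReputation(min_messages=3)
for _ in range(3):
    rep.record("192.0.2.10", was_spam=True)
print(rep.is_problem("192.0.2.10"))
```

Note that the key is the machine's IP number, never the account name in the message, since the account name is under the sender's control.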

Eventually, on the basis of this data, you might fairly conclude that my
MACHINE has been taken over by bad fairies, or that its owner is a bad
fairy, and stop listening to it and accepting mail from it; you can
never "filter out" the good fairy messages from bad fairy messages from
this one machine as the machine can hijack every single element of the
"identity" of trusted good fairies. If my machine (really my machine's
IP number) "belongs" to an organization that you could in some sense
"trust" (or at least reliably contact, with properly maintained whois db
information and a postmaster address that is both correctly aligned with
MX records and that points to a human) you might be able to send a
message to postmaster that bad fairies are at work and that mail from
the machine will regretfully no longer be accepted until they are hit
with Black Flag and their little wee corpses are swept up and burnt.

This approach would actually work to abate spam.  This actually DOES
work, right now, as what is likely the secondary method of controlling
evil spew after (or as a feature of) MUA and MTA filters.  Blacklists
and whitelists are nothing but shared or local assessments of trust, but
trust applied to MACHINES (or networks) that are relatively stable
entities and that are "locateable" via the routing of packets, usually
to within a single LAN or domain if not to an actual system in an actual
room, by using required information that won't ALL be untrustworthy.
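
A local blacklist/whitelist keyed on hosts and networks can be sketched with Python's standard `ipaddress` module. The function name `classify`, the three verdict strings, and the choice to let the whitelist win over the blacklist are assumptions for illustration; the addresses are from the documentation ranges.

```python
import ipaddress
from typing import Iterable


def classify(peer_ip: str, whitelist: Iterable[str], blacklist: Iterable[str]) -> str:
    """Return "accept", "reject", or "unknown" for a connecting IP number.

    Entries may be single hosts ("192.0.2.7") or networks ("192.0.2.0/24").
    In this sketch the whitelist is consulted first (an assumed policy).
    """
    addr = ipaddress.ip_address(peer_ip)
    if any(addr in ipaddress.ip_network(net) for net in whitelist):
        return "accept"
    if any(addr in ipaddress.ip_network(net) for net in blacklist):
        return "reject"
    return "unknown"


WHITELIST = ["192.0.2.7"]
BLACKLIST = ["192.0.2.0/24", "198.51.100.0/24"]

print(classify("192.0.2.7", WHITELIST, BLACKLIST))
print(classify("192.0.2.8", WHITELIST, BLACKLIST))
print(classify("203.0.113.5", WHITELIST, BLACKLIST))
```

Nothing in the decision looks at the claimed sender; it depends only on the address the connection came from.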

This is why I like Jeff's suggestion of focussing energy on tightening
up this HOST or NETWORK based trust mechanism, and making the COMMON
PRACTICE of blacklisting abusive networks and hosts a bit LESS arbitrary
and possibly more effective.  I'm not entirely convinced that the IETF
can really help here -- as Dean suggests, it is quite possible that the
IETF can do NOTHING to abate the spam nuisance (which is not to say that
nothing is or will be done, just that the IETF will have no role in it).
Here, at least, it seems like there MIGHT be a role for it and that the
matter is worth pursuing.  Individual "trust" mechanisms, when we cannot
even establish an individual IDENTITY mechanism (because the IETF's
purview pretty much stops at the network boundary abstraction, the
network interface), are not worth pursuing.

   rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     
email:rgb(_at_)phy(_dot_)duke(_dot_)edu