Re: paralysis

On Sun, 7 Mar 2004, Michael Thomas wrote:

> Paul Hoffman / IMC writes:
>  > At 8:19 AM -0800 3/6/04, Michael Thomas wrote:
>  > >So... instead of pointing out the obvious that
>  > >there is no silver bullet, wouldn't it be a lot
>  > >more productive to frame this debate in terms of
>  > >what incremental steps could be taken to at least
>  > >try to change the overall climate?
>  > 
>  > Only if such framing includes the costs of the steps. To date, most 
>  > of the initial proposals we have seen on this (and many other) lists 
>  > have three attributes in common:
>  > 
>  > - They don't list the obvious problems
>  > 
>  > - They don't even guess at the costs of those problems
>  > 
>  > - They don't have an analysis of how hard or easy it will be for 
>  > spammers to adapt to the proposal
>
> Fine. Truth in advertising is wonderful. Then
> what?  From what I can tell, anything that falls
> short of perfection then gets summarily
> executed. What metrics do you suggest when the
> answer is less than perfect that doesn't result in
> paralysis? That seems to be the real breakdown
> here.


There is no real breakdown here, and perfection isn't the issue.  A
proposal doesn't have to be perfect; it has to be realistic and not
obviously flawed.

It seems fairly obvious that any serious proposal for anything, let
alone a complex problem such as spam abatement, should include a
feasibility and cost/benefit analysis.  This is SOP throughout business,
government, academe, engineering -- why should IETF proposals and
discussions be exempted from this?

Vernon is pointing out that most of the discussion on this topic on this
list in the recent past has omitted these components, and has proposed
solutions over and over again that have either been proposed in the past
but rejected as infeasible or expensive or that have been TRIED in the
past, are implemented now, and that are not provING (now, in real time)
to be tremendously effective in preventing spam.  In a previous reply his
remark about some of the proposals being "innumerate" was dead on the
money -- in most cases a very simple analysis of the actual numbers
demonstrates that a proposed measure, after being implemented at great
expense and inconvenience, will only affect a tiny fraction of the
problem (for example) or will not have any effect at all.

There are several things one should accept in any discussion of spam
abatement.  The first and foremost (one that might well go at the very
head of the "principles" statement we were discussing last week) is that
there MAY BE NOTHING THAT THE IETF CAN DO at the protocol level to
control spam, at least not directly.  If you prefer this phrased in a
prettier way, it may be that any measures that WOULD result in an
abatement of spam are all cures that are worse than the disease, either
because of astronomical costs or because they would necessitate removing
some desired/fundamental property from email (such as the ability to
receive mail from strangers without a complex dance that would be even
more annoying and stultifying to electronic communication than spam is).

My memory isn't what it used to be (and it was never very good) but here
is a short list of what I have heard proposed recently as ways of
abating spam (and in some cases, other forms of network abuse such as
viruses as well):

  a) Add a "cost" per message.  Bill Gates himself came out in public
favor of this in the newspaper over the weekend.  (A cynical public is
invited to wonder why.)

  Pros:  "Some people estimate" that a cost of as little as
$0.01/message would deter spammers.  [Who these people are and why their
guess is any better than mine remains unsaid.  I personally note that
costs of anywhere from a dime to a dollar plus the hassle of having to
physically handle paper, envelopes, postage do not seem to have the
slightest effect on the direct advertising fraction in my real mailbox
on a daily basis, with a persistent noise (advertising) to signal (all
other forms of communication combined) ratio that easily exceeds 2:1.]
"It is believed" (by these same people) that everyday users won't mind
paying the cost in time or money because they don't send much mail.

  Cons: I don't want to pay any cost per message.  I don't want to solve
a puzzle to send mail.  I don't want to have to solve a puzzle eight
thousand times to send mail via a list.  I don't want to have to manage
a cost-based apparatus.  The freedom of the Internet is far more
valuable to me than spam abatement and this is a cure worse than any
disease.  Note that I'm just giving MY response to this proposal.  I
send twenty or thirty pieces of mail a day and there are other cheaper
methods of controlling spam.  Finally, I strongly suspect that the
"people" who are estimating that cost will deter spammers are at least
in some cases people who stand to make money hand over fist charging it.

The fundamental premise here seems to be that we are more able and
willing to pay a higher cost for mail than spammers, in spite of the
fact that they actually MAKE money from email and I just use it for
communications with no obvious or direct monetary yield and have to pay
for it with real money, not a fraction of the income generated by the
activity itself. I can only conclude that this proposal is promoted by
individuals who don't actually have to try to get money (from a grant
agency, from a corporate budget, from their own pockets) in order to pay
for mail-related infrastructure and/or by individuals who hope to profit
by collecting some of those added costs.

  b) Require all mail to be electronically signed.  In some cases
encrypted as well, imagining encryption to be a "cost" that might deter
spammers.  Signature/encryption schema vary, tools required to enable
actual authentication of said signatures left vague.

  Pros:  A signature permits you to positively identify mail from
friends and people you know that are not spammers, IF you have an
authority or agent capable of managing the large and widely distributed
database of signature keys involved.  Encryption (as a separate issue)
prevents email from being read in transit, and requires a few
thousandths of a second of modern CPU per KB of payload on both ends.
It has similar key database/certification requirements.

  Cons:  I >>already<< can identify mail from friends and people I know
with 99.99% accuracy.  The "From" header from these individuals is
generally completely correct.  I cannot recall receiving a single piece
of spam, ever, with a header forged so that it appears to come from a
friend of mine.  I have received a small but nonzero number of viruses
that way (small because most of my friends use linux and hence are not
susceptible to most current header-forging viruses).  Encryption
obviously doesn't add a sufficient cost to deter spammers, and
elementary arithmetic indicates that it (or associated code-driven
delays) will NEVER add a cost that would deter spammers before it also
deterred all sorts of legitimate uses of mail and cost a fortune in
wasted resources at Internet scale.

Encrypted mail already is available.  People can already sign their mail
digitally if they wish (and many do) and manage keys as best they can.
Neither measure seems likely to impact spam, because spam is sent by
strangers, and strangers are perfectly capable of electronically signing
or encrypting their mail to me - I just won't recognize the signature
because they are strangers.  I'd expect this to have absolutely no
impact on spam at all besides making my internal whitelist whiter (a
difficult concept in a binary decision).

  c) Wait!  Spam comes from strangers, right?  So we'll require all mail
to come from people you know, or people you "consent" to receive mail
from!  And naturally, you won't consent to receive any spam!

  Pros:  Well, hard to argue with this one.  If I only consent to
receive mail from people I know, or mail from strangers that isn't spam,
won't that abate the spam problem?  Kind of a tautology, that...

  Cons:  Yes, and if only the Palestinians and Israelis would lay down
their arms, open all their borders, convert to Zen Buddhism and embrace
one another in a big love-fest, it would abate the problem they have
with killing each other too.  This is a prime example of a proposal
without any sort of internal reality check or CBA.

First off, I consider the ability to receive mail from strangers an
essential feature of email, not a bug, and will oppose (by ignoring in
any local implementation) any proposal that would remove that feature.
A cure clearly worse than the disease.  I assume all of you agree, after
all NONE of you know me -- I just up and signed up for the list (self
appointed, un-vouched for).  I recognize just one name among the posters
from other lists I'm on (Joel:-).  I could insert a line like "Visit my
website.  Read my books of poetry there.  Send me money." and the most
vigilant of you would have just received spam (or DID you just receive
spam, heh, heh, hard to tell...;-).

Second, any implementation that LEAVES IN this ability has an associated
problem in pure logic.  A stranger sends me mail.  Do I consent to
receive it?  If no, then I'm rejecting mail from strangers (cure worse
than disease).  If yes, then I can be spammed if the stranger happens to
be sending me spam (as my stealth spam above clearly proves).  End of
story.  Note that the fact that I don't "consent to receive spam" is
irrelevant until somebody invents a psychic 100% efficient spam filter,
and if we had one of those we wouldn't be having this discussion.

Note that one of many reasons this is a useless proposal is that email
identities are cheap, easily changed, readily available from multiple
sources, and in constant churn for legitimate reasons (e.g.  changing
ISPs, getting a new account, seeking anonymity).  If they weren't, we
wouldn't NEED the proposal as blacklisting individuals would suffice.
Blacklisting me as rgb(_at_)phy(_dot_)duke(_dot_)edu is easy because I'm NOT a
(real) spammer -- I've had the same email address for well over a decade now.
However, if I clicked over to yahoo...I could probably spam the list
once a day "forever" in spite of your best efforts to stop me.

  d) OK, so we can't control individuals as there are order of a billion
of them and millions of them change identities on a given day.  What
about controlling networks?  Only accept mail from "clean" networks.

  Pros:  "Consent" applied at the network level (via white and
blacklists) is already in fairly common use.  I personally believe that
tightening up the regulation of networks might well "help" abate the
spam nuisance.  Not a magic bullet, but improving AUPs and enforcement
of same and clearly requiring SPs to police their networks in specific
ways might, actually, help.  This is very much a matter for open debate,
however, as one has to show in any proposal how it will both alter and
improve what we already achieve here.

  Cons:  Consent at the network level IS in common use, and it may be
that we've already gotten all the benefit it can yield.  There is a time
lag problem here as well -- blacklists are often trying to "catch up"
with the rapidly changing spammer identities.  Network identities aren't
a lot more expensive or difficult to change than individual identities,
and some superlarge domains (e.g. yahoo, hotmail) are effectively
impossible to blacklist (much as I'm sure many of us have been tempted
to do so:-) because there are too many friendly strangers mixed in with
the evil spammers that abuse their services.

There are also complex legal issues to resolve.  AUPs tend to be actual
contracts and have to be dickered out by lawyers.  Enforcement is not
cheap, which is why many providers throw up their hands and refuse to
deal with the problem or blame somebody else.  Some SPs may have a
vested interest in NOT controlling the problem, as they profit
(indirectly) from spammers working through their domains.  Still, this
DOES seem to me at least to be a place where the IETF might make some
small contribution, perhaps by working out a clean partitioning of the
responsibility that everybody seems to want to avoid and getting it
written into future AUPs from the top down, possibly by integrating this
process with e).

  e) Ah, so the networks are avoiding the responsibility of dealing with
the spammers they advertently or inadvertently provide network access to.
How about if we write some laws and regulations REQUIRING them to deal
with the problem with fines and other penalties for noncompliance?
While we're at it, how about if we whack spammers upside the head with
all sorts of laws and penalties?  THERE'S a way of adding real costs to
those that profit from spam.

  Pros:  In my opinion, this is almost certain to prove the most
effective way to abate spam in the long run.  One proven model is the
national DNC list, which has been near-miraculous in its effectiveness
in abating phone spam (even MORE expensive, recall, than paper mail,
although the $1-2/message cost didn't deter phone spammers from sending
two to ten messages a day to my household, see a) above).  This adds a
very real "cost" to spamming -- fines and the risk of jail time -- and
forces spammers into the same operational zone as virus hackers.  It
permits the actual pursuit of spammers through networking barriers.  It
adds costs to SPs that "enable" spam (or fail to police it) as well.

  Cons:  Spam is as international as the Internet itself, making
enforcement much more difficult than it first appears.  Laws are great
but have to be enforced, which requires individuals to complain, police
to act, DAs to act, juries to act.  Action takes time and is a
significant cost: legal measures are slow and can be very costly.
Finally and perhaps most important, regulatory laws ALSO reduce the
freedom of the Internet itself and may well prove to be a cure worse
than the disease in the long run.  Nibble away on freedom of speech
here, and somebody seeking control over public discourse will work out a
way of taking a bite out there.

This approach seems to be gradually moving forward of its own accord,
driven by considerable public dissatisfaction with spam.  Prosecutions
are occurring.  Little attention has been paid as yet to SP
responsibilities and liability, but it may well be that a few successful
prosecutions of spammers and/or enabling service providers provides the
"spark" that brings about self-regulation by the rest of the SPs if only
to keep the nose of the very smelly government regulatory camel out of
their tents.  "Clean up your act or we'll clean it up for you" may prove
to be the deciding argument in this particular dialogue.

  f) Filters.

  Pros:  In near-universal use already, at least in some venues.  Can be
very, very effective in abating spam (easily 90% and up).  This is
literally as close as one can come to an automated implementation of a
real "consent" model (which requires examination of content one way or
another, regardless of whether or not a document is signed, encrypted,
certified).

  Cons:  It's a war.  There is an absolutely unavoidable (mathematically
grounded) conflict between effectiveness (percent of false negatives,
spam that gets through) and undesirable side effects (percent of false
positives, real mail that gets rejected as spam). Spammers try to fool
the filter and filter writers try to catch the spam.  Incorporating the
filter into the MTA to ameliorate the false positive problem has
advantages (the hope that the message sender learns that the message
didn't get through and can resend or send an out of band message to open
a hole for the filtered message) and disadvantages (the user can filter
some more in their MUA, the spammer can literally probe the MTA filter
looking for holes in the filter algorithm, additional load on mail
servers).  Filtering is all about cost-benefit at many levels.
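
A minimal sketch of that trade-off, assuming a filter that assigns each
message a score and rejects anything at or above a threshold.  The
scores below are invented for illustration (not measured data), and
error_rates is a hypothetical helper, not part of any real filter:

```python
# Hypothetical filter scores (0 = surely legitimate, 1 = surely spam).
# These numbers are made up for illustration; they are not measurements.
SPAM_SCORES = [0.95, 0.90, 0.85, 0.70, 0.55, 0.40]
HAM_SCORES = [0.05, 0.10, 0.20, 0.35, 0.50, 0.65]


def error_rates(threshold, spam_scores, ham_scores):
    """Return (false_negative_rate, false_positive_rate) at a threshold.

    A message is rejected as spam when its score >= threshold.
    False negative: spam that scores below the threshold and gets through.
    False positive: real mail that scores at or above it and is rejected.
    """
    false_neg = sum(s < threshold for s in spam_scores) / len(spam_scores)
    false_pos = sum(h >= threshold for h in ham_scores) / len(ham_scores)
    return false_neg, false_pos


for t in (0.3, 0.5, 0.7, 0.9):
    fn, fp = error_rates(t, SPAM_SCORES, HAM_SCORES)
    print(f"threshold {t:.1f}: {fn:.0%} of spam gets through, "
          f"{fp:.0%} of real mail rejected")
```

Because the two sets of scores overlap, no threshold drives both numbers
to zero; moving it only trades one kind of error for the other.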

This is a very robust and dynamic solution, and is unlikely to go away
unless/until things like legal measures and improved AUPs ameliorate the
problem (if they ever do).  It can be implemented by individuals at the
user level.  It can be implemented by sysadmins at the domain level.
Filters and other intelligent agents COULD be implemented by SPs at the
transport level to identify clients that are spamming, and if it ever
WERE implemented at this level and the SPs came down on AUP violators
like a ton of bricks with contractual monetary penalties, the spam
problem might really significantly abate.  I believe it was Vernon
(again) who pointed out that if tier 1 and 2 providers really wanted to
regulate the problem and were willing to "own" it they could do it now.




Most of the detailed solutions thus far proposed fit into one or more of
the categories above.  Some require universal registration of one sort
or another of individuals or additional registration components for
networks.  Some require certification authorities.  Some require
integrated costing agents or challenge/response systems to be inserted
into (I suppose) MTAs.  Little discussion of costs on the part of the
proposers, often wildly optimistic benefit claims.

In summary, in my opinion, for whatever it might be worth (quite
possibly nothing:-):

  a) A silly solution.  There isn't any reason to believe that adding
scaled costs to spammers will deter spam (cost doesn't deter advertising
anywhere else in our lives at costs up to $1/message, so why should it
deter spam at costs less than this).  Adding costs WILL deter us from
using mail at a far lower threshold than it deters spammers.  At
$0.10/message, I'd be spending $3-5/day on mail, depending on how lists
are billed.  If lists are billed per message, I'd be spending hundreds
of dollars a day at $0.01/message, as I'm on some pretty big lists.
Costs in time (puzzle solving) are just as bad and discriminate against
children and stupid people and handicapped people and list
administrators.
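
The arithmetic behind those figures can be sketched roughly as follows.
The 30 direct messages a day and the 8000-subscriber list come from the
numbers mentioned earlier in this note; the three list posts a day and
the daily_cost helper are assumptions made only for illustration:

```python
def daily_cost(direct_msgs, list_posts, list_size, rate_cents,
               bill_per_recipient):
    """Return a sender's daily cost in dollars under a per-message fee.

    If bill_per_recipient is True, each list post is charged once per
    subscriber; otherwise each list post is charged as a single message.
    """
    per_post = list_size if bill_per_recipient else 1
    billable = direct_msgs + list_posts * per_post
    return billable * rate_cents / 100


# Assumed workload: 30 direct messages and 3 list posts per day,
# with each list post going to a list of about 8000 subscribers.
for rate_cents in (1, 10):
    once = daily_cost(30, 3, 8000, rate_cents, bill_per_recipient=False)
    fanout = daily_cost(30, 3, 8000, rate_cents, bill_per_recipient=True)
    print(f"{rate_cents} cent(s)/message: ${once:.2f}/day billed per post, "
          f"${fanout:.2f}/day billed per recipient")
```

Under these assumptions the ten-cent rate with lists billed per post
lands in the few-dollars-a-day range, while the one-cent rate with lists
billed per recipient already runs to hundreds of dollars a day.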

  b) A silly solution to continue to promote AS A SOLUTION TO SPAM and
IN PROTOCOL as part of e.g. a new MTA.  That is, by all means sign
messages if you like.  Create new signature aware mail clients or graft
the capability in to old ones.  Use encrypted mail transport.  Create
distributed signature databases and tools. Just don't suggest that it
will have any impact on spam and don't "force" me to use them; make them
so attractive that I WANT to use them, entirely separate from the spam
issue.  After all, I can't see why this would abate spam in the
slightest on purely logical grounds.  I have no more of a way of
assessing mail from a stranger that signed it than I do of assessing
mail from a stranger that didn't sign it -- either way I have to open
the envelope and filter it and/or look at it.

  c) Don't make me laugh.  I don't consent to spam NOW, and insist on
being able to receive mail from real strangers without the slightest
hint of prior introduction or a priori common ground.  I just cannot
tell what is spam and what isn't without looking, and given order of a
billion (several hundred million and growing) strangers and corporate
entities with access to mail and given that I can set up a dozen
effectively "anonymous" mail accounts in the time it has taken me to
write this response I'll NEVER be able to give consent on a person by
person basis without opening the envelope.  If I have to open the
envelope, this adds NOTHING to the anti-spam filtering measures already
represented in f).

Note that a) through c) I think are so ill-conceived that I (at the
moment) cannot see how they could ever be made into proposals worth
taking seriously.  They all fail a cost-benefit analysis (high cost up
front, likely to have little or even no discernible effect on spam).  So
sure, keep bringing them up and I (and others) will keep pointing this
out.  This isn't "being negative".  It is "not wasting time and money on
a complex measure that, in the end, won't have any useful effect on
spam".

Remember, I think that there may not BE a solution to spam, in the sense
that people seem to be looking for.  There aren't any perpetual motion
machines either.  These two problems may well be linked (information
theory is a common foundation).  So to me, saying that a "solution" that
is obviously flawed is obviously flawed is a CONSTRUCTIVE thing.

On to the good news.

  d) seems worth pursuing.  At least in the sense that AUPs are one of
the effective impediments to spam NOW -- one of the things that largely
keeps it from originating within academic networks, for example -- and
thus there is a real possibility that a better schema for spam
regulation at the network level could result in a significant, enduring,
abatement of spam. Here I think that the IETF could be a very positive
force and provide real guidance (in the form of specifications for
software that might be used by SPs to detect patterns of abuse
originating within their boundaries, for example, and possibly with
legal work on contract templates that would permit them to impose
financial penalties).  Note also that "could" does not mean "will" --
this is a reasonable place to TRY to find a better solution, but one may
not exist or may be too costly or infeasible for other reasons.

  e) Time will tell, but again worth pursuing.  Largely outside of the
IETF's purview, though, except as a sort of "amicus curiae" and insofar
as it integrates with work done on d).

  f) The ONLY solution (aside from AUPs in widespread use against spam),
and in many cases a very effective solution.  It has the advantage of
being evolutionary (responsive to changes in the strategy of spammers
and a changing AUP and legal landscape) and of requiring no higher-level
consent or approval (from e.g. the IETF) to implement on any level from
the individual to the domain.  A variety of free filtering agents are
available as are a variety of commercial filtering agents, and there is
a healthy degree of competition and market choice.  I don't see the
preeminence of filtering as an anti-spam measure changing, and don't
REALLY see much of a role for the IETF in its continuing evolution.

Obviously this is bad news, possibly even unacceptably bad news, to
folks disenchanted with filtering as an anti-spam measure.  Bad news or
not, it is reality at the moment and quite possibly really is the best
we can do short of legislation or better enforcement at the AUP/SP
level.

Even things such as signatures, white/blacklisting of networks or
individuals, inclusion of various tokens (solutions to puzzles, e.g.)
are likely to become just another component, and not necessarily a very
powerful component, in the multidimensional decisioning process of such
a filter.  Once the mail has left the point of origination, ONLY a
filter (including the human brain used as a filter) can examine content,
and ONLY content determines whether a message, however else you might
flag it, authenticate it, sign it, white or black list it, vouch for it,
is or isn't spam.  AT the point of origination this is very nearly true
as well, but there the controlling agents are SPs or the government as
they are far away from you personally in space and time.

Perhaps statements "urging" the integration of suitable filters with the
MTA where this is possible are reasonable, I don't know -- arguments
presented here were moderately persuasive that while this wouldn't
affect their effectiveness against spam, the improved treatment of false
positives would permit a lowering of the threshold that identifies a
piece of mail as spam and favorably alter the false negative/false
positive numbers.  This could be so, although I'd worry a bit about
exposing the filter so directly to the spammers to be probed for
weaknesses unless this were accompanied by other measures that might
limit such probing (such as "immediate" exposure to detection followed
by rejection from the network and/or prosecution).

In conclusion, I have seen almost nothing in the entire spam abatement
discussion that can be taken seriously BECAUSE the proposals do not
include the steps described by Vernon.  It would be lovely if future
proposals did, in fact, list BOTH the pros (what one hopes to gain from
the proposal to abate spam) AND the cons (the "obvious problems").  It
would be lovely if they were "numerate" (actually used numbers, backed
up by observation, measurement, real data) where relevant.  It would be
smashing if they had a cost-benefit analysis where the benefits (real or
imagined) of the pros were contrasted with the costs of implementation
and of dealing with the cons.  It would be nice if the numerate part of
the cost-benefit analysis included the scaling issues -- for example,
the costs of imposing a user-level solution on a half-billion or so mail
users using a dazzling array of platforms and operating systems as
opposed to the costs YOU experience hacking something together for
yourself on top of an open source operating system where the solution
comes to you prebuilt for free.  To add a requirement to Vernon's list,
it might also be nice to see what in any proposal is actually new --
we've seen at least three or four notes proposing "solutions" that are
not only not magic bullets, they are already implemented to little or no
effect on spam.  Finally, it does seem like an analysis of whether or
not a proposed solution would stand the test of time is in order.

For example, I can go to my reject/spam folder du jour and -- perhaps --
find some key phrase that is present in all spam.  I can then put that
phrase into my filter definition and say "Voila!  I have solved the spam
problem!"  And for a day, that might even be true, and if I kept it a
dark secret it might last a month or more.  However, can you imagine me
proposing this phrase as THE solution for EVERYBODY on this list?  Just
publishing it here would ensure that it wouldn't last the day.  For a
solution to be realistic, one has to be able to at least argue
persuasively that the first hacker who comes along won't be able to
route around your "impervious" obstacle, that your obstacle will have a
real effect on a whole class of spam that cannot easily be overcome by
someone with access to the same data that you have.
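
A toy version of that key-phrase filter shows how little it takes to
route around it once the phrase is public.  Both the phrase and the
phrase_filter helper are hypothetical, chosen only to illustrate the
point; this is a sketch, not how any real filter works:

```python
def phrase_filter(message, phrase):
    """Flag a message as spam if it contains the key phrase.

    The match is a plain case-insensitive substring test.
    """
    return phrase.lower() in message.lower()


# Hypothetical "secret" phrase that happens to appear in today's spam.
KEY_PHRASE = "send me money"

original = "Visit my website.  Send me money."
# The same pitch after a spammer who knows the phrase swaps two characters.
evaded = "Visit my website.  S3nd me m0ney."

print(phrase_filter(original, KEY_PHRASE))  # True: caught
print(phrase_filter(evaded, KEY_PHRASE))    # False: slips through
```

A two-character change defeats the rule, which is why a proposal needs
an argument that it survives an adversary who can see the rule itself.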

   rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     
email:rgb(_at_)phy(_dot_)duke(_dot_)edu