Re: proposal for built-in spam burden & email privacy protection

This should be particularly interesting for John Klensin, as it is really
a follow-on to discussion last summer on the topic of information theory
and spam.

On Fri, 13 Feb 2004, Ed Gerck wrote:

Err, I think that allows you to correct _errors_ in transmission.


Shannon distinguished messages --or intended information-- from noise.  
The distinction between noise and information is that information is what
the sender wants to send or, alternatively, what the receiver wants to
receive. If the channel is a covert channel, it is fair to assume that
either the sender or the receiver (or both) do not intend it to exist
(otherwise, it would not be a covert channel). Thus, a covert channel
transmits information that can be considered for modeling purposes as a
source of noise. Since the 10th theorem applies to any source of noise,
it also applies here.


You are confusing a covert channel with noise. They aren't the same.  
Antispammers have commonly used an analogy that equates spam with noise
and anti-spam efforts with the search for a "noise filter".  The analogy
sounds good, but it is not accurate, which might suggest a reason why they
have failed to find a "filter".

Spam isn't unwanted until after the fact: You read it, and then you don't
like it.  Some people think AC/DC (or rock music in general) is "noise",
or commercials are "noise", but they aren't noise from the point of view
of information theory, and your real or theoretical radio won't be able to
automatically filter them out when the DJ plays them, though it will do a
good job of filtering out noise.

You have to hear/read/see it, decide you don't like it, and complain about
it.  But you can't build a stereo that won't play rock music or won't play
commercials.  Perhaps, like the Janet Jackson Breast Baring (JJBB), the
station operator will agree with your complaint, and a fuss will be made
that won't cost you anything, except taxes.  Spam is essentially the same
problem as preventing people from exposing themselves on TV. In a sense,
the problem is similar to NP-Complete problems because there is a whole
class of problems that are basically the same, and if you could solve one,
you could solve them all.

Getting back to information theory, Spam is information that the system
operator doesn't intend to be sent. It breaks the rules which say not to
send this kind of information. Spam (per the definition of a covert
channel) is simply a "communication not authorized by the security model".  
I should probably explain that spam is a multiplexed channel and identify
all the necessary parts relevant to information theory, but I think that
is fairly obvious, and unnecessary unless you dispute that spam is a
channel per the definitions of information theory. People have challenged
this, but infrequently. It seems you don't, so I'll skip it.

The definition of "covert channel" and the applicability of information
theory to spam was discussed on another (osf_alums) list last fall, and
some "very smart people" had some things to say about the definition of
covert channels in particular, which as I mentioned previously is a term
specifically used by OS scientists.  Some people thought it inappropriate to
use the term "covert channel" outside the context of operating system
analysis.  There are other terms used in other types of research, but they
all are rooted in information theory, and have the same very general
implications as a result:

On Fri, 8 Aug 2003, Ellis Cohen wrote:

My 1976 paper on Strong Dependency (abridged in SOSP 5 in 1977) was (as
far as I know) the first paper that applied classical information theory
to problems of computer security (see
http://citeseer.nj.nec.com/context/114449/0) and it contains the term
"covert information path" (but I can't find "covert channel" used).  I
might possibly even have been the first to use "covert" in the context
of computer security, but I don't think so.  It's even possible I heard
Lampson use it (or maybe it was the other way around ...)

Leo Rotenberg's excellent MIT thesis, "Making Computers Keep Secrets"
was published in Feb 1974.  He talks about "information leakage",
"sneaky signalling", "hidden data flows", and talks specifically about
how communication channels (in particular, accounting channels -- that
is, communication channels used for billing) can be used to spy on data,
but I can't find the word "covert" used anywhere.

So it's possible that Lampson coined "covert channel" in 1978, but I
suspect he might have used it earlier in his paper "A Note On the
Confinement Problem" published in the CACM in 1973.

Citations seem to indicate that Lampson's definition of a covert channel
was a channel "not intended for information transfer".  And I believe I
remember that covert channels were used at that time to cover cases in  
which there was no cooperation.  For example, suppose a spy can observe 
the CPU usage of all programs on a computer which contains a top secret 
program which is only run when an important secret event occurs.  Even  
though there is no cooperation (just sloppy security), a covert channel 
(access to the CPU usages) provides the information about when an
important secret event occurs.

  -- Ellis Cohen


On Fri, 8 Aug 2003, Stavros Macrakis wrote:

"Covert channel" has more than one definition.  In Lampson's original
article, it referred to channels not intended for information transfer
at all, and thus did NOT include information leaks via the billing
system or via steganography.  If I read the NCSC's definition (below)
correctly, it refers to ANY communication not authorized by the security
model (which seems overly broad to me, but...).

I suspect [Lampson] might have used it earlier in his paper
"A Note On the Confinement Problem" published in the CACM in 1973.


A Note on the Confinement Problem
Butler W. Lampson
Xerox Palo Alto Research Center
Comm. ACM 16, 10 (Oct. 1973)

http://citeseer.nj.nec.com/lampson73note.html

Even when all unauthorized access has been prevented,... [a service] may
leak, i.e. transmit to its owner the input data which the customer gives
it.
...
The channels [in his examples] fall into three categories:

* Storage of various kinds maintained by the supervisor which can be
written by the service and read by an unconfined program, either shortly
after it is written or at some later time.

* Legitimate channels used by the confined service, such as the bill.

* Covert channels, i.e. those not intended for information transfer at
all, such as the service program's effect on the system load.


NATIONAL COMPUTER SECURITY CENTER
A GUIDE TO UNDERSTANDING COVERT CHANNEL ANALYSIS OF TRUSTED SYSTEMS
November 1993
http://www.radium.ncsc.mil/tpep/library/rainbow/NCSC-TG-030.html


The only solution so far is "detect and correct", which, as noted, isn't a
very good solution, and we desire a solution that doesn't have this
characteristic.

Actually, I described a detection AND a correction mechanism. The 
correction mechanism uses a correction channel as given by the 10th
theorem. BTW, just sampling 1% of mail might be enough to prevent 
misuse to almost 100% confidence.


It is still a "whack-a-mole" or "detect and correct" problem. It is not
impossible for someone to conduct abuse.  The sampling level you need to
assure a certain degree of compliance depends greatly on the
deterrence of the correction method.
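
To put rough numbers on the sampling claim, here is a minimal sketch. It
assumes each message is sampled independently with the same probability,
which is my assumption and not something stated in the proposal; the
function names are made up for illustration. It only covers detection;
how much deterrence follows from detection is a separate question.

```python
import math


def detection_probability(sample_rate: float, messages_sent: int) -> float:
    """Chance that at least one of `messages_sent` abusive messages is sampled.

    Assumes independent sampling of each message at `sample_rate`.
    """
    return 1.0 - (1.0 - sample_rate) ** messages_sent


def messages_for_confidence(sample_rate: float, confidence: float) -> int:
    """Smallest number of abusive messages at which the chance of catching
    at least one of them reaches `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - sample_rate))


if __name__ == "__main__":
    for n in (1, 100, 1000):
        print(n, round(detection_probability(0.01, n), 4))
    print(messages_for_confidence(0.01, 0.99))
```

Under that assumption, a 1% sample almost certainly catches a bulk
sender, but a sender of a single message is caught only 1% of the time.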

Actually, genuine spam is not outlawed.


It depends how you define spam. Genuine "spammers" would quibble
with you calling them spammers. I'd call them email senders.

Only the spam sent by people who
are not genuine businesses is outlawed.


Not true. If a genuine business continues to send me messages
after I unsub, it is spam. The classification of spam is not
based on who sends the message.


This is a quibble, or you haven't read the CAN-SPAM act.

I expect that this abuse is sent
by a very small group of people.  Prosecuting this small group should be
relatively easy.


It has not been and it will only get worse.


Until last month, there were no tools to prosecute them, except for virus 
infection. And that isn't taken seriously unless there is fear that the 
virus caused a major power failure. But in that case, the virus sender is 
caught quickly and painlessly.

Also, users should not have to sue spammers, or have any other burden,
in order to protect the users' resources. Imagine if I had to
manage 300 lawsuits a day (the average spam rate that my system cannot
automatically detect as spam)?


This is an exaggeration. There aren't 300 unique spammers per internet
user per day.


Agreed  -- it's an understatement. I believe there are perhaps
1000x 300 unique spammers per day.


300,000 unique spammers each day, and only 6 billion people in the world.  
Gee, only 20,000 days until everyone on the planet is a spammer. Yet
everyone on the planet doesn't even have email, telephone, or even TV. I
wonder if there will be anyone without these things in 50 years.  I heard
not too long ago that there are _only_ around 100 million on the net.  
Maybe it's now up to 200 or 300 million.  So in 3 years, everyone that's on
the net will be a spammer.  Gee.  Maybe we better get moving on the IPv6.  
;-)
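
For anyone who wants to check the arithmetic, here is a small sketch. It
takes the "1000x 300" figure literally and assumes every day's spammers
are people who have never spammed before, which is the reading the joke
depends on. The population figures are the rough ones quoted above, and
the names are mine.

```python
NEW_SPAMMERS_PER_DAY = 1000 * 300  # the "1000x 300" figure, taken literally


def days_until_all_spammers(population: int,
                            per_day: int = NEW_SPAMMERS_PER_DAY) -> float:
    """Days until `population` is exhausted if every day's spammers are new."""
    return population / per_day


if __name__ == "__main__":
    populations = [
        ("world, 6 billion", 6_000_000_000),
        ("net, 100 million", 100_000_000),
        ("net, 300 million", 300_000_000),
    ]
    for label, people in populations:
        days = days_until_all_spammers(people)
        print(f"{label}: {days:.0f} days, about {days / 365:.1f} years")
```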