After this issue, I will probably move the thread to the IRTF (as suggested) if
possible (though probably after taking a break to do some other work).
Information theory says that such things are impossible. One cannot
construct a spam-free protocol because this is the same problem as
constructing a system free of covert channels, which information theory
says is impossible.
But information theory also says you can optimize signal-to-noise ratio,
but only if you know what the characteristics of your signal are.
It actually doesn't say that precisely. It says that you can transmit a
signal with an arbitrarily low error rate at a speed below the channel
capacity.
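
As a rough illustration of that statement (not something posted in this
thread): for the special case of a band-limited channel with additive white
Gaussian noise, the Shannon-Hartley theorem gives the capacity as
C = B * log2(1 + S/N). The function names and the sample numbers below are
assumptions chosen only for illustration.

```python
import math


def db_to_linear(snr_db: float) -> float:
    """Convert a signal-to-noise ratio in dB to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def channel_capacity_bps(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon-Hartley capacity (bits/s) of an AWGN channel.

    bandwidth_hz: channel bandwidth in Hz.
    snr_linear:   signal-to-noise power ratio (linear, not dB).
    """
    if bandwidth_hz < 0 or snr_linear < 0:
        raise ValueError("bandwidth and S/N must be non-negative")
    return bandwidth_hz * math.log2(1.0 + snr_linear)


if __name__ == "__main__":
    # Hypothetical numbers: a 3 kHz channel at 30 dB S/N.
    capacity = channel_capacity_bps(3000.0, db_to_linear(30.0))
    print(f"capacity is about {capacity:.0f} bits/s")
```

Any rate below the value returned can in principle be achieved with an
arbitrarily low error rate; the formula says nothing about how to build the
code that achieves it.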
The concrete task of altering the signal to noise ratio is accomplished by
enhancing the signal with a harmonic oscillator, so that it is stronger
than the noise.
Agreed.
And thus on a "conceptual level", you have to have some idea about the signal
characteristics in order to enhance it.
Actually if I remember correctly, your example is how it applies to periodic
signals. The general case is more abstract.
This is then described as a set of differential equations
that can be optimized with variational methods. The limits of this
process are indicated by information theory, the Nyquist theorem, etc.
Add Shannon entropy, chaos, etc...
If the channel isn't described by a Fourier series, then the differential
equations may not be solvable, and it may be impossible to optimize its
signal to noise ratio. (Well, there are other mathematical methods, but
you get the point.)
Yes, that is what I meant when I said the general case is more abstract, so I
was talking on a "conceptual" or abstract level.
You are borrowing the concepts by metaphor, but the
concrete methods don't transfer well.
I was only using it to say we must define the signal as it appears in the
channel before we can do any research on it in the channel.
Because spam is currently defined as UBE (instead of my proposed *BE), you can
only model the signal at the end point. Given that this means modeling the
receiver's subjective mind, it is not all that useful for research, unless you
want to get into very fuzzy science such as psychology. If you want to make the
point about practicality, then that is a very strong one!
My point is not to discourage you from trying to stop spam,
You are one of only 3 people so far at IETF who have said that to me. The rest
who have commented have tried to discourage me. So thank you.
but to focus
your attention on detection, rather than protocol alteration. It is
impossible to alter the protocol in any way that will force the spammer to
identify themselves a-priori as a spammer.
Disagree strongly. The first benefit is that once you define spam == *BE
(instead of UBE), it is easier to model spam and do research on it, because you
can model it at any node in the channel, not only at the receiver end point.
That was my whole point about "enforcers".
However, there is a problem. Some *BE is solicited. Which is why I proposed
moving the solicited *BE to another channel ("pull").
Your point is that it is futile to define a protocol that will separate the
solicited from the unsolicited, because spammers will always be able to subvert
the protocol. And you go on to say that there are thus no benefits to
detection. I strongly disagree. There are two aspects to my response:
1. Spam coming thru the alternate "pull" channel can be modeled differently
than spam defined as *BE. This separation of models provides benefits over
trying to model spam as UBE in the receiver's mind (end point). Another person
in this thread has provided one specific example, which is that the "pull"
delay gives a whole new dynamic to detection. I have also pointed out that the
membership quality of the solicited channel gives it unique modeling
advantages.
2. Spam coming thru the existing channel can then be modeled as *BE at any node
of the channel, instead of as UBE. Some nodes have a much better model of spam
under this definition than the one at the end point. For example, ISPs can see
a lot more abuse data in real time than a single receiver can, or than the
current, inherently clumsier attempts to group or poll receivers can.
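
To make point 2 concrete, here is a minimal sketch of what "modeling *BE at a
node" could look like: a relay counts how many distinct recipients receive the
same message body and flags it as bulk above a threshold. Everything here is
hypothetical (the function names, the threshold, the exact-match fingerprint);
a real node would need a fuzzier match, since bulk senders vary their text.

```python
import hashlib
from collections import defaultdict

# Hypothetical cutoff: a body seen by this many distinct recipients is "bulk".
BULK_THRESHOLD = 3


def body_fingerprint(body: str) -> str:
    """Hash the body after lowercasing and collapsing whitespace."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_bulk(messages, threshold: int = BULK_THRESHOLD):
    """Return {fingerprint: recipients} for bodies sent to many recipients.

    messages: iterable of (recipient, body) pairs observed at one node.
    """
    recipients_by_body = defaultdict(set)
    for recipient, body in messages:
        recipients_by_body[body_fingerprint(body)].add(recipient)
    return {
        fingerprint: recipients
        for fingerprint, recipients in recipients_by_body.items()
        if len(recipients) >= threshold
    }


if __name__ == "__main__":
    observed = [
        ("a@example.com", "Buy now"),
        ("b@example.com", "buy   NOW"),
        ("c@example.com", "Buy now"),
        ("d@example.com", "Lunch on Friday?"),
    ]
    for fingerprint, recipients in find_bulk(observed).items():
        print(fingerprint[:12], sorted(recipients))
```

Note that this needs no judgment about whether the mail was solicited; it only
uses what the node can observe, which is the point of defining spam as *BE
rather than UBE.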
Hopefully that will set the record straight that I am thinking about spam in
new conceptual ways...and not rehashing as others have claimed...
You could ask for spammers to cooperatively self-mark their messages.
But this hasn't been terribly productive.
Obviously I am not asking for that or anything like that. See above.
It is also pointless to ask for
cooperative identification of non-spammers and identify spammers as those
not in the set of non-spammers.
I am also not asking for this, and it is instructive to understand how I am not.
I am only making a definition, so that one can model under the benefits of that
definition. What people actually do is a different matter, but as I pointed
out previously in this thread, once you model spam the way I have proposed,
then solicited *BE will have a distinct advantage in adopting the model. And as I
point out above, it doesn't matter what spammers do, because the improved model
is helpful for advancing detection in both cases.
And my other point has been that when a channel gets so saturated with noise
that you can no longer find the original signal reliably (as you say above, the
S/N ratio will depend on Nyquist, which is a very crucial point), then
solicited *BE and receivers are going to need a different model, else
information transmission will no longer occur reliably.
So given a set of unmarked messages, some spam, some not-spam, the task is
to have a program mark them in the same way that a human would if a human
were reading the messages. Since humans have different definitions of
spam, it would be useful if the program could accept different definitions
as well. This is the realm of content analysis.
You see this is the crux of the whole stagnation of anti-spam in my view.
Content has nothing to do with what makes spam annoying. It is the S/N factor,
i.e. that it only gets a 0.005% response rate.
I am trying to shift the whole paradigm from thinking about psychology (which
will always give a fuzzy result), to thinking about and modeling the noise
factor.
It is a profound paradigm shift that gets you closer to a more robust solution
for detection.
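
To put rough numbers on the noise factor: if a response is treated as signal
and every unanswered message as noise (an assumption made only for this
sketch, as is the function name), the 0.005% figure above can be compared with
a hypothetical 5% rate.

```python
def noise_per_response(response_rate: float) -> float:
    """Unanswered messages per answered one, for a rate given as a fraction."""
    if not 0.0 < response_rate <= 1.0:
        raise ValueError("response_rate must be in (0, 1]")
    return (1.0 - response_rate) / response_rate


if __name__ == "__main__":
    for label, rate in (("0.005%", 0.00005), ("5%", 0.05)):
        ratio = noise_per_response(rate)
        print(f"{label} response rate: about {ratio:.0f} unanswered per response")
```

At 0.005% that is roughly 19999 unanswered messages per response; at 5% it is
roughly 19.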
Thus my whole motivation for an unambiguous definition (spam == all bulk
email) along the channel and not just a definition at the end points
(UBE).
You may need a precise definition before you can begin implementation
(just like you need a definition of voltage, current, etc to begin
building a transmitter),
Exactly. You need a definition before you can model.
but you do not need a precise definition to talk
about the theoretical aspects.
Yes you do.
Spam could be defined as UCE, CE, UBE, or
BE. I have also a more complete and detailed taxonomy of spam:
Those are all definitions.
There are 3 types of email that we generally call spam:
This is going down the psychology line of modeling, which I am trying to shift
the paradigm away from, because it is not very well correlated with what makes
spam a problem. If spam had a 5% response rate, it would no longer be a
problem. Modeling the psychology is something other people are working on
already.
[snip]
Thanks,
Shelby Moore
http://AntiViotic.com