Wednesday, Dec 2, 2015 9:49 AM John Levine wrote:
> Different people are different and it is not helpful to pretend that
> all end users are the same. Most people say they care about privacy,
> but their actions show that they actually don't, e.g., they'll trade
> their password and SSN for a candy bar.

Show your data or please stop making generalizations like this. This is
really not helpful.

> Some people really do care about privacy. I don't know if you've ever
> talked to someone who runs a battered women's shelter, but I have.
> For them, their privacy is really a matter of life and death, and they
> have to deal with impressively complex threats. I've heard direct
> reports of malware that installs keyloggers that report back to the
> hostile spouse. These people boot their computers from a CD to use
> webmail through Tor, and buy burner phones in bulk. The kind of stuff
> we're talking about redacting here is completely irrelevant to them,
> since as I said, they are not so dim as to depend on their mail
> provider's logging practices for their safety.

So what you are saying is that there are two kinds of caring about privacy: not
at all, or extremely. I'm sure these stereotypes are based in real personal
experience, but anecdotes are not data. This is not what actual research on
this topic appears to show. We tend not to remember reasonable people in
ordinary situations: we remember complete idiots, because it's
funny/disturbing, and we remember people in trouble. Basing policy decisions
on personal recollections tends to get things badly wrong.

> Christian's point about bulk collection is a reasonable one, but just
> as the collection affects a lot of people, the security benefits from
> good header logging affect a lot of people, too. We need to start by
> understanding how they're really used and what the benefits are.

I agree that we need to understand this. I've been asking people who say they
are in the know if they could share some data with us, and since I asked
yesterday, it's unreasonable to think that someone would already have answered.
Hopefully we will get some data.

> From what we've heard here from people who run significant mail
> systems for real users, the benefits are substantial.

We've heard unsubstantiated assertions to this effect, not accompanied by any
data.

The reason I'm skeptical about this is twofold. First, the one example someone
presented that seemed to support the case for including IP address identifying
information from the mail submission server actually looks to me like it makes
things worse, not better, at least in the presence of a competently operated
mail submit server.

Second, I'm pretty sure that if you are filtering spam, and then you add a
heuristic that pays attention to the initial sender's source address, you will
see some increase in messages identified as spam. However, if you saw that
effect five years ago when you installed that heuristic, and your filtering
software has gotten a lot more sophisticated in its reliance on ML since then,
you might still believe that the heuristic is making a big difference, when
it's really making a small difference.

And you might never have instrumented it in such a way that you could discover
the present truth of the matter. And that's precisely the perfectly
reasonable mindset from which claims that "it makes a big difference" would
come without any data at all to back them up.
My PGP key: https://keys.whiteout.io/mellon(_at_)fugue(_dot_)com