
Re: Proposed Statement on "HTTPS everywhere for the IETF"

2015-06-09 19:24:38
On Jun 5, 2015, at 2:05 AM, Mark Nottingham <mnot(_at_)mnot(_dot_)net> wrote:

Hi Roy,

My overall concern here is that statements like this undermine the 
integrity of the organization. I understand people wanting to improve 
overall privacy, but this step does not do that in any meaningful way.



Encrypting the channel does provide some small amount of privacy for the 
*request*, which is not public information.  Browser capabilities, cookies, 
etc. benefit from not being easily correlated with other information.

That is message confidentiality, not privacy.  Almost all of the privacy 
bits (as in, which person is doing what and where) are revealed outside of 
the message.

There's been a lot of historic confusion (or maybe it's just different jargon 
for different communities) about confidentiality and privacy in protocols; 
I'm assuming Joe meant, roughly "…provide a small amount of privacy by making 
the request confidential…"

I assumed so too, but there is a huge difference between privacy and just 
hiding some of the header field content and query data in each request.
The problem with this campaign is that it is using the term "privacy" as 
if HTTPS provides it, but it does nothing of the sort.

For most people, we let such confusion slide because they aren't expected to
understand how the protocols work as a system.  The IESG should at least
understand the protocols if it is going to be conducting campaigns in our name.

It would be interesting to define an HTTP header of "Padding" into which 
the client would put some random noise to pad the request to a well-known 
size, in order to make traffic analysis of the request slightly more 
difficult.  This is the sort of thing that comes up when we talk about 
doing more encryption for the IETF's data, which shows the IESG's suggested 
approach to be completely rational.


HTTP/2 has padding built into the relevant frames. In HTTP/1.x, padding is 
sometimes done with (unregistered) headers, but more often is done with 
chunk-extensions. Don't think anything needs to be registered here.
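To make the bucketing idea concrete, here is a minimal sketch of padding a
serialized HTTP/1.x request up to a size bucket. The 512-byte bucket, the
filler bytes, and the unregistered "Padding" header name are all illustrative
assumptions, not anything specified; it also assumes a complete request with
a blank line terminating the header section:

```python
import math

BUCKET = 512  # arbitrary bucket size; all requests in a bucket look alike

def pad_request(raw_request: bytes) -> bytes:
    """Pad a serialized HTTP/1.x request to the next BUCKET boundary by
    appending a hypothetical, unregistered 'Padding' header, so that
    traffic analysis sees only the bucket size, not the true size."""
    head, sep, body = raw_request.partition(b"\r\n\r\n")
    overhead = len(b"Padding: \r\n")          # fixed cost of the new header
    current = len(raw_request) + overhead
    target = math.ceil(current / BUCKET) * BUCKET
    filler = b"x" * (target - current)        # real padding might use noise
    return head + b"\r\nPadding: " + filler + sep + body

req = b"GET /search?q=secret HTTP/1.1\r\nHost: example.org\r\n\r\n"
padded = pad_request(req)
assert len(padded) % BUCKET == 0
```

This only equalizes sizes within a bucket, of course; it does nothing about
timing, request counts, or the response sizes discussed below.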


Browsers don't send singular messages containing anonymous information.
They send a complex sequence of messages to multiple parties with an
interaction pattern and communication state.  The more complex and encrypted
the communication, the more uncommon state and direct communication is
required, which makes it easier to track a person across multiple requests
until the user's identity is revealed.

+1. I'm very interested to see the research that showed this so clearly for 
HTTP/1.x over TLS repeated for HTTP/2, since it has multiplexing and usually 
uses a single connection per origin. I suspect that it's better, but 
certainly not proof against these kinds of attacks.


Furthermore, with TLS in place, it becomes easy and commonplace to send 
stored authentication credentials in those requests, without visibility, and 
without the ability to easily reset those credentials (unlike in-the-clear 
cookies).

Yes. This is a concern that I talked through with Balachander Krishnamurthy 
(who said his cookie research would have been much more difficult with 
pervasive HTTPS) and others when SPDY came around. I think we need much 
better tooling here. There has been a bit of progress, but it's been very 
slow...

I don't think you appreciate the impact of authenticated requests on the
overall system.  It isn't just that the sites you intend to visit now have
the ability to uniquely identify you at no additional infrastructure cost.
It is that every https reference on every page has the same ability, and is
no longer hindered by limitations on Referer or "privacy" concerns (again,
because people like the IETF claim that encrypted data sent over TLS is
private even when we have no control over the CAs, the recipient, and the
data sent).

Padding has very little effect.  It isn't just the message sizes that change 
-- it is all of the behavior that changes, and all of the references to that 
behavior in subsequent requests, and the effects of those changes on both 
the server and the client.

Padding may not be sufficient to be proof against information leakage, but it 
is sometimes necessary. It may have little effect in the scenarios you're 
thinking of, but it's still useful against some attacks.


TLS does not provide privacy.

No protocol "provides" privacy in the sense you're talking about. TLS helps 
to maintain privacy in certain scenarios.

Yes, but not the scenario described by an Internet retrieval of an "https"
schemed resource identifying public information that does not require user
authentication or persistent cookies to GET.  That is the added scope of
what people mean by HTTPS-everywhere, since HTTPS itself is not a named
protocol and we already recommend whatever is HTTPS-obvious.

Given the news over the last two years (to almost the day!) and the nature of 
the attacks we're talking about (where your access to public information can 
be strung together to learn many things about you) it's not surprising that 
it's being discussed.

Yes, but again -- using a significant event like Snowden's release of
information about mass surveillance to justify HTTPS-everywhere presumes
that HTTPS-everywhere is an actual defense against mass surveillance, or at
least enough of an improvement to justify its cost.  While confidentiality
is necessary in many cases, and more than justified by those cases, it is
not necessary in all cases.  Furthermore, a user's privacy can be reduced
by insisting that HTTPS be used in all cases, because "https" hides what
each page decides to send over the connection, increases the amount of
metadata pointing directly at the user agent, and extends the duration of
exchanges.

Encryption works.  That does NOT mean that performing Web retrievals using
TLS hides the information necessary to track exactly who you are, what you
are doing, and how long you are doing it.  It can hide other things: things
that have been considered important to hide long before mass surveillance
became a rallying cry (as odd as that sounds).  Nor does it mean that, when
encryption is useful, TLS is the right protocol with which to apply it.

Avoiding mass surveillance is a lot harder.  It requires specialized
behavior by the user agent, not just encrypting communication.  It requires
better protocols for name services, routing, and avoidance of long-lived
connections.  These are also within the scope of the IETF.  But what we are
being told, instead, is that "https" will somehow address the problem if we
all click our heels together at the same time.  It's a disgrace.

The problem isn't that we lack the ability to combat mass surveillance.

Using more TLS to achieve confidentiality *will* result in more privacy from 
a pervasive network attacker — it just won't help against an attacker (even 
with the best of intentions or the dodgiest of business models) at the other 
end of that connection (which I absolutely agree that the IETF and W3C should 
be thinking about as well).

A pervasive network attacker is at both ends of the connection, and behind
the connection, and watching state before and after the connection.
Pervasive is pervasive.

What it does is disable anonymous access to ensure authority.

Please explain?

The https scheme relies on the notion of authority in the URI, combined
with a direct or tunneled connection to that authority, to establish a
trusted exchange of information between the user and that authority
(assuming that the user trusts that authority).  For various performance
reasons, a great deal of state is held on the user agent to ensure that its
next connection to the same authority isn't depressingly slow.  Recipients
are discouraged from shared caching or mirroring of the content, since the
authority is vested only in the connection that delivered it, not in the
bits that were delivered, and the user agent doesn't know why the bits were
secured.

Anonymous access, in contrast, does not presume that the user trusts the
authority.  Very little state is maintained on the user agent, since it
doesn't actually help.  Recipients are encouraged to cache or mirror the
content, especially if the content itself is signed, which means other
users can access the content without making a request to the authority.
Information can be replicated and accessed at locations the user does
trust, perhaps even offline.

The other advantage that replication has over https, aside from not requiring
a connection to the authority, is that the information cannot be personalized.
If you can go to a public library to see a copy of the tax code, or legal code,
or some other document of public interest, it makes it much harder for that code
to be changed without people noticing, or for certain viewers of the code to see
a different version than others.
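A sketch of what authenticating the content (rather than the connection)
looks like: assuming a digest of the document is published out of band and
widely replicated (a plain SHA-256 digest stands in here for a full
signature scheme), any mirror's or library's copy can be verified without
ever contacting the authority:

```python
import hashlib

# A stand-in for a public document; the trusted digest would in practice
# be published by the authority and replicated widely, obtained out of band.
document = b"Section 1. All legislative Powers herein granted ..."
TRUSTED_SHA256 = hashlib.sha256(document).hexdigest()

def verify_mirrored_copy(content: bytes, trusted_digest: str) -> bool:
    """Authenticate the bits themselves: the check works identically for
    a copy from the origin, a shared cache, a mirror, or offline media."""
    return hashlib.sha256(content).hexdigest() == trusted_digest

assert verify_mirrored_copy(document, TRUSTED_SHA256)             # genuine
assert not verify_mirrored_copy(document + b"!", TRUSTED_SHA256)  # altered
```

The same check also defeats personalization: a copy that differs from what
everyone else received simply fails to verify.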

It changes access patterns away from decentralized caching to more 
centralized authority control.

I think the combination of how HTTP is defined and Web browsers' specific 
usage patterns of HTTP over TLS does that. We're already seeing some 
background discussion of how to offer caching without sacrificing security. 

We can't have a reasonable comparison of the effect of HTTPS-everywhere based on
proposals that are deployed nowhere.  Deploy them first, advocate later.

That is the opposite of privacy.

No, it's the opposite of anonymity. The most relevant definition of privacy 
I've seen was brought up on the Human Rights mailing list a little while back:
 <http://www.internetsociety.org/blog/2013/12/language-privacy>
… and it's much more nuanced than that.

Of course it is more nuanced than that, but I certainly won't be looking at
a definition of "about privacy" to define lack of privacy (they are
different things).  My point was that forcing people into an interaction
pattern involving the authority of a given set of information, for every
bit of information that person might want to access, does not preserve the
user's privacy.

That said, I agree that forced de-anonymisation and centralisation *can* 
both be privacy-hostile. I don't think it follows that more TLS / HTTPS 
equals less privacy, however.

HTTPS where it isn't needed results in a more centralized system, with less
privacy for anyone participating in that system.  This is a frequently
repeated pattern that can be observed right now in any of the walled gardens.

TLS is desirable for access to account-based services wherein anonymity is 
not a concern (and usually not even allowed).  TLS is NOT desirable for 
access to public information, except in that it provides an ephemeral form 
of message integrity that is a weak replacement for content integrity.

I think reasonable people can disagree here. When faced with a pervasive 
attacker (whether it be a government or a network provider) who can use your 
access to public information against your will, it *is* desirable. 

Sorry, https does not help there.  TCP and state observation are more than
sufficient.  TLS does help when it is used in a completely different way
(securing connections to trusted privacy-filtering and re-routing
intermediaries, for example).

As an aside, the World Economic Forum has classified personal data — 
presumably including browsing habits — as a "new asset class." One could 
argue that by browsing without encryption, you're literally giving money to 
anyone on the path who wishes to extract it.

The same is true for browsing with encryption.  Furthermore, if everything
is encrypted, then the presence of encryption alone no longer implies that
special handling of the data is warranted.

If the IETF wants to improve privacy, it should work on protocols that
provide anonymous access to signed artifacts (authentication of the
content, not the connection) that is independent of the user's access
mechanism.

Could you expand upon this a bit? I can think of many potential projects 
along these lines (and even have one or two brewing), but I'm not quite sure 
what you're getting at.

See above, or just look at reasonably good systems that are actually
designed to protect privacy (like Tor).

I have no objection to the IESG proposal to provide information *also* via 
https.  It would be better to provide content signatures and encourage 
mirroring, just to be a good example,
but I don't expect eggs to show up before chickens.  However, I agree with 
Tony's assessment: most of the text is nothing more than a pompous political 
statement, much like the sham of "consensus" that was contrived at the 
Vancouver IETF. TLS everywhere is great for large companies with a financial 
stake in Internet centralization. It is even better for those providing 
identity services and TLS-outsourcing via CDNs.

*sigh* I'm always disappointed when people smear others' motivations without 
facts to back it up.

I am disappointed with engineers who think it is appropriate to arrange an
array of joint meetings with the TLS working group wherein hums are
conducted to contrive a political statement that is later claimed to
represent IETF consensus, as opposed to the repeated consensus of the one
working group which happens to develop TLS.

I am also disappointed that, once that self-serving political statement was
arranged, it has been used repeatedly to mislead other organizations that
are less savvy about how the Internet actually works, but wouldn't dream of
opposing "privacy".  Who would?

If you are going to conduct a political campaign, I expect to see
reasonable and responsible disclosures of the profit motive, even if it has
no personal relevance to the person disclosing.  Readers can reach their
own conclusions.

It is a fact that https is considerably harder to scale than http-hosted
services, in terms of CPU, bandwidth, congestion-sensitivity, cache
effectiveness, and longevity of connections.  It is a fact that few
companies specifically sell such services to others and would have
difficulty not benefiting from more customers.  It is a fact that https
services are currently sold at a premium, as compared to http services, so
even existing customers making the switch will inevitably result in
increased revenue.  And it is a fact that there are far fewer organizations
with sufficient competence to do it right, which will (at least
temporarily) create a competitive advantage.

Likewise, various publications exist that describe efforts to create
persistent advertising identifiers that are not subject to cookie clearing --
specifically, identifiers tied to services that the user would not want to
discard.  The problem with such identifiers is that sending them in the
clear, even in a hashed form, would display an obvious user trail.  If
everyone is using https, they don't need to be sent in the clear any more;
however, the trail is still there (and fully within the reach of pervasive
surveillance).  The IETF should not call that "privacy".
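The point about hashed identifiers can be shown in a few lines: hashing a
stable identifier yields an equally stable token, so the trail it creates
is unchanged whether the token travels in the clear or inside TLS (the
identifier value below is, of course, made up for illustration):

```python
import hashlib

def hashed_id(user_identifier: str) -> str:
    """Hashing does not anonymize a stable identifier: the output is just
    as stable as the input, so it links requests into the same trail."""
    return hashlib.sha256(user_identifier.encode()).hexdigest()

# The same user produces the same token at every site that receives it,
# on every visit; only a different user produces a different token.
tokens = {hashed_id("alice@example.org") for _ in range(3)}
assert len(tokens) == 1                                      # one stable trail
assert hashed_id("alice@example.org") != hashed_id("bob@example.org")
```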

Furthermore, when search engines switched to https (rightly so) to preserve
the confidentiality of user-provided query data, they lost the ability to
pass that query information as Referer data to downstream non-https sites.
That's no longer a problem in the land of https-everywhere.
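The downgrade rule in question is the one in RFC 7231, Section 5.5.2: a
user agent must not send Referer in an unsecured request if the referring
page was retrieved over a secure protocol. A rough sketch of that logic
(the URLs are illustrative) shows why https-everywhere makes the
protection moot:

```python
from typing import Optional
from urllib.parse import urlsplit

def referer_for(referring_url: str, target_url: str) -> Optional[str]:
    """Apply the HTTPS-to-HTTP downgrade rule from RFC 7231, Sec. 5.5.2:
    suppress Referer when a secure page links to an insecure target."""
    if (urlsplit(referring_url).scheme == "https"
            and urlsplit(target_url).scheme == "http"):
        return None  # protect the (possibly sensitive) referring URL
    return referring_url  # https -> https: the full URL flows again

# Downgrade: the query stays confidential.
assert referer_for("https://search.example/?q=secret",
                   "http://site.example/") is None
# Everything is https: the query travels to the downstream site.
assert referer_for("https://search.example/?q=secret",
                   "https://site.example/") == "https://search.example/?q=secret"
```

(Real browsers now refine this further with referrer policies, but the
default same-security case still passes the referring URL along.)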

E.g., from what I've seen of CDNs (take my personal view as you will), it's 
not the nirvana you paint; scaling TLS is still difficult (thanks to 
lingering lack of support for SNI), and there's a growing expectation in the 
market that HTTPS will cost the same as or only a small increment over 
serving HTTP (despite the consumption of IP addresses that it requires to 
serve a broad set of clients well from a highly distributed set of servers).

I only get to hear the complaints of actual enterprise customers.  YMMV.

It's a shame that the IETF has been abused in this way to promote a
campaign that will effectively end anonymous access, under the guise of
promoting privacy.

How does HTTPS "end anonymous access"?

Because https-everywhere eliminates anonymous access; not just through the
technical leaks that result from all of that authentication of the
authority, but also through the social effects it has on the overall
ecosystem.  It excludes the features of HTTP that encouraged shared caching
(by default) and removes social and technical barriers associated with
persistently identifying each user.

If we are going to make grand recommendations that change the way the Web works,
we should at least understand the consequences.  If we are going to tell people
that something will improve privacy, then it had better improve privacy to the
same degree that we say it does.

....Roy