RE: Last Call: <draft-hardie-privsec-metadata-insertion-05.txt> (Design considerations for Metadata Insertion) to Informational RFC

Hi Ted,

Please see inline.

Cheers,
Med

From: Ted Hardie [mailto:ted(_dot_)ietf(_at_)gmail(_dot_)com]
Sent: Thursday, 2 March 2017 19:02
To: BOUCADAIR Mohamed IMT/OLN
Cc: ietf(_at_)ietf(_dot_)org;
draft-hardie-privsec-metadata-insertion(_at_)ietf(_dot_)org
Subject: Re: Last Call: <draft-hardie-privsec-metadata-insertion-05.txt> (Design
considerations for Metadata Insertion) to Informational RFC

wing are missing from the document:
It's difficult to say how something will be used in the future.
[Med] Advice that is not implementable causes more trouble, IMHO.
Sorry, I thought you were asking what wgs or protocols planned to reference
this. For that, I don't know.
[Med] OK. IMHO, lacking such considerations, there is a high risk that the
advice will be lost or that it will be used as a permanent DISCUSS point in
the later stages of preparing documents. I’d prefer actionable points that
WGs and document authors can consider in the early stages.

The intent is that it is information useful to those considering whether
restoring metadata lost to encryption in mid-network is the right way to go.
[Med] This is another assumption in the document that I disagree with: It seems
that you assume that an on-path device that inserts metadata is necessarily
RESTORING that information. This is not true for many efforts:

· A Forward-For header inserted by a proxy does not restore any data;
it only reveals data that is already present in the packet issued by the
client itself.

· An address sharing device, under for example DS-Lite (RFC6333), that
inserts the source IPv6 prefix in the TCP HOST_ID option (RFC7974) is not
RESTORING any data. The content of that TCP option is already visible in the
packet sent by the host.

· Service Function Chaining WG
(https://datatracker.ietf.org/wg/sfc/about/) is defining an architecture to
communicate metadata by on-path devices; that metadata is inserted at the
network side. Border nodes will make sure that data is stripped before
forwarding packets to the ultimate destinations. The metadata can be a
subscriber-id, a policy-id, etc.

So when draft-hardie-* says: “Do not add metadata to flows at intermediary
devices unless
a positive affirmation of approval for restoration has been received
from the actor whose data will be added.”

(1) Do you assume that the sample examples I listed above fall under your
advice?
(2) How will an on-path device know that the data it intends to insert is a
“restoration”?
(3) Does it mean that for new data (i.e., data that is not a restoration),
on-path devices are free to do whatever they want? For me, this is undesirable.
There is a void there. A statement requiring those networks to avoid leaking
privacy information must be included.

Another assumption is made here:

Instead, design the protocol so that the actor can add such metadata
themselves so that it flows end-to-end, rather than requiring the
action of other parties. In addition to improving privacy, this
approach ensures consistent availability between the communicating
parties, no matter what path is taken.

This text claims that providing data by the endpoint ensures a “consistent
availability” of that information. This is broken for a multi-homed host that
uses, for example, a Forward-For header: the content of the header, if
injected by the endpoint, will depend on the path. A way to ensure “consistent
availability” is to insert many Forward-For headers, each carrying the content
that is specific to a given network attachment. But doing that raises a privacy
concern because the remote server can track clients.
My intent (and the understanding of other reviewers) is to highlight that these
mechanisms have a privacy-damaging result and that this should be considered.
[Med] I do think existing documents already do that job. I do think we need
more.

Sorry, did you mean "do not think we need more"?
[Med] I meant we need more than only highlighting the issue. We need something
that is actionable. Requiring a Privacy Section in every RFC may be a direction
to consider.

If so, I obviously disagree. This design pattern is used uncritically enough
that a brief document describing why it isn't safe still seems to me useful.
Were it incorporated into a more general document (as noted before), that would
also work. If it later is, that more general work could obsolete this (though
that's a bid for an informational document).

In particular, I'm concerned that some application functions in the network
(e.g. recursive resolvers or proxies) do not consider the positive privacy
implications of their aggregation and so do not consider adding this data back
as problematic.
[Med] I’m also concerned with that (see e.g.,
http://www1.icsi.berkeley.edu/~narseo/papers/hotm42-vallinarodriguez.pdf).
In the meantime, I’m also concerned with (1) some applications that leak
privacy information without the consent of the user and (2) some application
servers that may correlate various information shared by an application client
to track users (e.g., https://panopticlick.eff.org/). BTW, I see that you are
using “application function” which may not have the same meaning as the general
“protocol” wording used in draft-hardie-*. Do you consider a DHCP relay as an
“application function”?
Highlighting this enables them to see this traffic in a different context.
[Med] Isn’t this already assumed by some protocol designers (e.g., RFC6973,
SIP)? BTW, there are subtleties when proxies are in the same trust domain as
the client or server.
There are certainly some protocol designers that have internalized this, but my
experience has been that this is not always the case. In a fair few cases,
folks deploy methods like this because they see encryption of metadata in data
integrity terms or see aggregation only in terms of data usage minimization.
They restore the metadata mid-network because it is the quickest solution for
them to get back to the status quo ante for their understanding of the system.

[Med] I hear you. What would be the harm if those solutions stripped that
information before sending it to the server? If they don’t strip it, this means
that either the information can be parsed and used by the server, or at least
its presence does not lead to session failures. If the server parses and uses
that information, then its presence is important for delivering the service.
In that case, the question is why the client does not supply that information
in the first place.
* that data may not be always available to the endhost
Understood, but even in this case, it is better to make the permission to add
the data explicit.
[Med] This may be easy to implement for some applications, but this may not be
generalized to ** all ** protocols.

You are certainly correct that many deployed protocols would find it hard to
retrofit this consent model into their existing flows. This is, however,
advice for folks at the design phase. If RFC 6788 were being written after the
publication of this document, its authors might well have looked at the
protocol mechanics in section 5.2:

The AN intercepts and then tunnels the received Router Solicitation in a
newly created IPv6 datagram with the Line-Identification Option
(LIO). The AN forms a new IPv6 datagram whose payload is the
received Router Solicitation message as described in [RFC2473],
except that the Hop Limit field of the Router Solicitation message
MUST NOT be decremented.

and asked whether the circuit identifier corresponding to the logical
access loop port of the AN from which the RS was initiated is PII. If so, this
document would have them consider whether transparent interception
is the appropriate choice. There clearly are flows in which the AN's role
would be explicit.

I don't know, frankly, which choice is right in this case, but I would prefer
that the choice be made with an easy reference to the implications of inserting
metadata at hand.
Putting aside the interaction with a user to get consent, and how that consent
would need to change when another user uses the same device to connect to
the Internet, consider a user who does not want an upstream DHCP relay to
insert the line-id (https://tools.ietf.org/html/rfc6788) toward the server, and
let’s suppose the relay received a signal (by some means yet to be specified)
that for this particular DHCP client the line-id must not be inserted. In
this case, connectivity won’t be provided to that user. This would mean extra
calls to the network provider's hotline, which is desirable for neither
customers nor network providers.
If this can be done in parallel with other actions, then the latency impact can
be minimized.
[Med] These are assumptions and implications that are worth adding to the
draft.

Okay, how about the following text being added to section 5.
There are also tensions with latency of operation. For example, where the end
system does not initially know the information which would be added by on-path
devices, it must engage the protocol mechanisms to determine it. Determining a
public IP address to include in a locally supplied header might require a STUN
exchange, and the additional latency of this exchange discourages deployment of
host-based solutions. To minimize this latency, engaging those mechanisms may
need to be done in parallel with or in advance of the core protocol exchanges
with which this metadata would be supplied.
[Med] Looks good to me. Thanks.
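
The "in parallel with" option in the proposed text can be sketched as follows. This is a minimal illustration, not anything specified by the draft: both coroutines are stand-ins that only sleep, and the function names, the delays, and the address 198.51.100.1 are assumptions made for the example.

```python
import asyncio


async def discover_public_address():
    # Stand-in for a STUN-style exchange (assumed delay and result).
    await asyncio.sleep(0.05)
    return "198.51.100.1"


async def open_connection_to_server():
    # Stand-in for transport setup with the application server (assumed delay).
    await asyncio.sleep(0.05)
    return "connected"


async def prepare_request():
    # Start both exchanges together, so that address discovery overlaps with
    # connection setup instead of running before it.
    address, state = await asyncio.gather(
        discover_public_address(),
        open_connection_to_server(),
    )
    # The end system, not an on-path device, supplies the address.
    return {"state": state, "client_supplied_address": address}


if __name__ == "__main__":
    print(asyncio.run(prepare_request()))
```

With real exchanges, the elapsed time would be close to that of the slower of the two rather than their sum.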

BTW, this falls into this general discussion in
https://tools.ietf.org/html/rfc6973:

a. Trade-offs. Does the protocol make trade-offs between privacy
and usability, privacy and efficiency, privacy and
implementability, or privacy and other design goals? Describe
the trade-offs and the rationale for the design chosen.
* a misbehaving node may be tempted to spoof the data to be injected. A remote
device that will use that data to enforce policies will be broken.
This point was discussed extensively in the GEOPRIV work and essentially a
single carve-out was made: for emergency services, where falsely asserted
location data could be used to SWAT individuals or consume safety resources.
I don't think that falls into this narrow advice, but I would be willing to add
something like this to the security considerations:
"Note that some emergency service recipients, notably PSAPs (Public Safety
Answering Points) may prefer data provided by a network to data provided by the
end system, because an end system could use false data to attack others or
consume resources. While this has the consequence that the data available to
the PSAP is often more coarse than that available to the end system, the risk
of false data being provided involves a risk to the lives of those targeted."
[Med] Thank you. Providing PSAP as an example is OK, but I’d like the issue to
be called out as a generic one while PSAP is provided as an example. What about
the following:

"Note that some servers (e.g., emergency service recipients, notably PSAPs
(Public Safety Answering Points) [RFC6443]) may prefer data provided by a
network to data provided by the end system, because an end system could use
false data to attack others or consume resources. While this has the
consequence that the data available to the server is often more coarse than
that available to the end system, the risk of false data being provided
involves a risk to the lives of those targeted."

I don't think that emergency service recipients shifting to an example works
here, because it broadens the carve out. In the emergency services case, the
resources consumed are fire trucks, ambulances, and SWAT teams. For other
servers, resources consumed could simply be CPU cycles or disk; that's really
not the same. Balancing location consent requirements against one was agreed;
balancing it against the other was not.

[Med] Resources may not be restricted to CPU or disk but may be granting access
to the service (e.g., download a file when a quota per source address is
enforced). It can be whatever the servers consider to be critical for them; it
is up to the service designer to characterize it. The NEW wording
proposed above is technically correct. Please reconsider adding it to the draft.
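
The per-source-address quota mentioned above can be made concrete with a small sketch. Everything here is hypothetical (the `QuotaServer` class, the quota of 2, the addresses); it only illustrates why a server enforcing such a quota might hesitate to key it on an address asserted by the client.

```python
from collections import Counter

QUOTA = 2  # assumed limit: downloads allowed per source address


class QuotaServer:
    def __init__(self, trust_client_header):
        # When True, the quota is keyed on the address the client asserts.
        self.trust_client_header = trust_client_header
        self.used = Counter()

    def download(self, peer_addr, claimed_addr=None):
        # Pick the key the quota is counted against.
        if self.trust_client_header and claimed_addr:
            key = claimed_addr
        else:
            key = peer_addr
        if self.used[key] >= QUOTA:
            return False  # quota exhausted for this key
        self.used[key] += 1
        return True


# One client at 203.0.113.7 asserting a different address on every request.
naive = QuotaServer(trust_client_header=True)
naive_results = [naive.download("203.0.113.7", f"198.51.100.{i}") for i in range(5)]

strict = QuotaServer(trust_client_header=False)
strict_results = [strict.download("203.0.113.7", f"198.51.100.{i}") for i in range(5)]
```

In this sketch the first server never refuses the client, while the second refuses it from the third request onward. The same reasoning applies to any party supplying the address, not only to the end system.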

* it was reported in the past that some browsers leak the MSISDN and other
sensitive data.
This is true, but it seems to me unrelated to the point of the document.
[Med] It is related because blindly trusting an application client (and server)
has its own privacy risks. This is even exacerbated given the rich data that is
available to an application client and also because of the visibility on
various layers available to an application server.

I agree that it has its own privacy risks, but I don't think this is the
document that should explore them.
[Med] You don’t need to explore them, but to add one or two sentences to remind
that privacy leaks are still a valid concern even if only clients are supplying
data without the help of an on-path network device.
From that flow some of your other concerns about audience, at least as I
understand. As written, this is narrow advice for a broad audience: basically,
anyone who would consider the form of metadata insertion it describes. You
would, if I understand you, prefer a narrower description of the audience in a
larger context.

[Med] The key point here is about the practicality of implementing the advice
NOT changing the scope. For example, the document says that it is better that a
host is injecting the data but the document does not question whether that
supplied data can be trusted or not,

Broadening this a bit, you're looking at two cases: one in which the data the
host has is wrong and one in which there is an adversarial relationship. For
the first case, we can add text saying that when an end system supplies data it
is the end system's responsibility to ensure that it is correct; don't use a
STUN result from last week as fresh, for example.
[Med] OK.

For the second case, in which the server treats user supplied data as
potentially misleading because the user may wish to circumvent restrictions,
I'll point out the Wikimedia example demonstrates that simply shifting the
trust to a mid-point entity doesn't work; it has to be shifted to an entity
within the trust domain of the server. So the question isn't really "end-user
system supplied data can be trusted or not", the same question applies to
whomever supplies the data.
[Med] Fully agree. It would help to have some text recording that the concern
applies, including for client-supplied data.

or how the consent will be obtained from a user.

You're right that I'm leaving aside the question of how the user sets the
policies, because it may vary by protocol and type of device too much to make
general advice useful. If you would like me to add an explicit statement to
that effect, I am happy to note that it is not covered.
[Med] Please add some text about this point. Thank you.

In general, the point of the document is that the host should be able to omit
the data without mid-network devices adding it back. That's the point of
protecting the traffic in the first place, after all. I am saying that if the
protocols require the data, then getting it from the end host has better
privacy properties than getting it from mid-network entities.
[Med] I’m not sure we can have such a general statement, because the data may
not be available to clients (e.g., DHCP) + the data supplied by clients
(when possible) may not be reliable + enforcing policies based on
client-supplied data may have implications for other users (e.g., spoofing
XFF). Obviously, getting some of the information from a client may have
implications on QoE…the user needs to understand the root causes of a
degradation of QoE. Of course, these implications may not be new for users who
are familiar with disabling JavaScript and the like.

For example, the document states that the information in a Forward-For header
can be supplied by the host itself and then communicated to a remote consumer.
This is indeed possible, but because of abusive hosts, some servers maintain
whitelists of trusted proxies; see
https://meta.wikimedia.org/w/extensions/TrustedXFF/trusted-hosts.txt.
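
A minimal sketch of how such a whitelist can be applied, assuming the common X-Forwarded-For convention in which each proxy appends the address it received the request from. This is not Wikimedia's implementation; the `TRUSTED_PROXIES` list and the function names are hypothetical.

```python
import ipaddress

# Hypothetical allow-list of proxies whose header entries the server accepts.
TRUSTED_PROXIES = [ipaddress.ip_network("192.0.2.0/24")]


def is_trusted(addr):
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in TRUSTED_PROXIES)


def effective_client(peer_addr, xff_header):
    """Return the address the server attributes the request to."""
    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    current = peer_addr
    # Walk the header from the entry nearest the server towards the client.
    for claimed in reversed(hops):
        if not is_trusted(current):
            break  # whoever wrote the remaining entries is not on the list
        try:
            ipaddress.ip_address(claimed)
        except ValueError:
            break  # malformed entry: stop at the last address we accepted
        current = claimed
    return current
```

For example, `effective_client("203.0.113.7", "198.51.100.1")` returns `"203.0.113.7"`, because the peer is not on the list and its header is ignored, whereas the same header arriving from `"192.0.2.10"` yields `"198.51.100.1"`.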

The Wikimedia case is a very interesting one to raise, because it derives from
a set of assumptions about the network that are somewhat flawed and then
attempts to patch those flaws in ways that actually damage the mechanisms of
the system they originally built.
Wikimedia wants to allow folks to edit without login credentials. This allows
for anonymous users to make corrections or additions; this is a goal. The
consequence of that goal being achieved is that trolls or malicious editors can
have at anything they want.
Rather than institute credentials and ACLs, Wikimedia attempts to substitute
blocking by IP for blocking by credential. The property they are looking for
in IPs is not really there, though: they are not unique to individuals,
especially over time.

This damages those who share IP addresses (due to NATs or proxies). As far as
I can tell, the NAT problem is simply treated as collateral damage. For the
proxies, they attempt to work around the damage using XFF. That's spoofable,
though, so they attempt to limit it to specific proxies whose XFF they
trust--many of which require logins. That shifts the information about who is
editing Wikipedia out of their hands, but leaves it in the network and thus not
truly anonymous. I understand the engineering balance they are trying to
strike, but I'm not sure I can recommend their solution.

[Med] I’m not recommending their solution either, but I’m trying to raise the
point that an engineering balance is out there. ACKing that deployment reality
is better than ignoring it.

The deployment considerations text is meant to point out the engineering
balance. I'm happy to add the text noted above (on latency, the end user
responsibility for correct data, the PSAP carve out, and the explicit note that
the document does not treat how to obtain consent from a user so that an end
system can supply data).
[Med] Ok, thanks.
I'm less happy to add language on adversarial treatment of client-supplied
data. This is partly because many of the systems which use network-supplied
data are based on a misunderstanding of the properties of the data being added.
[Med] I agree this may be the case for some of them, but not all.
It is partly because the adversarial relationship can extend to
network-supplied data. It is also because a fair few of them are simply
security theater. If you have a specific edit you would like to propose,
though, I will consider it.
Thanks again,
Ted
