Re: Last Call: <draft-hardie-privsec-metadata-insertion-05.txt> (Design considerations for Metadata Insertion) to Informational RFC

wing are missing from the document:

It's difficult to say how something will be used in the future.

[Med] Advice that is not implementable causes more trouble, IMHO.

Sorry, I thought you were asking what wgs or protocols planned to reference
this.  For that, I don't know.  The intent is that it is information useful
to those considering whether restoring metadata lost to encryption in
mid-network is the right way to go.

My intent (and the understanding of other reviewers) is to highlight that
these mechanisms have a privacy-damaging result and that this should be
considered.

[Med] I do think existing documents already do that job. I do think we
need more.

Sorry, did you mean "do not think we need more"?  If so, I obviously
disagree.  This design pattern is used uncritically enough that a brief
document describing why it isn't safe still seems to me useful.  Were it
incorporated into a more general document (as noted before), that would
also work.  If it later is, that more general work could obsolete this
(though that's a bid for an informational document).

In particular, I'm concerned that some application functions in the
network (e.g. recursive resolvers or proxies) do not consider the positive
privacy implications of their aggregation and so do not consider adding
this data back as problematic.

[Med] I’m concerned with that, too (see e.g.,
http://www1.icsi.berkeley.edu/~narseo/papers/hotm42-vallinarodriguez.pdf).
In the meantime, I’m also concerned with (1) some applications that leak
privacy information without the consent of the user and (2) some
application servers that may correlate various information shared by an
application client to track users (e.g., https://panopticlick.eff.org/).
BTW, I see that you are using “application function” which may not have the
same meaning as the general “protocol” wording used in draft-hardie-*. Do
you consider a DHCP relay as an “application function”?

   Highlighting this enables them to see this traffic in a different
context.

[Med] Isn’t this already assumed by some protocol designers (e.g.,
RFC6973, SIP)? BTW, there are subtleties when proxies are in the same trust
domain as the client or server.

There are certainly some protocol designers that have internalized this,
but my experience has been that this is not always the case.  In a fair few
cases, folks deploy methods like this because they see encryption of
metadata in data integrity terms or see aggregation only in terms of data
usage minimization.  They restore the metadata mid-network because it is
the quickest solution for them to get back to the status quo ante for their
understanding of the system.

* that data may not be always available to the endhost

Understood, but even in this case, it is better to make the permission to
add the data explicit.

[Med] This may be easy to implement for some applications, but it may
not generalize to ** all ** protocols.


You are certainly correct that many deployed protocols would find it hard
to retrofit this consent model into their existing flows.  This is,
however, advice for folks at the design phase.  If RFC 6788 were being
written after the publication of this document, its authors might well have
looked at the protocol mechanics in section 5.2:

   The AN
   intercepts and then tunnels the received Router Solicitation in a
   newly created IPv6 datagram with the Line-Identification Option
   (LIO).  The AN forms a new IPv6 datagram whose payload is the
   received Router Solicitation message as described in [RFC2473],
   except that the Hop Limit field of the Router Solicitation message
   MUST NOT be decremented.

and asked whether the circuit identifier corresponding to the logical
access loop port of the AN from which the RS was initiated is PII.  If so,
this document would have them consider whether transparent interception
is the appropriate choice.  There clearly are flows in which the AN's role
would be explicit.

I don't know, frankly, which choice is right in this case, but I would
prefer that the choice be made with an easy reference to the implications
of inserting metadata at hand.

Putting aside the interaction with a user to get consent, and how that
consent will need to be changed when another user uses the same device to
connect to the Internet, consider a user who does not want an upstream DHCP
relay to insert the line-id (https://tools.ietf.org/html/rfc6788) to the
server, and let’s suppose the relay received a signal (by some means, yet
to be specified) that for this particular DHCP client, the line-id must not
be inserted. For this case, connectivity won’t be provided to that user.
This would mean extra calls to the hotline for that network provider. This
is desirable for neither customers nor network providers.

If this can be done in parallel with other actions, then the latency impact
can be minimized.

[Med] These are assumptions and implications that are worth adding to
the draft.


Okay, how about the following text being added to section 5?

There are also tensions with latency of operation. For example, where the end
system does not initially know the information which would be added by
on-path devices, it must engage the protocol mechanisms to determine it.
Determining a public IP address to include in a locally supplied header
might require a STUN exchange, and the additional latency of this exchange
discourages deployment of host-based solutions.  To minimize this latency,
engaging those mechanisms may need to be done in parallel with or in
advance of the core protocol exchanges with which this metadata would be
supplied.
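
To make the "in parallel with" option concrete, here is a minimal Python
sketch. It is an illustration under stated assumptions, not text for the
draft: the function names and delays are hypothetical, the address lookup is
simulated with a sleep rather than performed as a real STUN exchange, and the
address comes from a documentation range.

```python
import asyncio
import time


async def discover_public_address():
    # Hypothetical stand-in for a STUN exchange: a simulated delay,
    # then a fixed address from the 192.0.2.0/24 documentation range.
    await asyncio.sleep(0.05)
    return "192.0.2.10"


async def open_connection():
    # Hypothetical stand-in for transport and security setup.
    await asyncio.sleep(0.05)
    return "connection-ready"


async def prepare_request():
    # Run both steps concurrently, so the added delay is close to the
    # slower of the two steps rather than their sum.
    address, connection = await asyncio.gather(
        discover_public_address(), open_connection()
    )
    return {"connection": connection, "client_supplied_address": address}


def timed_prepare():
    # Return the prepared request and the wall-clock time it took.
    start = time.monotonic()
    result = asyncio.run(prepare_request())
    return result, time.monotonic() - start
```

Doing the lookup "in advance of" the core exchange would instead mean caching
the discovered value, which raises the question of how long it stays valid.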

BTW, this falls into this general discussion in
https://tools.ietf.org/html/rfc6973:

   a.  Trade-offs.  Does the protocol make trade-offs between privacy
       and usability, privacy and efficiency, privacy and
       implementability, or privacy and other design goals?  Describe
       the trade-offs and the rationale for the design chosen.

* a misbehaving node may be tempted to spoof the data to be injected. A
remote device that will use that data to enforce policies will be broken.

This point was discussed extensively in the GEOPRIV work and essentially a
single carve-out was made:  for emergency services, where falsely asserted
location data could be used to SWAT individuals or consume safety
resources.  I don't think that falls into this narrow advice, but I would
be willing to add something like this to the security considerations:

"Note that some emergency service recipients, notably PSAPs (Public Safety
Answering Points) may prefer data provided by a network to data provided by
the end system, because an end system could use false data to attack others
or consume resources.  While this has the consequence that the data
available to the PSAP is often more coarse than that available to the end
system, the risk of false data being provided involves a risk to the lives
of those targeted."

[Med] Thank you. Providing PSAP as an example is OK, but I’d like the
issue to be called out as a generic one while PSAP is provided as an
example. What about the following:



"Note that some servers (e.g., emergency service recipients, notably PSAPs
(Public Safety Answering Points) [RFC6443]) may prefer data provided by a
network to data provided by the end system, because an end system could use
false data to attack others or consume resources.  While this has the
consequence that the data available to the server is often more coarse than
that available to the end system, the risk of false data being provided
involves a risk to the lives of those targeted."


I don't think that emergency service recipients shifting to an example
works here, because it broadens the carve out.  In the emergency services
case, the resources consumed are fire trucks, ambulances, and SWAT teams.
For other servers, resources consumed could simply be CPU cycles or disk;
that's really not the same.  Balancing location consent requirements
against one was agreed; balancing them against the other was not.

* it was reported in the past that some browsers leak the MSISDN and other
sensitive data.

This is true, but it seems to me unrelated to the point of the document.

[Med] It is related because blindly trusting an application client (and
server) has its own privacy risks. This is further exacerbated by the rich
data that is available to an application client and by the visibility into
various layers available to an application server.


I agree that it has its own privacy risks, but I don't think this is the
document that should explore them.

From that flow some of your other concerns about audience, at least as I
understand them.  As written, this is narrow advice for a broad audience:
basically, anyone who would consider the form of metadata insertion it
describes.  You would, if I understand you, prefer a narrower description
of the audience in a larger context.



[Med] The key point here is about the practicality of implementing the
advice, NOT changing the scope. For example, the document says that it is
better that a host inject the data, but the document does not question
whether that supplied data can be trusted or not,

Broadening this a bit, you're looking at two cases: one in which the data
the host has is wrong and one in which there is an adversarial
relationship.  For the first case, we can add text saying that when an end
system supplies data it is the end system's responsibility to ensure that
it is correct; don't use a STUN result from last week as fresh, for
example.  For the second case, in which the server treats user supplied
data as potentially misleading because the user may wish to circumvent
restrictions, I'll point out the Wikimedia example demonstrates that simply
shifting the trust to a mid-point entity doesn't work; it has to be shifted
to an entity within the trust domain of the server.  So the question isn't
really whether "end-user system supplied data can be trusted or not"; the
same question applies to whoever supplies the data.
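
For the first case, the freshness check could be as small as the following
Python sketch. This is only an illustration: the five-minute limit, the
class, and the function name are assumptions made for the example, not
anything the draft specifies.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical freshness policy; the right value depends on the protocol.
MAX_AGE_SECONDS = 300


@dataclass
class CachedAddress:
    value: str         # e.g. a reflexive address learned via STUN
    learned_at: float  # timestamp (seconds) when the value was learned


def usable_address(cached: Optional[CachedAddress], now: float) -> Optional[str]:
    """Return the cached value only if it is still fresh.

    None tells the caller to re-run discovery or to omit the data.
    """
    if cached is None:
        return None
    if now - cached.learned_at > MAX_AGE_SECONDS:
        return None
    return cached.value
```

A result learned a week ago fails the check, so the end system either repeats
the discovery or leaves the field out rather than asserting stale data.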

or how the consent will be obtained from a user.

You're right that I'm leaving aside the question of how the user sets the
policies, because it may vary by protocol and type of device too much to
make general advice useful.  If you would like me to add an explicit
statement to that effect, I am happy to note that it is not covered.



In general, the point of the document is that the host should be able to
omit the data without mid-network devices adding it back.  That's the point
of protecting the traffic in the first place, after all.  I am saying that
if the protocols require the data, then getting it from the end host has
better privacy properties than getting it from mid-network entities.

[Med] I’m not sure we can have such a general statement, because the data
may not be available to clients (e.g., DHCP) + the data supplied by
clients (when possible) may not be reliable + enforcing policies based on
client-supplied data may have implications for other users (e.g., spoofing
XFF). Obviously, getting some of the information from a client
may have implications on QoE; the user needs to understand the root causes
of a degradation of QoE. Of course, these implications may not be new for
users who are familiar with disabling JavaScript and the like.



For example, the document states that the information in a Forward-For
header can be supplied by the host itself and then communicated to a remote
consumer. This is indeed possible, but because of abusive hosts, some
servers implement whitelists of trusted proxies; see
https://meta.wikimedia.org/w/extensions/TrustedXFF/trusted-hosts.txt.

The Wikimedia case is a very interesting one to raise, because it derives
from a set of assumptions about the network that are somewhat flawed and
then attempts to patch those flaws in ways that actually damage the
mechanisms of the system they originally built.

Wikimedia wants to allow folks to edit without login credentials.  This
allows for anonymous users to make corrections or additions; this is a
goal.  The consequence of that goal being achieved is that trolls or
malicious editors can have at anything they want.

Rather than institute credentials and ACLs, Wikimedia attempts to
substitute blocking by IP for blocking by credential.  The property they
are looking for in IPs is not really there, though:  they are not unique to
individuals, especially over time.

This damages those who share IP addresses (due to NATs or proxies).  As
far as I can tell, the NAT problem is simply treated as collateral damage.
For the proxies, they attempt to work around the damage using XFF.  That's
spoofable, though, so they attempt to limit it to specific proxies whose
XFF they trust--many of which require logins.  That shifts the information
about who is editing Wikipedia out of their hands, but leaves it in the
network and thus not truly anonymous.  I understand the engineering balance
they are trying to strike, but I'm not sure I can recommend their solution.
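
The trusted-proxy pattern described above can be sketched in a few lines of
Python. This is a generic illustration, not Wikimedia's actual code: the
allow-list, the function names, and the addresses (all from documentation
ranges) are assumptions.

```python
import ipaddress

# Hypothetical allow-list of proxies whose XFF entries the server accepts.
TRUSTED_PROXIES = [ipaddress.ip_network("198.51.100.0/24")]


def is_trusted(addr: str) -> bool:
    # Anything that does not parse as an IP address is never trusted.
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False
    return any(ip in net for net in TRUSTED_PROXIES)


def attributed_address(peer: str, xff_header: str) -> str:
    """Pick the address the server attributes the request to."""
    if not is_trusted(peer):
        # XFF from an unknown sender is spoofable, so it is ignored.
        return peer
    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    # Walk from the entry nearest the server back toward the client,
    # skipping entries that are themselves trusted proxies.
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return peer
```

Note that everything to the left of the first untrusted hop is ignored,
including anything the client itself wrote. The attribution rests entirely on
entities inside the server's trust domain, and those entities necessarily
learn who is making the request.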



[Med] I’m not recommending their solution either, but I’m trying to raise
the point that an engineering balance is out there. ACKing that deployment
reality is better than ignoring it.


The deployment considerations text is meant to point out the engineering
balance.  I'm happy to add the text noted above (on latency, the end user
responsibility for correct data, the PSAP carve out, and the explicit note
that the document does not treat how to obtain consent from a user so that
an end system can supply data).

I'm less happy to add language on adversarial treatment of client-supplied
data.  This is partly because many of the systems which use
network-supplied data are based on a misunderstanding of the properties of
the data being added.  It is partly because the adversarial relationship
can extend to network-supplied data.  It is also because a fair few of them
are simply security theater.  If you have a specific edit you would like to
propose, though, I will consider it.

Thanks again,

Ted

Re: Last Call: <draft-hardie-privsec-metadata-insertion-05.txt> (Design considerations for Metadata Insertion) to Informational RFC