RE: Last Call: <draft-hardie-privsec-metadata-insertion-05.txt> (Design considerations for Metadata Insertion) to Informational RFC

2017-03-03 03:24:43
Hi Ted,

Please see inline.

Cheers,
Med

From: Ted Hardie [mailto:ted(_dot_)ietf(_at_)gmail(_dot_)com]
Sent: Thursday, 2 March 2017 19:02
To: BOUCADAIR Mohamed IMT/OLN
Cc: ietf(_at_)ietf(_dot_)org; 
draft-hardie-privsec-metadata-insertion(_at_)ietf(_dot_)org
Subject: Re: Last Call: <draft-hardie-privsec-metadata-insertion-05.txt> (Design 
considerations for Metadata Insertion) to Informational RFC

wing are missing from the document:
It's difficult to say how something will be used in the future.
[Med] Advice that is not implementable causes more trouble, IMHO.
Sorry, I thought you were asking what wgs or protocols planned to reference 
this.  For that, I don't know.
[Med] OK. IMHO, lacking such considerations, there is a high risk that the 
advice will be lost or that it will be used as a permanent DISCUSS point in 
later stages of preparing documents. I’d prefer actionable points that WGs and 
document authors can consider in early stages.

The intent is that it is information useful to those considering whether 
restoring metadata lost to encryption in mid-network is the right way to go.
[Med] This is another assumption in the document that I disagree with: you seem 
to assume that an on-path device that inserts metadata is necessarily 
RESTORING that information. This is not true for many efforts:

* An X-Forwarded-For (XFF) header inserted by a proxy does not restore any 
data; it only reveals data that is already present in the packet issued by the 
client itself.

* An address sharing device, for example under DS-Lite (RFC6333), that 
inserts the source IPv6 prefix in the TCP HOST_ID option (RFC7974) is not 
RESTORING any data. The content of that TCP option is already visible in the 
packet sent by the host.

* The Service Function Chaining WG 
(https://datatracker.ietf.org/wg/sfc/about/) is defining an architecture in 
which on-path devices communicate metadata; that metadata is inserted at the 
network side. Border nodes will make sure that the data is stripped before 
forwarding packets to the ultimate destinations. The metadata can be a 
subscriber-id, a policy-id, etc.


So when draft-hardie-* says: “Do not add metadata to flows at intermediary 
devices unless
   a positive affirmation of approval for restoration has been received
   from the actor whose data will be added.”

(1) Do you assume that the examples I listed above fall under your advice?
(2) How will an on-path device know that the data it intends to insert is a 
“restoration”?
(3) Does it mean that for new data (i.e., data that is not a restoration), 
on-path devices are free to do whatever they want? For me, this is undesirable. 
There is a void there. A statement requiring those networks to avoid leaking 
privacy information must be included.

Another assumption is made here:

   Instead, design the protocol so that the actor can add such metadata
   themselves so that it flows end-to-end, rather than requiring the
   action of other parties.  In addition to improving privacy, this
   approach ensures consistent availability between the communicating
   parties, no matter what path is taken.

This text claims that providing data by the endpoint ensures a “consistent 
availability” of that information. This does not hold for a multi-homed host 
that uses, for example, the X-Forwarded-For (XFF) header: the content of the 
header, if injected by the endpoint, will obviously depend on the path. One way 
to ensure a “consistent availability” is to insert many XFF headers, each 
enclosing the content that is specific to a given network attachment. But doing 
that raises a privacy concern because the remote server can track clients.
My intent (and the understanding of other reviewers) is to highlight that these 
mechanisms have a privacy-damaging result and that this should be considered.
[Med] I do think existing documents already do that job. I do think we need 
more.

Sorry, did you mean "do not think we need more"?
[Med] I meant we need more than only highlighting the issue. We need something 
which is actionable. Requiring a Privacy Section in every RFC may be a 
direction to consider.

  If so, I obviously disagree.  This design pattern is used uncritically enough 
that a brief document describing why it isn't safe still seems to me useful.  
Were it incorporated into a more general document (as noted before), that would 
also work.  If it later is, that more general work could obsolete this (though 
that's a bid for an informational document).

 In particular, I'm concerned that some application functions in the network 
(e.g. recursive resolvers or proxies) do not consider the positive privacy 
implications of their aggregation and so do not consider adding this data back 
as problematic.
[Med] I’m concerned with that, too (see e.g., 
http://www1.icsi.berkeley.edu/~narseo/papers/hotm42-vallinarodriguez.pdf).
 In the meantime, I’m also concerned with (1) some applications that leak 
privacy information without the consent of the user and (2) some application 
servers that may correlate various information shared by an application client 
to track users (e.g., https://panopticlick.eff.org/). BTW, I see that you are 
using “application function” which may not have the same meaning as the general 
“protocol” wording used in draft-hardie-*. Do you consider a DHCP relay as an 
“application function”?
   Highlighting this enables them to see this traffic in a different context.
[Med] Isn’t this already assumed by some protocol designers (e.g., RFC6973, 
SIP)? BTW, there are subtleties when proxies are in the same trust domain of 
the client or server.
There are certainly some protocol designers that have internalized this, but my 
experience has been that this is not always the case.  In a fair few cases, 
folks deploy  methods like this because they see encryption of metadata in data 
integrity terms or see aggregation only in terms of data usage minimization.  
They restore the metadata mid-network because it is the quickest solution for 
them to get back to the status quo ante for their understanding of the system.

[Med] I hear you. What would be the harm if those solutions stripped that 
information before sending it to the server? If they don’t strip it, this means 
that either the information can be parsed and used by the server, or at least 
its presence does not lead to session failures. If the server parses and uses 
that information, this means that the presence of that information is 
important for delivering the service. In that case, the question is why the 
client does not supply that information in the first place.
* that data may not be always available to the endhost
Understood, but even in this case, it is better to make the permission to add 
the data explicit.
[Med] This may be easy to implement for some applications, but this may not be 
generalized to ** all ** protocols.

You are certainly correct that many deployed protocols would find it hard to 
retrofit this consent model into their existing flows.    This is, however, 
advice for folks at the design phase.  If RFC 6788 were being written after the 
publication of this document, its authors might well have looked at the 
protocol mechanics in section 5.2:

   The AN
   intercepts and then tunnels the received Router Solicitation in a
   newly created IPv6 datagram with the Line-Identification Option
   (LIO).  The AN forms a new IPv6 datagram whose payload is the
   received Router Solicitation message as described in [RFC2473],
   except that the Hop Limit field of the Router Solicitation message
   MUST NOT be decremented.

and asked whether the circuit identifier corresponding to the logical
access loop port of the AN from which the RS was initiated is PII.  If so, this
document would have them consider whether transparent interception
is the appropriate choice.  There clearly are flows in which the AN's role
would be explicit.

I don't know, frankly, which choice is right in this case, but I would prefer 
that the choice be made with an easy reference to the implications of inserting 
metadata at hand.
Putting aside the interaction with a user to get consent, and how that consent 
will need to be changed when another user uses the same device to connect to 
the Internet, consider a user who does not want an upstream DHCP relay to 
insert the line-id (https://tools.ietf.org/html/rfc6788) towards the server, and 
let’s suppose the relay received a signal (by some means, yet to be specified) 
that for this particular DHCP client, the line-id must not be inserted. In 
this case, connectivity won’t be provided to that user. This would mean extra 
calls to the hotline for that network provider. This is desirable for neither 
customers nor network providers.
If this can be done in parallel with other actions, then the latency impact can 
be minimized.
[Med] These are assumptions and implications that are worth adding to the 
draft.

Okay, how about the following text being added to section 5.
There are also tensions with latency of operation. For example, where the end 
system does not initially know the information which would be added by on-path 
devices, it must engage the protocol mechanisms to determine it.  Determining a 
public IP address to include in a locally supplied header might require a STUN 
exchange, and the additional latency of this exchange discourages deployment of 
host-based solutions.  To minimize this latency, engaging those mechanisms may 
need to be done in parallel with or in advance of the core protocol exchanges 
with which this metadata would be supplied.
[Med] Looks good to me. Thanks.
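
As a rough illustration of the "in parallel with" point in the proposed text, here is a minimal Python sketch. Both coroutines are hypothetical stand-ins (no real STUN client or protocol handshake is used; the sleeps simulate one round trip each, and the returned address is a documentation address), so it only shows the shape of the latency saving, not an actual implementation.

```python
import asyncio
import time


async def discover_public_address():
    """Hypothetical stand-in for a STUN exchange (simulated round trip)."""
    await asyncio.sleep(0.05)
    return "192.0.2.1"  # documentation address, not a real discovery result


async def open_core_session():
    """Hypothetical stand-in for the core protocol handshake."""
    await asyncio.sleep(0.05)
    return "session-ready"


async def sequential():
    """Discover the address first, then start the core exchange."""
    start = time.perf_counter()
    await discover_public_address()
    await open_core_session()
    return time.perf_counter() - start


async def parallel():
    """Run address discovery concurrently with the core exchange."""
    start = time.perf_counter()
    await asyncio.gather(discover_public_address(), open_core_session())
    return time.perf_counter() - start


def measure():
    """Return (sequential_seconds, parallel_seconds) for the simulated flows."""
    return asyncio.run(sequential()), asyncio.run(parallel())


if __name__ == "__main__":
    seq, par = measure()
    print(f"sequential: {seq:.3f}s, parallel: {par:.3f}s")
```

Under these assumptions the sequential flow pays for both round trips, while the parallel flow pays roughly for the slower of the two.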

BTW, this falls into this general discussion in 
https://tools.ietf.org/html/rfc6973:

   a.  Trade-offs.  Does the protocol make trade-offs between privacy
       and usability, privacy and efficiency, privacy and
       implementability, or privacy and other design goals?  Describe
       the trade-offs and the rationale for the design chosen.
* a misbehaving node may be tempted to spoof the data to be injected. A remote 
device that will use that data to enforce policies will be broken.
This point was discussed extensively in the GEOPRIV work and essentially a 
single carve-out was made:  for emergency services, where falsely asserted 
location data could be used to SWAT individuals or consume safety resources.    
I don't think that falls into this narrow advice, but I would be willing to add 
something like this to the security considerations:
"Note that some emergency service recipients, notably PSAPs (Public Safety 
Answering Points) may prefer data provided by a network to data provided by the 
end system, because an end system could use false data to attack others or 
consume resources.   While this has the consequence that the data available to 
the PSAP is often more coarse than that available to the end system, the risk 
of false data being provided involves a risk to the lives of those targeted."
[Med] Thank you. Providing PSAP as an example is OK, but I’d like the issue to 
be called out as a generic one while PSAP is provided as an example. What about 
the following:

"Note that some servers (e.g., emergency service recipients, notably PSAPs 
(Public Safety Answering Points) [RFC6443]) may prefer data provided by a 
network to data provided by the end system, because an end system could use 
false data to attack others or consume resources.  While this has the 
consequence that the data available to the server is often more coarse than 
that available to the end system, the risk of false data being provided 
involves a risk to the lives of those targeted."


I don't think that shifting emergency service recipients to an example works 
here, because it broadens the carve out.  In the emergency services case, the 
resources consumed are fire trucks, ambulances, and SWAT teams.  For other 
servers, resources consumed could simply be CPU cycles or disk; that's really 
not the same.  Balancing location consent requirements against one was agreed; 
balancing it against the other was not.

[Med] Resources may not be restricted to CPU or disk but may include granting 
access to the service (e.g., downloading a file when a quota per source address 
is enforced). It can be whatever the servers consider to be critical for them; 
it is up to the service designer to characterize it. The NEW wording 
proposed above is technically correct. Please reconsider adding it to the draft.


* it was reported in the past that some browsers leak the MSISDN and other 
sensitive data.
This is true, but it seems to me unrelated to the point of the document.
[Med] It is related because blindly trusting an application client (and server) 
has its own privacy risks. This is further exacerbated by the rich data that is 
available to an application client and by the visibility on various layers 
available to an application server.

I agree that it has its own privacy risks, but I don't think this is the 
document that should explore them.
[Med] You don’t need to explore them, just add one or two sentences as a 
reminder that privacy leaks are still a valid concern even if only clients are 
supplying data without the help of an on-path network device.
From that flow some of your other concerns about audience, at least as I 
understand.  As written, this is narrow advice for a broad audience: basically, 
anyone who would consider the form of metadata insertion it describes.  You 
would, if I understand you, prefer a narrower description of the audience in a 
larger context.

[Med] The key point here is about the practicality of implementing the advice 
NOT changing the scope. For example, the document says that it is better that a 
host is injecting the data but the document does not question whether that 
supplied data can be trusted or not,

Broadening this a bit, you're looking at two cases: one in which the data the 
host has is wrong and one in which there is an adversarial relationship.  For 
the first case, we can add text saying that when an end system supplies data it 
is the end system's responsibility to ensure that it is correct; don't use a 
STUN result from last week as fresh, for example.
[Med] OK.

  For the second case,  in which the server treats user supplied data as 
potentially misleading because the user may wish to circumvent restrictions, 
I'll point out the Wikimedia example demonstrates that simply shifting the 
trust to a mid-point entity doesn't work; it has to be shifted to an entity 
within the trust domain of the server.  So the question isn't really "end-user 
system supplied data can be trusted or not", the same question applies to 
whomever supplies the data.
[Med] Fully agree. Having some text to record that the concern applies, 
including for client-supplied data, would be good.


or how the consent will be obtained from a user.

You're right that I'm leaving aside the question of how the user sets the 
policies, because it may vary by protocol and type of device too much to make 
general advice useful.  If you would like me to add an explicit statement to 
that effect, I am happy to note that it is not covered.
[Med] Please add some text about this point. Thank you.


In general, the point of the document is that the host should be able to omit 
the data without mid-network devices adding it back.  That's the point of 
protecting the traffic in the first place, after all.  I am saying that if the 
protocols require the data, then getting it from the end host has better 
privacy properties than getting it from mid-network entities.
[Med] I’m not sure we can have such a general statement because the data may 
not be available to clients (e.g., DHCP) + the data supplied by clients 
(when possible) may not be reliable + enforcing policies based on 
client-supplied data may have implications for other users (e.g., spoofing 
XFF). Obviously, getting some of the information from a client may have 
implications on QoE; the user needs to understand the root causes of a 
degradation of QoE. Of course, these implications may not be new for users who 
are familiar with disabling JavaScript and the like.

For example, the document states that the information in an X-Forwarded-For 
(XFF) header can be supplied by the host itself and then communicated to a 
remote consumer. This is indeed possible, but because some hosts abuse it, some 
servers implement whitelists of proxies they trust; see 
https://meta.wikimedia.org/w/extensions/TrustedXFF/trusted-hosts.txt.
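
To make the whitelist idea concrete, here is a minimal Python sketch of how a server might honor an X-Forwarded-For value only when it arrives via a proxy it trusts. This is not Wikimedia's implementation: the `TRUSTED_PROXIES` list, the function names, and the addresses (all documentation ranges) are assumptions made for illustration.

```python
import ipaddress

# Hypothetical allowlist of proxy networks the server trusts (assumption;
# a real deployment would load its own list).
TRUSTED_PROXIES = [ipaddress.ip_network("192.0.2.0/24")]


def is_trusted(addr):
    """Return True if addr belongs to one of the trusted proxy networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in TRUSTED_PROXIES)


def effective_client(peer_addr, xff_header=None):
    """Pick the address to apply policy to.

    Walk the XFF list from right to left, accepting an entry only while
    the hop that reported it is a trusted proxy. Anything a client or an
    untrusted hop put in the header is ignored.
    """
    client = peer_addr
    hops = [h.strip() for h in (xff_header or "").split(",") if h.strip()]
    for hop in reversed(hops):
        if not is_trusted(client):
            break
        try:
            ipaddress.ip_address(hop)
        except ValueError:
            break  # malformed entry: stop trusting the rest of the chain
        client = hop
    return client


if __name__ == "__main__":
    # Untrusted peer: the (possibly spoofed) header is ignored.
    print(effective_client("198.51.100.7", "203.0.113.9"))
    # Trusted proxy: the address it reports is used instead.
    print(effective_client("192.0.2.10", "203.0.113.9"))
```

Note that the trust decision rests on who relayed the header, not on the header content itself, which is why a host-supplied value is treated the same as a spoofed one in this sketch.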


The Wikimedia case is a very interesting one to raise, because it derives from 
a set of assumptions about the network that are somewhat flawed and then 
attempts to patch those flaws in ways that actually damage the mechanisms of 
the system they originally built.
Wikimedia wants to allow folks to edit without login credentials.  This allows 
for anonymous users to make corrections or additions; this is a goal.  The 
consequence of that goal being achieved is that trolls or malicious editors can 
have at anything they want.
Rather than institute credentials and ACLs, Wikimedia attempts to substitute 
blocking by IP for blocking by credential.  The property they are looking for 
in IPs is not really there, though:  they are not unique to individuals, 
especially over time.

This damages those who share IP addresses (due to NATs or proxies).  As far as 
I can tell, the NAT problem is simply treated as collateral damage.  For the 
proxies, they attempt to work around the damage using XFF.  That's spoofable, 
though, so they attempt to limit it to specific proxies whose XFF they 
trust--many of which require logins.  That shifts the information about who is 
editing Wikipedia out of their hands, but leaves it in the network and thus not 
truly anonymous.  I understand the engineering balance they are trying to 
strike, but I'm not sure I can recommend their solution.

[Med] I’m not recommending their solution either, but I’m trying to raise the 
point that an engineering balance is out there. ACKing that deployment reality 
is better than ignoring it.


The deployment considerations text is meant to point out the engineering 
balance.  I'm happy to add the text noted above (on latency, the end user 
responsibility for correct data, the PSAP carve out, and the explicit note that 
the document does not treat how to obtain consent from a user so that an end 
system can supply data).
[Med] Ok, thanks.
I'm less happy to add language on adversarial treatment of client-supplied 
data.  This is partly because many of the systems which use network-supplied 
data are based on a misunderstanding of the properties of the data being added.
[Med] I agree this may be the case for some of them, but not all.
  It is partly because the adversarial relationship can extend to 
network-supplied data.  It is also because a fair few of them are simply 
security theater.  If you have a specific edit you would like to propose, 
though, I will consider it.
Thanks again,
Ted