
Re: DHCID and the use of MD5 [Re: Last Call: 'Resolution of FQDN Conflicts among DHCP Clients' to Proposed Standard]

2005-11-26 10:03:13
In message 
<Pine.LNX.4.64.0511261615210.26558@netcore.fi>,
 Pekka Savola writes:
Hi,

I'll break out the most substantial comments in separate messages..

On Mon, 14 Nov 2005, The IESG wrote:
The IESG has received a request from the Dynamic Host Configuration WG to
consider the following documents:

- 'A DNS RR for Encoding DHCP Information (DHCID RR) '
  <draft-ietf-dnsext-dhcid-rr-10.txt> as a Proposed Standard
- 'Resolution of FQDN Conflicts among DHCP Clients '
  <draft-ietf-dhc-ddns-resolution-10.txt> as a Proposed Standard
- 'The DHCP Client FQDN Option '
  <draft-ietf-dhc-fqdn-option-11.txt> as a Proposed Standard
- 'The DHCPv6 Client FQDN Option '
  <draft-ietf-dhc-dhcpv6-fqdn-03.txt> as a Proposed Standard

I have only one major comment on DHCID: its use of MD5 as a 
glued-in hash function.  The rest of the comments are rather 
straightforward.

substantial
----------

   In order to avoid exposing potentially sensitive identifying
   information, the data stored is the result of a one-way MD5 [5] hash
   computation.  The hash includes information from the DHCP client's
   REQUEST message as well as the domain name itself, so that the data
   stored in the DHCID RR will be dependent on both the client
   identification used in the DHCP protocol interaction and the domain
   name.  This means that the DHCID RDATA will vary if a single client
   is associated over time with more than one name.  This makes it
   difficult to 'track' a client as it is associated with various domain
   names.
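The paragraph above says that different names for the same client yield unrelated DHCID values. The Python sketch below illustrates that property. It rests on several assumptions. The exact byte layout hashed by draft-ietf-dnsext-dhcid-rr-10 is not reproduced in this message, so the sketch assumes the input is simply the client identifier concatenated with the domain name in uncompressed DNS wire format. It also lowercases labels as an assumed canonical form. The client identifier (hardware type 1 plus a made-up MAC address), the names, and the helper functions are all hypothetical.

```python
import hashlib


def fqdn_to_wire(fqdn: str) -> bytes:
    """Encode a domain name in uncompressed DNS wire format (RFC 1035):
    each label is prefixed by its length, and a zero byte ends the name.
    Labels are lowercased here as an assumed canonical form."""
    out = bytearray()
    for label in fqdn.rstrip(".").split("."):
        raw = label.lower().encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"invalid label: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)
    return bytes(out)


def dhcid_md5(client_id: bytes, fqdn: str) -> bytes:
    """Hypothetical DHCID-style RDATA: MD5(client identifier || wire-format FQDN)."""
    return hashlib.md5(client_id + fqdn_to_wire(fqdn)).digest()


# Hypothetical client identifier: hardware type 1 (Ethernet) + a made-up MAC.
client_id = bytes([1]) + bytes.fromhex("00163e123456")

rdata_a = dhcid_md5(client_id, "host1.example.com")
rdata_b = dhcid_md5(client_id, "laptop.example.org")
print(rdata_a.hex())
print(rdata_b.hex())
print(rdata_a == rdata_b)  # False: same client, unrelated values per name
```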

   The MD5 hash algorithm has been shown to be weaker than the SHA-1
   algorithm; it could therefore be argued that SHA-1 is a better
   choice.  However, SHA-1 is significantly slower than MD5.  A
   successful attack of MD5's weakness does not reveal the original data
   that was used to generate the signature, but rather provides a new
   set of input data that will produce the same signature.  Because we
   are using the MD5 hash to conceal the original data, the fact that an
   attacker could produce a different plaintext resulting in the same
MD5 output is not a significant concern.

==> while the information exposure of someone cracking the MD5 hash 
is not too huge, I believe it is unacceptable to design new protocols 
without the capability to switch the hash function as need be.  This 
could be achieved for example by reserving one additional byte from 
the start of the DHCID record to designate the hash function used. 
If you don't bother to define your own registry (for all I care, you 
could include MD5 there as well, but at least include SHA1 and 
preferably also SHA-256), you could possibly re-use 
http://www.iana.org/assignments/ds-rr-types or something like that.

That way, we can introduce new hash functions in a backward compatible 
manner later on, with no need to revamp the protocol.
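The one-byte hash selector proposed here can be sketched in a few lines of Python. This is a minimal sketch under a hypothetical record layout: one algorithm byte followed by the digest of the client identifier plus the wire-format domain name. The code values 1, 2 and 3 are placeholders chosen for illustration. They are not assignments from http://www.iana.org/assignments/ds-rr-types or any other registry.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical code points for the leading algorithm byte (illustrative only).
HASH_ALGS = {1: "md5", 2: "sha1", 3: "sha256"}


def encode_rdata(alg_code: int, client_id: bytes, fqdn_wire: bytes) -> bytes:
    """Build RDATA as one algorithm byte followed by the digest."""
    digest = hashlib.new(HASH_ALGS[alg_code], client_id + fqdn_wire).digest()
    return bytes([alg_code]) + digest


def verify_rdata(rdata: bytes, client_id: bytes, fqdn_wire: bytes) -> Optional[bool]:
    """Return True/False if the digest does/doesn't match, or None when the
    record is empty or uses an algorithm code this software doesn't know."""
    if not rdata or rdata[0] not in HASH_ALGS:
        return None
    expected = hashlib.new(HASH_ALGS[rdata[0]], client_id + fqdn_wire).digest()
    return hmac.compare_digest(rdata[1:], expected)


# Usage with made-up inputs: the same client under an MD5 and a SHA-256 record.
fqdn_wire = b"\x05host1\x07example\x03com\x00"      # host1.example.com
cid = bytes([1]) + bytes.fromhex("00163e123456")    # hypothetical client ID
old = encode_rdata(1, cid, fqdn_wire)
new = encode_rdata(3, cid, fqdn_wire)
print(len(old), len(new))                           # 17 33
print(verify_rdata(old, cid, fqdn_wire), verify_rdata(new, cid, fqdn_wire))  # True True
print(verify_rdata(b"\x09" + old[1:], cid, fqdn_wire))  # None (unknown code)
```

Software that meets an unknown code can recognize the record as using a newer hash rather than misreading the digest bytes. That is the backward-compatibility property the proposal is after.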

If we don't do this, we'll need to define DHCID2, DHCID3, .. etc. 
records further down in the future (w/ different hash functions) and 
make DHCP co-exist with all of them.  That's bound to cause a lot of 
protocol complexity, and I don't think we want to go there.

I agree with this comment.  The draft is wrong -- it asserts that a
"successful attack of MD5's weakness does not reveal the original data".
That's an unwarranted assumption; we have no idea what such an attack would 
yield, since no such attack currently exists.

More generally...  The currently-known attacks on MD5 are collision 
attacks: it's possible to generate two inputs that produce the same 
hash value.  This scenario requires a preimage attack; none are known.
It would not surprise me if someone were to develop one, but until that 
happens we can't speculate on its properties.  There are, however, some 
reasons for concern.  One of the options defined, the DHCPv4 Client 
Identifier, probably doesn't have much entropy.  For example, a 
suggestion in RFC 2132 says to use the ARP hardware type code and MAC 
address.  There's exactly one interesting hardware type code for most
users, and the high-order 3 bytes of the MAC address are the 
manufacturer's ID, not many of which are actually used.  Given that 
this is an 8-byte input string and that MD5 has a 16-byte (128-bit) 
output, it is very likely that at most one input string of that length 
hashes to any given output.  If several of the input bytes are fixed, 
or at least constrained, an attacker has even fewer candidates to 
try.  For that matter, that constraint alone may lead to a 
successful attack on MD5. 

In fact, the Security Considerations section should analyze the 
(non-trivial) probability of a brute-force attack.  Again, consider the 
Client Identifier, which is likely 8 bytes long.  2 are fixed, and 
hence irrelevant.  According to today's copy of
http://standards.ieee.org/regauth/oui/oui.txt there are 8786 
manufacturer IDs, or slightly more than 13 bits (2^13 = 8192).  
Effectively, though, it's less, since the usage is very non-uniform.  
But even if it were uniform, that field plus the 24-bit unit identifier 
total only about 37 bits, well within anyone's capabilities.
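The Python sketch below is a toy version of that brute-force search. It assumes the hashed input is the RFC 2132-style identifier (hardware type 1, then a 6-byte MAC address) followed by the domain name in DNS wire format. The draft's exact layout may differ, but fixed bytes do not change the shape of the search. To keep the run fast, it tries only three sample OUI values and a 32768-value slice of the 24-bit unit-identifier space, against a made-up victim MAC. A real attacker would instead iterate over all ~8786 OUIs times 2^24 unit IDs, roughly 2^37 MD5 computations.

```python
import hashlib
import itertools
import math

# Search-space arithmetic: 8786 OUIs (~13.1 bits) plus a 24-bit unit identifier.
oui_bits = math.log2(8786)
total_bits = oui_bits + 24
print(f"OUI: {oui_bits:.1f} bits, total: {total_bits:.1f} bits")


def fqdn_to_wire(fqdn: str) -> bytes:
    """Uncompressed DNS wire format: length-prefixed labels, then a zero byte."""
    out = bytearray()
    for label in fqdn.rstrip(".").split("."):
        out.append(len(label))
        out += label.encode("ascii")
    out.append(0)
    return bytes(out)


def brute_force(target, fqdn, ouis, nic_range):
    """Try every (OUI, unit ID) pair and return the 6-byte MAC whose assumed
    DHCID-style digest MD5(0x01 || MAC || FQDN) equals target, else None."""
    suffix = fqdn_to_wire(fqdn)  # the name is known to the attacker
    for oui, nic in itertools.product(ouis, nic_range):
        mac = oui + nic.to_bytes(3, "big")
        if hashlib.md5(b"\x01" + mac + suffix).digest() == target:
            return mac
    return None


# Toy run: three sample OUIs and a 32768-value slice of the unit-ID space.
ouis = [bytes.fromhex(h) for h in ("00163e", "000c29", "525400")]
nic_range = range(0x120000, 0x128000)
victim = bytes.fromhex("00163e123456")  # made-up MAC
target = hashlib.md5(b"\x01" + victim + fqdn_to_wire("host1.example.com")).digest()
print(brute_force(target, "host1.example.com", ouis, nic_range).hex())  # 00163e123456
```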

Most of this analysis applies to the other two options as well.

                --Steven M. Bellovin, http://www.cs.columbia.edu/~smb



_______________________________________________
Ietf mailing list
Ietf@ietf.org
https://www1.ietf.org/mailman/listinfo/ietf