It appears that the interest in hiding the public key is for preventing
the modulus from being exposed, and not for thwarting traffic analysis
to conceal the identities of the originators and recipients. Some
people said they would be happy with a public key hash as the key
selector, and many have explicitly said they prefer this. To wit...
There are two issues in using a public key hash:
1. What hash algorithm to use? In other words, should it be open and
specified on a per-use basis, or should we standardize on one
algorithm? What if not everyone supports the algorithm?
2. What exactly should we hash? In other words, if there are multiple
ways a public key could be represented (with different algorithm
identifiers, for example), how do we avoid confusion?
I will address the first question in this message and the second
question later.
For the choice of hash algorithm, let me make the following points:
* We are going to put the public key hash in an Originator-ID or
Recipient-ID field. In a Recipient-ID, it is to tell the recipient
which Key-Info field corresponds to him or her, among all the other
recipients of that one message. In an Originator-ID, it is to tell
the recipient which of the possible keypairs owned by the originator
was used to sign the message.
* Given the above, there are no "attacks" we are worried about for the
public key hash (other than nuisance or denial-of-service): if someone
substitutes a different public key hash in a Recipient-ID or
Originator-ID, the recipient will simply not find the right keying
material and will fail to process the message (a denial of service
attack which any form of identifier suffers from).
* The algorithm does not need to be a high-powered non-reversible
message digest algorithm. These are chosen for signatures because you
don't want someone to be able to find another message which produces
the same message digest (which would allow the signature check to
succeed, but on a substitute message). But for the public key hash
algorithm, we are not concerned about reversibility attacks.
* The hash algorithm does need to produce a low probability of
collisions. But this criterion is extremely easy to meet, given the
random nature of the bytes in a public key. (Avoiding collisions would
be more difficult if the inputs to the hash algorithm were patterned
like "I can't wait for Windows 95", "I can't wait for Windows 96", ...)
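As a rough illustration of the first two points above, here is a sketch of a recipient picking out its own Key-Info entry by comparing identifiers. The function name select_key_info, the (Recipient-ID, Key-Info) pair layout, and the sample values are all hypothetical and made up for this example; they are not taken from any message format.

```python
from typing import Iterable, Optional, Set, Tuple


def select_key_info(my_key_ids: Set[str],
                    entries: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the Key-Info whose Recipient-ID matches one of our key ids.

    my_key_ids: hex-encoded identifiers of the keys we hold (hypothetical).
    entries:    (recipient_id, key_info) pairs taken from one message.
    Returns None when nothing matches, e.g. if an identifier was altered
    in transit; the message then simply cannot be processed.
    """
    wanted = {key_id.upper() for key_id in my_key_ids}
    for recipient_id, key_info in entries:
        if recipient_id.upper() in wanted:
            return key_info
    return None


if __name__ == "__main__":
    message_entries = [("0A0B0C0D", "key-info-for-someone-else"),
                       ("1FF323A9", "key-info-for-me")]
    print(select_key_info({"1FF323A9"}, message_entries))
```

A substituted identifier only makes the lookup return nothing, which is the nuisance case described above and nothing worse.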
Jim Galvin writes:
    Basically, including yet another algorithm identifier provides yet
    another opportunity for two users to fail to interoperate. Although
    it's probably true that we'll just recommend exactly one (e.g., MD5),
    there's always the possibility that it will need to be changed.
Given that the hash algorithm we need can be extremely simple, as
stated above, and given that we need an algorithm with no strings
attached, I recommend (as I did a while ago) using the checksum
algorithm which is used by every TCP/IP application. When I proposed
this before, it was received favorably. Mark S. Feldman pointed
out that we would want more than the 16 bits in the TCP/IP checksum,
so I suggest we use 32 bits. Therefore, for example, <id-email> for me
would look like:
Originator-ID: EN, 1FF323A9, jefft(_at_)netcom(_dot_)com
where 1FF323A9 is the hex-encoded 32-bit checksum of my public key.
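To put rough numbers on the 16-bit versus 32-bit question, the sketch below uses the standard birthday approximation for the chance that at least two of n identifiers coincide. It assumes the checksum values are spread uniformly over the whole range, which is an idealization and not something established here; the function name collision_probability is just for this example.

```python
import math


def collision_probability(n_keys: int, bits: int = 32) -> float:
    """Approximate chance that at least two of n_keys identifiers collide.

    Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    Assumes identifiers are uniformly distributed (an idealization).
    """
    space = 2.0 ** bits
    exponent = n_keys * (n_keys - 1) / (2.0 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # Example: 100 recipients named in a single message.
    for width in (16, 32):
        print(width, collision_probability(100, width))
```

Under that assumption, 100 identifiers in one message collide with a probability of roughly 7% at 16 bits and on the order of one in a million at 32 bits.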
Here is the (easy as pie) definition of the checksum from the TCP
document RFC 793:
    The checksum field is the 16 bit one's complement of the one's
    complement sum of all 16 bit words in the header and text. If a
    segment contains an odd number of header and text octets to be
    checksummed, the last octet is padded on the right with zeros to
    form a 16 bit word for checksum purposes.
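(For our purposes, use 32 instead of 16 and substitute "public key
encoding" for "header and text"; the padding rule then applies whenever
the encoding is not a multiple of four octets long.) The 16-bit version
can be implemented in about 10 lines of C code as shown in RFC 1071.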
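For concreteness, here is a minimal sketch of the 32-bit variant in Python rather than C. It assumes the words are read in big-endian (network) byte order, which the definition above does not pin down, and the helper names checksum32 and originator_id are hypothetical. The sample key bytes are made up and are not a real public key encoding.

```python
def checksum32(data: bytes) -> int:
    """32-bit one's complement of the one's complement sum of 32-bit words.

    Assumption: words are taken in big-endian (network) byte order.
    """
    # Pad on the right with zero octets up to a multiple of four.
    if len(data) % 4:
        data = data + b"\x00" * (4 - len(data) % 4)

    total = 0
    for i in range(0, len(data), 4):
        total += int.from_bytes(data[i:i + 4], "big")

    # One's complement addition: fold any carry back into the low 32 bits.
    while total >> 32:
        total = (total & 0xFFFFFFFF) + (total >> 32)

    return ~total & 0xFFFFFFFF


def originator_id(key_encoding: bytes, email: str) -> str:
    """Format a header line shaped like the example above (hypothetical helper).

    The "EN" token is copied verbatim from that example.
    """
    return "Originator-ID: EN, {:08X}, {}".format(checksum32(key_encoding), email)


if __name__ == "__main__":
    sample_key = bytes(range(1, 12))  # 11 made-up octets, so padding is exercised
    print(originator_id(sample_key, "user@example.com"))
```

Both sides would have to agree on the byte order and on the exact key encoding for the identifiers to match.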
- Jeff