Re: PGP Keyserver Synchronization Protocol

On 1999-06-30 09:59:30 -0500, William H. Geiger III wrote:

There is no reason why a server could not generate a complete hash
table once a month, and then produce a "changes" hash table on a
daily or weekly basis that only contained the new/updated keys
since the last complete hash table was created.


This method relies on servers keeping state and thus bears the risk
of getting servers out of synch under error conditions.


I'd suggest you use something remotely inspired by Andrew Tridgell's
rsync algorithm, coupled with a binary search for differences over
the key space.  (Note: I followed the portion of this thread which
went to ietf-open-pgp, so I may have missed some discussion on
pgp-keyserver-folk.)

Let's assume we have some kind of unique serial number for the key
(finger print, etc.), and a method to generate a unique hash for a
given key block (that's the packet sorting issue).  Let's assume we
have sorted the keys by this serial number.

We first calculate a key block hash for each key.

We then hash together these hashes for each set of keys with the
same least-significant byte.  (Let's call these hashes h_{1,k}, k
being (id >> 8).  Note that there will be many k's for which the set
over which we have calculated the hash is empty; we don't store,
calculate or transmit these sets and hashes at all, saving
considerable memory and bandwidth.)

We now hash together all h_{1,k}'s where k has the same
least-significant byte, yielding h_{2,l}, l being (k >> 8).

We repeat this process several times, getting a hierarchy of hashes.

Now, the key server initiating the update ("client") transfers the
hashes of a certain level to another key server ("server").  This
level has to be computed on the base of the rate of changes to which
the key servers are subject, and on the number of keys on a server -
you don't want to transfer these hashes when there is a high
probability that _all_ of them will be different on the two servers;
you also don't want to transfer them when most of them are equal.

Now, the target server compares his set of hashes and the other
server's set of hashes.

We can have four situations:

- Client transferred a hash which wasn't calculated on server.  This
  means that client has keys in a block which aren't present on
  server.  Server will request all keys matching this block.

- Client did not transfer a hash for a block for which server has
  one.  Server will now initiate a transfer of all keys in this
  block to client.

- Client and server have different hashes.  In this case, server has
  two choices: 

  -> Request all keys in this block. This may be an option if server
     has only few keys in this block (assuming, according to the
     rate of change, that client hasn't many more keys, either), or
     if the block itself is small enough.  This is another point
     where you can optimize the protocol by modifying parameters.

  -> Go down one hash level and repeat the process for this block.

- Client and server have identical hashes.  Fine.