
Re: [openpgp] [Cfrg] streamable AEAD construct for stored data?

2015-10-31 19:48:26
Hi folks:

I'm one of the designers of Tahoe-LAFS.

First of all, I'd like to agree that this is an important issue — we
do need a data integrity protocol with which a reader can verify a
subset of the data. This is especially useful for heads — leading
bytes — of the data, so that when someone executes a command like
this:

curl http://foo.info/x.tar.pgp | gpg --decrypt | tar x

the "gpg" process can operate with limited RAM/storage and can also
verify that the data is authentic before passing it along to the tar
process.

In addition, some techniques can also allow authenticated reads of
arbitrary spans of the data. The Merkle Tree approach, such as is used
in Tahoe-LAFS, allows this. For example, if you're reading a file from
Tahoe-LAFS and you do something like "fseek(filehandle, 1000000000,
SEEK_SET); fread(buf, 1, 1000000, filehandle);", then you'll get the
one million bytes of data which begin one billion bytes into the
file, along with Tahoe-LAFS's assurance that those one million bytes
are cryptographically authenticated.
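To make the random-access case concrete, here is a minimal sketch of
how a reader might authenticate a single block against a known Merkle
root using the sibling hashes ("Merkle Branch") supplied alongside it.
All names and the hash layout are hypothetical illustrations, not
Tahoe-LAFS's actual wire format:

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256, used here for both leaf and interior hashes (illustrative only)."""
    return hashlib.sha256(data).digest()

def verify_block(block: bytes, index: int, branch: list, root: bytes) -> bool:
    """Recompute the path from one leaf up to the root.

    `branch` holds the sibling hash at each level, leaf level first;
    `index` determines whether our node is the left or right child at
    each step. Returns True iff the block is consistent with `root`.
    """
    node = h(block)
    for sibling in branch:
        if index % 2 == 0:
            node = h(node + sibling)
        else:
            node = h(sibling + node)
        index //= 2
    return node == root
```

The reader only needs O(log n) sibling hashes per block, which is why
a random read out of the middle of a large file stays cheap.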

Second, I'd like to emphasize something that dkg pointed out in the
first post — that even if we do this correctly at this layer, then
there is still a risk of truncation attacks. For example, imagine that
someone runs:

curl http://foo.info/x.sh.pgp | gpg --decrypt | sh

Even if gpg can ensure the cryptographic integrity of every byte
before it passes that byte to sh, an attacker can still mount
potentially damaging attacks simply by interrupting the flow of
ciphertext to gpg. This is a problem that can't be solved by the gpg
process in this example. It has to be solved by the next process in
the chain: tar in the first example, or sh in the second. But let's
remember that it is a real problem.
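As a sketch of how a chunked format can at least *detect* truncation
at the decryptor, here is a toy framing where each chunk is
authenticated together with its sequence number and a final-chunk
flag. HMAC-SHA256 stands in for a real per-chunk AEAD tag (a
production design would encrypt as well as authenticate); all field
layouts here are my own invention for illustration:

```python
import hmac
import hashlib
import struct

def seal_chunks(key: bytes, chunks: list) -> list:
    """Authenticate each chunk with its index and a final-chunk flag.

    HMAC stands in for a per-chunk AEAD tag in this stdlib-only sketch.
    """
    out = []
    for i, chunk in enumerate(chunks):
        final = (i == len(chunks) - 1)
        header = struct.pack(">QB", i, final)
        tag = hmac.new(key, header + chunk, hashlib.sha256).digest()
        out.append((header, chunk, tag))
    return out

def open_chunks(key: bytes, records: list):
    """Yield authenticated chunks one at a time.

    Raises ValueError if a chunk fails authentication, arrives out of
    order, or if the stream ends without a record marked final, which
    is how truncation is detected.
    """
    saw_final = False
    for expected, (header, chunk, tag) in enumerate(records):
        i, final = struct.unpack(">QB", header)
        good = hmac.compare_digest(
            tag, hmac.new(key, header + chunk, hashlib.sha256).digest())
        if not good or i != expected:
            raise ValueError("chunk %d failed authentication" % expected)
        yield chunk
        saw_final = (final == 1)
    if not saw_final:
        raise ValueError("stream truncated before final chunk")
```

Note that this only illustrates Zooko's point rather than refuting it:
by the time the decryptor notices the missing final chunk, the earlier
chunks have already been yielded downstream, so the consumer (tar or
sh) still has to cope with the truncation error.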


> b) it doesn't seem to compose as well with asymmetric signatures as one
>    might like: a signature over the whole material can't itself be
>    verified until one full pass through the data; and a signature over
>    just the symmetric key would prove nothing, since anyone getting the
>    symmetric key could forge an arbitrary valid, decryptable stream.
>    Is there an intermediate approach that would combine an asymmetric
>    signature with a chunkable authenticated encryption such that a
>    decryptor could stream one pass and be certain of its origin (at
>    least up until truncation, if (a) can't be resolved)?

This is a big deal, and an under-appreciated one. A lot of modern
cryptography was developed in the model of a bilateral, synchronous
connection between two parties. In that model, this isn't a problem:
you have a shared secret, and anything you receive that *you* didn't
send, you can assume the other party sent. (You still have to prevent
replay and reflection attacks, but once you've done so, this isn't a
problem.)

But in a more asynchronous/persistent model, and in a model with more
than two parties, then you can't rely on that and you need something
else.

The way we do this in Tahoe-LAFS (as Taylor Campbell explained earlier
in this thread) is that the writer generates a Merkle Tree over the
data and transmits, along with each block, the Merkle Branch needed to
authenticate that block.
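The writer's side can be sketched as follows: build every level of the
tree, then read off the sibling hashes for any given block. The
padding rule (duplicating the last node at odd-length levels) is one
common convention, not necessarily Tahoe-LAFS's actual one, and all
names are hypothetical:

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256 for both leaf and interior nodes (illustrative only)."""
    return hashlib.sha256(data).digest()

def merkle_levels(blocks: list) -> list:
    """Return every level of the tree, leaves first, root last.

    Odd-length levels are padded by duplicating the last node, one
    common convention among several.
    """
    level = [h(b) for b in blocks]
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2:
            level = level + [level[-1]]
        levels.append(level)
        if len(level) == 1:
            break
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return levels

def branch_for(levels: list, index: int) -> list:
    """Sibling hashes needed to authenticate block `index` against the root."""
    branch = []
    for level in levels[:-1]:
        branch.append(level[index ^ 1])
        index //= 2
    return branch
```

The writer transmits each block together with `branch_for(levels, i)`,
and the reader recomputes the path up to the root it already trusts.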

Now, the reason we do it that way in Tahoe-LAFS is that we want to
bind all the bytes of (one version of) the file together. Suppose a
reader reads the first million bytes of a file, and then reads the
second million bytes. You don't want an attacker to have the option of
supplying the first million bytes from one version of the file and the
next million from a different version, without the reader realizing
it, even if both versions of the file were, at some point, signed by
the legitimate writer.

So using the Merkle Tree provides a convenient way to:

* bundle all the bytes of the file into a single crypto value (the
Merkle Tree root) which we can then use as a "stand-in" for the
complete contents of a single version of the file, for authentication
purposes

* suffer low worst-case overhead for a read of an arbitrary span of
data (i.e., if you're reading a random span out of the middle of a
file, not just reading from the beginning)
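The "low worst-case overhead" point is easy to quantify: for a file of
n blocks, a Merkle Branch is ceil(log2 n) sibling hashes, regardless
of where in the file the read lands. A small bit of illustrative
arithmetic (the block and hash sizes are hypothetical parameters, not
Tahoe-LAFS's actual defaults):

```python
import math

def branch_overhead(file_size: int, block_size: int,
                    hash_size: int = 32) -> int:
    """Bytes of Merkle-Branch overhead to authenticate one block.

    Illustrative arithmetic only: ceil(log2(leaves)) sibling hashes,
    each `hash_size` bytes.
    """
    leaves = math.ceil(file_size / block_size)
    depth = math.ceil(math.log2(leaves)) if leaves > 1 else 0
    return depth * hash_size

# A 1 GiB file in 4 KiB blocks has 2**18 leaves, so any single block
# can be authenticated with 18 sibling hashes: 576 bytes of overhead.
```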

But we made that decision back before we were willing to rely on ECC,
so our public key digital signatures were big old 2048-bit RSA sigs.
Now that we would be willing to rely on svelte Ed25519 sigs, we might
consider including a public-key digital signature with each block
instead of a Merkle Branch with each block. Either way, it would
require careful engineering around the identification and versioning
of the file.
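One way the per-block-signature alternative could bind identification
and versioning is to sign, with each block, the file's identifier, its
version number, and the block's index, so that blocks from two
different signed versions can't be spliced together. In this
stdlib-only sketch HMAC-SHA256 stands in for an Ed25519 signature (a
real design would use an asymmetric signature so readers need only the
writer's public key), and the field layout is my own invention:

```python
import hmac
import hashlib
import struct

def sign_block(key: bytes, file_id: bytes, version: int,
               index: int, block: bytes) -> bytes:
    """Per-block tag binding file identity, version, and block index.

    HMAC stands in for an Ed25519 signature in this sketch; the signed
    message covers everything needed to prevent cross-version splicing.
    """
    msg = (file_id
           + struct.pack(">QQ", version, index)
           + hashlib.sha256(block).digest())
    return hmac.new(key, msg, hashlib.sha256).digest()

def check_block(key: bytes, file_id: bytes, version: int,
                index: int, block: bytes, sig: bytes) -> bool:
    """A reader pins (file_id, version) on first read and rejects any
    later block whose tag doesn't verify under the same pair."""
    expected = sign_block(key, file_id, version, index, block)
    return hmac.compare_digest(expected, sig)
```

Because the version number is under the signature, a block signed for
version 7 fails verification if replayed as part of version 8, which
is the splicing attack described above.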

Regards,

Zooko

_______________________________________________
openpgp mailing list
openpgp@ietf.org
https://www.ietf.org/mailman/listinfo/openpgp
