
Re: [openpgp] [Cfrg] streamable AEAD construct for stored data?

2015-11-06 19:54:56
On Fri, Nov 6, 2015 at 5:46 PM, Bryan Ford <brynosaurus(_at_)gmail(_dot_)com> 
wrote:
To return to this thread - DKG brought up one important potential
functionality goal for the next OpenPGP message format (streaming-mode
integrity protection); then the thread diverged into a different and I think
orthogonal - though equally interesting - potential functionality goal
(namely random-access capability via Merkle trees as in Tahoe-LAFS).

I included a slide on this topic in my OpenPGP WG presentation
(https://drive.google.com/file/d/0BwK1bcoczINtMEI2Y3A1Rm52SXc/view?usp=sharing
slide 10) and was hoping to solicit discussion but there wasn’t time, so
perhaps we can continue here?

To be clear, there are two separate use-cases, each of which makes sense
without the other and requires a different technical solution (though they
could also make sense together):

1. Streaming-mode integrity protection:  We want to make sure OpenPGP can be
used Unix filter-style on both encryption and decryption sides, to process
arbitrarily large files (e.g., huge backup tarballs), while satisfying the
following joint requirements:

(a) Ensure that neither the encryptor nor decryptor ever has to buffer the
entire stream in memory or any other intermediate storage.
(b) Ensure that the decryptor integrity-checks everything it decrypts BEFORE
passing it onto the next pipeline stage (e.g., un-tar).

2. Random-access: Once a potentially-huge OpenPGP-encrypted file has been
written to some random-access-capable medium, allow a reader to decrypt and
integrity-check parts of that encrypted file without (re-)processing the
whole thing: i.e., support integrity-protected random-access reads.

Let’s call these goals #1 and #2, respectively.

Achieving either goal will require dividing encrypted files into chunks of
some kind, but the exact metadata these chunks need to have will vary
depending on which goal we want to achieve (or both).

To achieve goal #1 properly, it appears that what we need is not only a MAC
per chunk but a signature per chunk.  If the encryptor only signs a single
aggregate MAC at the end, then the decryptor needs to process its input all
the way to that signature at the end before it can be certain that any (even
the first) bytes of the decrypted data are valid.  If the encryptor produces
a Merkle tree at the end and signs its root as in Tahoe-LAFS (e.g., in
pursuit of goal #2), the decryptor still needs to read to the end of its
input before being able to integrity-check anything, and hence still fails
to achieve goal #1.
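To make the per-chunk requirement concrete, here is a minimal sketch of a chunked format whose decryptor verifies each chunk before releasing it (goal #1b). All names (CHUNK_SIZE, seal, open_stream) are illustrative, not part of any OpenPGP format, and an HMAC stands in for the per-chunk signature; binding the chunk index and a final-chunk flag into each tag is one way to prevent reordering and truncation:

```python
import hashlib
import hmac
import io
import struct

CHUNK_SIZE = 4  # tiny for demonstration; a real format would use far larger chunks


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Emit length-prefixed chunks, each tagged with its index and a final flag."""
    out = bytearray()
    chunks = [plaintext[i:i + CHUNK_SIZE]
              for i in range(0, len(plaintext), CHUNK_SIZE)] or [b""]
    for idx, chunk in enumerate(chunks):
        final = 1 if idx == len(chunks) - 1 else 0
        header = struct.pack(">IB", idx, final)  # index + final-chunk flag
        tag = hmac.new(key, header + chunk, hashlib.sha256).digest()
        out += struct.pack(">I", len(chunk)) + header + chunk + tag
    return bytes(out)


def open_stream(key: bytes, stream):
    """Yield each chunk only AFTER its tag verifies -- nothing unauthenticated
    ever reaches the next pipeline stage."""
    idx = 0
    while True:
        lenb = stream.read(4)
        if len(lenb) < 4:
            raise ValueError("truncated stream: final chunk never seen")
        (n,) = struct.unpack(">I", lenb)
        header = stream.read(5)
        chunk = stream.read(n)
        tag = stream.read(32)
        got_idx, final = struct.unpack(">IB", header)
        expect = hmac.new(key, header + chunk, hashlib.sha256).digest()
        if got_idx != idx or not hmac.compare_digest(tag, expect):
            raise ValueError("chunk %d failed verification" % idx)
        yield chunk
        if final:
            return
        idx += 1
```

Both sides hold only one chunk in memory at a time (goal #1a), and a stream cut off mid-file is detected because the last chunk must carry the final flag.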


[...]

1. How important is the ability to achieve goal #1 above in the OpenPGP
format (streaming-mode integrity-checking)?


Are you willing to accept a format that allows streaming decryption
but not streaming encryption?  If so, then you'd only need one
signature if you organize your Merkle tree correctly.  In fact:

2. How important is the ability to achieve goal #2 above in the OpenPGP
format (random-access integrity-checking)?

It's fairly easy to imagine a format that allows both streaming
verification and random-access verification with minimal size
overhead.  You could even create the thing in a semi-streamy manner,
where you'd stream out the bulk portion with blanks where the internal
nodes go and then write the internal nodes after the fact.
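The semi-streamy construction above can be sketched as follows: hash each chunk as it streams out, then compute the internal levels at the end and back-fill them into the reserved blanks. This is only an illustration of the tree arithmetic (the leaf/node domain separation and function names are my own, not a proposed format):

```python
import hashlib


def leaf_hash(chunk: bytes) -> bytes:
    # Domain-separate leaves from internal nodes to rule out confusion attacks.
    return hashlib.sha256(b"\x00" + chunk).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_levels(leaves):
    """Return every level of the tree, leaves first, root last.

    The leaf level can be computed while streaming the bulk data; the internal
    levels are exactly what would be written into the blanks after the fact.
    An odd node at any level is promoted unchanged to the next level.
    """
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([node_hash(cur[i], cur[i + 1]) if i + 1 < len(cur)
                       else cur[i]
                       for i in range(0, len(cur), 2)])
    return levels
```

Only the root (one hash) needs a signature, which is why this shape allows streaming decryption even though the encryptor must finish before the tree is complete.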

The best of all worlds might be to treat the Merkle data and the
signature as a detached file.  I bet that one could streamily encrypt
and sign a big file and produce *two* output streams: the bulk data
and a detached serialization of intermediate nodes, where there's a
single signature at the end.  A reader with access to both files could
do random-access reads, or read the detached serialization and signature
up front and then verify the bulk file as a stream.
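For the random-access half of that (goal #2), a reader holding the detached tree needs only an authentication path per chunk. A minimal sketch, assuming a power-of-two leaf count and the same hypothetical leaf/node hashing as any tree construction above would use:

```python
import hashlib


def leaf_hash(chunk: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + chunk).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def auth_path(leaves, index):
    """Sibling hashes from leaf to root (power-of-two leaf count assumed).
    In practice these would be read out of the detached file, not recomputed."""
    path, level, i = [], list(leaves), index
    while len(level) > 1:
        path.append(level[i ^ 1])  # sibling at this level
        level = [node_hash(level[j], level[j + 1])
                 for j in range(0, len(level), 2)]
        i //= 2
    return path, level[0]  # (path, root)


def verify_chunk(root, chunk, index, path):
    """Check one chunk against the (signed) root without touching the rest."""
    h, i = leaf_hash(chunk), index
    for sibling in path:
        h = node_hash(h, sibling) if i % 2 == 0 else node_hash(sibling, h)
        i //= 2
    return h == root  # root is public, so a plain comparison suffices
```

The path is logarithmic in the number of chunks, so a reader verifies one chunk of a huge file with a handful of hashes plus one signature check on the root.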

--Andy

_______________________________________________
openpgp mailing list
openpgp(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/openpgp