To return to this thread - DKG brought up one important potential functionality
goal for the next OpenPGP message format (streaming-mode integrity protection);
then the thread diverged into a different and I think orthogonal - though
equally interesting - potential functionality goal (namely random-access
capability via Merkle trees as in Tahoe-LAFS).
I included a slide on this topic in my OpenPGP WG presentation
(https://drive.google.com/file/d/0BwK1bcoczINtMEI2Y3A1Rm52SXc/view?usp=sharing
slide 10) and was hoping to solicit discussion but there wasn’t time, so
perhaps we can continue here?
To be clear, there are two separate use-cases, each of which makes sense without
the other and requires a different technical solution (but they could also make
sense together):
1. Streaming-mode integrity protection: We want to make sure OpenPGP can be
used Unix filter-style on both encryption and decryption sides, to process
arbitrarily large files (e.g., huge backup tarballs), while satisfying the
following joint requirements:
(a) Ensure that neither the encryptor nor decryptor ever has to buffer
the entire stream in memory or any other intermediate storage.
(b) Ensure that the decryptor integrity-checks everything it decrypts
BEFORE passing it onto the next pipeline stage (e.g., un-tar).
2. Random-access: Once a potentially-huge OpenPGP-encrypted file has been
written to some random-access-capable medium, allow a reader to decrypt and
integrity-check parts of that encrypted file without (re-)processing the whole
thing: i.e., support integrity-protected random-access reads.
Let’s call these goals #1 and #2, respectively.
Achieving either goal will require dividing encrypted files into chunks of some
kind, but the exact metadata each chunk needs will vary depending on which goal
(or both) we want to achieve.
To achieve goal #1 properly, it appears that what we need is not only a MAC per
chunk but a signature per chunk. If the encryptor only signs a single
aggregate MAC at the end, then the decryptor needs to process its input all the
way to that signature at the end before it can be certain that any (even the
first) bytes of the decrypted data are valid. If the encryptor produces a
Merkle tree at the end and signs its root as in Tahoe-LAFS (e.g., in pursuit of
goal #2), the decryptor still needs to read to the end of its input before
being able to integrity-check anything, and hence still fails to achieve goal
#1.
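To make the distinction concrete, here is a minimal Python sketch of streaming
per-chunk authentication. It is not the OpenPGP wire format: the chunk size,
the position-binding encoding, and the use of an HMAC as a stand-in for a real
public-key signature are all illustrative assumptions.

```python
import hashlib
import hmac

KEY = b"demo key"  # stand-in for the signer's key material
CHUNK = 4          # tiny chunk size, purely for illustration

def sign(data: bytes) -> bytes:
    # Placeholder for a real public-key signature; an HMAC keeps the
    # sketch self-contained but has different security properties.
    return hmac.new(KEY, data, hashlib.sha256).digest()

def encrypt_stream(plaintext: bytes):
    """Yield (chunk, signature) pairs using constant encryptor memory.
    Each signature binds the chunk to its byte offset, so chunks cannot
    be reordered or replayed; a real format would also need an explicit
    end-of-stream marker to detect truncation."""
    for off in range(0, len(plaintext), CHUNK):
        chunk = plaintext[off:off + CHUNK]
        yield chunk, sign(off.to_bytes(8, "big") + chunk)

def decrypt_stream(pairs):
    """Verify each chunk BEFORE releasing it downstream (requirement 1b)."""
    for i, (chunk, sig) in enumerate(pairs):
        expected = sign((i * CHUNK).to_bytes(8, "big") + chunk)
        if not hmac.compare_digest(sig, expected):
            raise ValueError("chunk %d failed integrity check" % i)
        yield chunk  # now safe to hand to the next pipeline stage
```

With a single trailing MAC, decrypt_stream could not yield anything until it
had read the whole input; per-chunk signatures let it release each chunk as
soon as that chunk verifies.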
To achieve goal #2, the Merkle tree approach used in Tahoe-LAFS seems
well-suited. But it assumes/requires the decryptor has a complete,
random-access seekable copy of the encrypted file sitting somewhere, which is
certainly a reasonable assumption in the context of a file system like
Tahoe-LAFS, but violates requirement (a) of goal #1 and in particular may not
hold when OpenPGP is used to filter-decrypt from unseekable media or a network
stream to an un-tar type process.
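For reference, a minimal sketch of the Merkle mechanism that makes goal #2
work: given a signed root and a log2(N)-size path of sibling hashes, a reader
can verify one chunk without touching the rest of the file. The "leaf"/"node"
domain-separation prefixes and the power-of-two chunk count are simplifying
assumptions, not a concrete format proposal.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(chunks):
    """Return every level of a Merkle tree (power-of-two leaf count
    assumed), leaves first, root level last."""
    level = [H(b"leaf" + c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        level = [H(b"node" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def proof(levels, index):
    """Collect the log2(N) sibling hashes needed to check one leaf."""
    path = []
    for level in levels[:-1]:
        path.append((index & 1, level[index ^ 1]))  # (am-I-right?, sibling)
        index >>= 1
    return path

def verify(chunk, path, root):
    """Recompute the root from one chunk plus its sibling path."""
    h = H(b"leaf" + chunk)
    for is_right, sib in path:
        h = H(b"node" + sib + h) if is_right else H(b"node" + h + sib)
    return h == root
```

A reader holding only the signed root needs just the one chunk and its
O(log N) sibling path, which is exactly the random-access property.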
We could very well design an OpenPGP format that addresses both goals together,
if we decide both goals are valuable. The encryptor would need to build its
Merkle tree incrementally - without initially knowing its total size -
generating and signing a new Merkle tree root after encrypting and outputting
each chunk. We wouldn’t necessarily have to store an entirely new
Merkle tree after each chunk, only a log2(N)-size list of hashes comprising the
“update path” from the new chunk back to the root, reusing the un-updated parts
of the Merkle tree already stored in earlier parts of the encrypted file. (But
this will require that earlier Merkle trees be “incrementally updatable”, which
might violate the properties of some of the Merkle tree proposals suggested
elsewhere in other threads.)
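The incremental construction can be sketched with the standard trick of keeping
one "peak" hash per set bit of the chunk count (as in Merkle mountain ranges);
everything here - the class name, the hash, the prefixes - is an illustrative
assumption, not a proposal for the actual packet format.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class StreamingMerkle:
    """Maintain a running Merkle root over a stream of chunks while
    storing only O(log N) hashes: one 'peak' per set bit of the chunk
    count. The encryptor can sign current_root() after every chunk."""

    def __init__(self):
        self.peaks = []  # (height, hash) pairs, tallest first
        self.count = 0

    def append(self, chunk: bytes):
        node = (0, H(b"leaf" + chunk))
        # Merging equal-height peaks works like carries in binary addition.
        while self.peaks and self.peaks[-1][0] == node[0]:
            height, left = self.peaks.pop()
            node = (height + 1, H(b"node" + left + node[1]))
        self.peaks.append(node)
        self.count += 1

    def current_root(self) -> bytes:
        """Fold the peaks, smallest to tallest, into a single root
        (requires at least one appended chunk)."""
        acc = self.peaks[-1][1]
        for _, peak in reversed(self.peaks[:-1]):
            acc = H(b"node" + peak + acc)
        return acc
```

The per-chunk output then only needs the freshly signed root plus the update
path; hashes of earlier, unchanged subtrees already sit in earlier parts of the
encrypted file.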
There are some obvious tradeoffs here, both in storage and complexity costs.
I’m not that worried about the storage efficiency costs, because even the costs
of a log2(N)-size list of hashes per chunk can be amortized fairly effectively
by making the chunks moderately large, e.g., 65KB or even 1MB. There’s also
the compute cost of performing a public-key signature rather than just a MAC
per chunk, but that can be similarly amortized by using relatively large
chunks. On the other hand, large chunks might not be ideal for
embedded/constrained devices. And the implementation-complexity is certainly
an issue regardless.
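A back-of-the-envelope calculation illustrates the amortization argument; the
sizes here (32-byte hashes, a 512-byte signature packet) are assumptions chosen
only to make the arithmetic concrete.

```python
import math

HASH_LEN = 32   # assumed hash size (e.g., SHA-256)
SIG_LEN = 512   # assumed size of one signature packet, illustration only

def overhead_ratio(file_size: int, chunk_size: int) -> float:
    """Fraction of the output spent on per-chunk signatures plus a
    log2(N)-size update path of hashes."""
    n = math.ceil(file_size / chunk_size)
    per_chunk = SIG_LEN + HASH_LEN * max(1, math.ceil(math.log2(n)))
    return n * per_chunk / file_size

# For a 1 GiB file the overhead is roughly 6% with 16 KiB chunks but
# falls below 0.1% with 1 MiB chunks.
for chunk in (16 * 1024, 64 * 1024, 1024 * 1024):
    print("%8d-byte chunks: %.4f" % (chunk, overhead_ratio(2**30, chunk)))
```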
So some questions about this:
1. How important is the ability to achieve goal #1 above in the OpenPGP format
(streaming-mode integrity-checking)?
2. How important is the ability to achieve goal #2 above in the OpenPGP format
(random-access integrity-checking)?
3. For whichever goal(s) we wish to be able to achieve, should those be
*mandatory* or *optional* in the format? That is, should *every*
OpenPGPv5-encrypted file satisfy either or both of these goals, or should they
be configurable or user-selectable (such that some encrypted files might
contain per-chunk signatures and/or Merkle trees while others do not)? Making
either of these goals “supported but optional” might help mitigate any
performance/storage cost concerns with either of them, but would only further
increase the complexity of the overall OpenPGP spec and increase the “usability
risk” of a user accidentally failing to enable a relevant option when they
really should have (e.g., streaming-mode protection for backups).
4. What are reasonable upper- and lower-bounds for chunk sizes, and what are
the considerations behind them?
Thanks
Bryan