ietf-openpgp

Re: [openpgp] [Cfrg] streamable AEAD construct for stored data?

2015-11-08 04:43:12
On 7/11/2015 01:46 am, Bryan Ford wrote:
To be clear, there are two separate use-cases, each of which make sense
without the other and require different technical solutions (but could
also make sense together):

1. Streaming-mode integrity protection:  We want to make sure OpenPGP
can be used Unix filter-style on both encryption and decryption sides,
to process arbitrarily large files (e.g., huge backup tarballs), while
satisfying the following joint requirements:

(a) Ensure that neither the encryptor nor decryptor ever has to buffer
the entire stream in memory or any other intermediate storage.

Yes.

(b) Ensure that the decryptor integrity-checks everything it decrypts
BEFORE passing it onto the next pipeline stage (e.g., un-tar).


OK. So this is where a program-level option comes in. In streaming mode, the streamer can keep decrypting and passing the data across to the reader, and then break when an integrity check fails.

In streaming mode, this is how we would expect it to operate. A user program can however offer some options in this case. E.g., do an integrity-check pass beforehand as a separate option; or turn the integrity checks into warnings and keep decrypting and streaming the data, knowing that there is garble in there. Both are useful options a program could offer.

So I'd say NO - streaming is streaming, and there isn't a requirement in the spec to be sure about the entire file beforehand. That's just a quirk of the streaming mode that users will have to accept.
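
A minimal sketch of the streaming behaviour discussed above, in Python. Everything here is an illustrative assumption, not a proposed OpenPGP format: the names (chunk_tag, seal, stream_verified, on_failure) are hypothetical, encryption is omitted entirely, and a per-chunk HMAC-SHA256 stands in for whatever AEAD tag a real design would use. It only shows the control flow: each chunk is checked before it is released, the stream breaks at the first failure, and a "warn" option keeps streaming instead.

```python
import hashlib
import hmac
import struct
import warnings


class IntegrityError(Exception):
    """Raised when a chunk, or the stream as a whole, fails verification."""


def chunk_tag(key, index, is_last, data):
    # Bind the chunk's position and a "final chunk" flag into the tag so that
    # reordering, dropping, or truncating chunks is detectable.
    header = struct.pack(">QB", index, 1 if is_last else 0)
    return hmac.new(key, header + data, hashlib.sha256).digest()


def seal(key, chunks):
    """Writer side: yield (is_last, data, tag) records.

    Only one chunk is held back at a time, so the writer never buffers
    the whole stream; it just needs to know which chunk is the last one.
    """
    index = 0
    pending = None
    for chunk in chunks:
        if pending is not None:
            yield (False, pending, chunk_tag(key, index, False, pending))
            index += 1
        pending = chunk
    final = pending if pending is not None else b""
    yield (True, final, chunk_tag(key, index, True, final))


def stream_verified(key, records, on_failure="abort"):
    """Reader side: yield each chunk only after its own tag has been checked.

    on_failure="abort" stops at the first bad chunk (nothing bad is released).
    on_failure="warn" emits a warning and keeps streaming the suspect data.
    """
    def fail(message):
        if on_failure == "abort":
            raise IntegrityError(message)
        warnings.warn(message)

    finished = False
    for index, (is_last, data, tag) in enumerate(records):
        expected = chunk_tag(key, index, is_last, data)
        # Anything arriving after the final chunk is treated as a failure too.
        if finished or not hmac.compare_digest(tag, expected):
            fail(f"chunk {index} failed verification")
        yield data
        finished = finished or is_last
    if not finished:
        fail("stream ended without a final chunk (truncated?)")
```

In this sketch the "integrity check pass beforehand" option is simply a first run that consumes stream_verified(...) to the end and discards the output, followed by a second run that passes the data on. Note that in "abort" mode the chunks before the failing one have already gone to the next pipeline stage, which is the quirk of streaming mentioned above.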


2. Random-access: Once a potentially-huge OpenPGP-encrypted file has
been written to some random-access-capable medium, allow a reader to
decrypt and integrity-check parts of that encrypted file without
(re-)processing the whole thing: i.e., support integrity-protected
random-access reads.

Let’s call these goals #1 and #2, respectively.
...
We could very well design an OpenPGP format that addresses both goals
together, if we decide both goals are valuable. ...

There are some obvious tradeoffs here, both in storage and complexity
costs.  I’m not that worried about the storage efficiency costs,...
  And the implementation-complexity is certainly an issue regardless.


Nod. Let's see how the requirements go first, and whether there is a reasonable design possible second.

So some questions about this:

1. How important is the ability to achieve goal #1 above in the OpenPGP
format (streaming-mode integrity-checking)?


It's certainly important. If we want to bring everyone across to a new format, and start ditching the old (from the standard) then we have to provide an equivalent to common use cases.

I'm inclined to say that stream-mode must be integrity checked. We want to achieve the same standard across the board; we don't want to say "if X, then Y, but if Z, then not Y and maybe W..." and complicate the user's understanding.


2. How important is the ability to achieve goal #2 above in the OpenPGP
format (random-access integrity-checking)?


Random access is a new feature. It's certainly an *attractive* feature for the inner geek, just because. But I am not seeing a clear use case as yet, at the user level. If I think about the command line, I can't see a way a user would say "decrypt from blocks 1234 to 8960" without getting into some arcane geeky construction like doing dd(1) or somesuch ... which no sane end-user does.

What I am seeing is that this would be an API call to other systems which do know what they want. This would be quite useful for a backup for example, or an rsync-like tool. Being able to re-start the backup is incredibly useful, being able to set off a backup to do a sort of "rsync" phased copy from "state N" without phase errors would be fantastic.
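
To make the API-level idea concrete, here is a hedged Python sketch of an integrity-checked random-access read. It is not a proposed format: the layout (fixed-size chunks, each followed by its tag), the names (write_store, read_chunk, CHUNK) and the use of HMAC-SHA256 in place of a real AEAD are all assumptions for illustration, and encryption is left out. The point is only that a caller such as a backup or rsync-like tool could verify and read chunk N without processing chunks 0..N-1.

```python
import hashlib
import hmac
import io
import struct

CHUNK = 4      # tiny chunk size so the demo is easy to follow; hypothetical
TAG_LEN = 32   # HMAC-SHA256 output size
RECORD = CHUNK + TAG_LEN


class IntegrityError(Exception):
    """Raised when the requested chunk fails verification."""


def chunk_tag(key, index, is_last, data):
    # The tag covers the chunk's index and a "final chunk" flag, so a chunk
    # moved to another position, or a file cut short, will not verify.
    header = struct.pack(">QB", index, 1 if is_last else 0)
    return hmac.new(key, header + data, hashlib.sha256).digest()


def write_store(key, payload, out):
    """Write payload as fixed-size chunks, each followed by its tag."""
    count = max(1, -(-len(payload) // CHUNK))  # ceiling division, at least 1
    for i in range(count):
        data = payload[i * CHUNK:(i + 1) * CHUNK]
        out.write(data + chunk_tag(key, i, i == count - 1, data))


def read_chunk(key, store, index):
    """Seek straight to one chunk, verify it, and return its data."""
    size = store.seek(0, io.SEEK_END)
    count = max(1, -(-size // RECORD))
    if not 0 <= index < count:
        raise IndexError(f"chunk {index} out of range (0..{count - 1})")
    store.seek(index * RECORD)
    blob = store.read(RECORD)
    if len(blob) < TAG_LEN:
        raise IntegrityError(f"chunk {index} is too short to hold a tag")
    data, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = chunk_tag(key, index, index == count - 1, data)
    if not hmac.compare_digest(tag, expected):
        raise IntegrityError(f"chunk {index} failed verification")
    return data
```

A restart "from state N" would then call read_chunk(key, store, n) for n = N, N+1, ... rather than starting at zero. One limitation of this particular sketch: truncation of the file is only noticed when the (new) last chunk is read, so reading a middle chunk says nothing about whether the tail is intact.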

We would then be entering into the library space rather than the end-user interface space. This might actually be a good thing; it might tear our childlike grip from the command line and drag us into the new millennium in time for the next decade. It might finally kill off our obsession with email :)

Or it could be mission creep, scope enlargement, or the sinking of the project if we become all things to all other projects building GUIs on top?


3. For whichever goal(s) we wish to be able to achieve, should those be
*mandatory* or *optional* in the format?


I'd really like to see one format. The boolean logic that goes with different formats just ripples through the users' minds and creates confusion. Every confusion loses users. Every user we lose to confusion is a breach of security because they go on to do it in cleartext or with some other inadequate tool. If we have 10 such confusions scattered across the code, we'll probably halve the number of users.

That's without even talking about bugs, and security snafus, and the potential for choosing the wrong mode and breaking the lot... E.g., it took me 2 years to find out that the reason SVN would break every month was that the client side was mounted on a Mac OSX drive that had an *option* to select case insensitivity... dozens of man-days lost in rectification/recovery/rebuilding client repos because of an obscure option.

There is a reason the MiB run around and insert multiple-mode madness into people's minds in groups. It makes security brittle. It makes it easy for them to futz.


That is, should *every*
OpenPGPv5-encrypted file satisfy either or both of these goals, or
should they be configurable or user-selectable (such that some encrypted
files might contain per-chunk signatures and/or Merkle trees while
others do not)?  Making either of these goals “supported but optional”
might help mitigate any performance/storage cost concerns with either of
them, but would only further increase the complexity of the overall
OpenPGP spec and increase the “usability risk” of a user accidentally
failing to enable a relevant option when he really should have (e.g.,
streaming-mode protection for backups).


Yup. And then he goes off and uses another tool. Coz the sales force have realised that taking options away makes the sale easier, and the user can't see the schlock under the hood anyway.


4. What are reasonable upper- and lower-bounds for chunk sizes, and what
are the considerations behind them?


Defer to later.



iang
