ietf-openpgp

Re: [openpgp] [Cfrg] streamable AEAD construct for stored data?

2015-11-01 09:52:03
On 30 Oct 2015 11:01, "Daniel Kahn Gillmor" 
<dkg(_at_)fifthhorseman(_dot_)net> wrote:

Hi CFRG folks--

We're looking into fixing the OpenPGP symmetrically-encrypted data
formats for RFC4880bis.  The structures are used for mail messages but
also for large file encryption.  It's clear that the OpenPGP CFB mode
isn't designed to modern symmetric encryption standards, so we're hoping
to introduce a better approach.

We need, among other things, to address integrity protection in a more
meaningful way than the current OpenPGP MDC (modification detection
code), which is basically a SHA-1 hash of the cleartext.  This was never
much better than a band-aid.  And as discussed in the recent "OpenPGP
SEIP downgrade attack" thread, an "integrity-protected" packet with an
MDC can be stripped down to produce a syntactically-valid packet without
integrity protection.

But one of our constraints is the OpenPGP use case that streams
decrypted data, like this:

[...]

This approach still has two notable problems I can see, which may or may
not be addressable (but if they are, I'd love to hear it):

 a) it doesn't deal with truncation -- the initially-streamed data has
    already been streamed by the time a truncation is discovered.
    (there may be no way to fix this; it seems kind of like a fact of
    nature, and if so, systems should only do streaming decryption if
    they're capable of coping with truncation)

Does it help to declare the ciphertext length and check that first, before
decrypting? It doesn't help if the file isn't local and the connection is
broken, but then at least your software should detect that and halt if
necessary.

 b) it doesn't seem to compose as well with asymmetric signatures as one
    might like: a signature over the whole material can't itself be
    verified until one full pass through the data; and a signature over
    just the symmetric key would prove nothing, since anyone getting the
    symmetric key could forge an arbitrary valid, decryptable stream.
    Is there an intermediate approach that would combine an asymmetric
    signature with a chunkable authenticated encryption such that a
    decryptor could stream one pass and be certain of its origin (at
    least up until truncation, if (a) can't be resolved)?

Thoughts, pointers, or suggestions would be much appreciated.

To solve (b), what you need to do is something like signing a list of
ciphertext hashes or authentication tags.
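As a rough sketch of that idea (chunk size, key names, and the MAC choice are all illustrative): compute a counter-prefixed MAC per ciphertext chunk, then hash the ordered tag list into one digest, which is what the sender would sign with their asymmetric key (the signing step itself is omitted here).

```python
import hashlib
import hmac

CHUNK = 4096  # illustrative chunk size

def tag_list_digest(mac_key, ciphertext):
    # MAC each chunk as (counter || chunk) so reordering is detected,
    # then hash the concatenated tag list; sign the resulting digest.
    tags = []
    for i in range(0, len(ciphertext), CHUNK):
        counter = (i // CHUNK).to_bytes(8, "big")
        chunk = ciphertext[i:i + CHUNK]
        tags.append(hmac.new(mac_key, counter + chunk, hashlib.sha256).digest())
    return hashlib.sha256(b"".join(tags)).hexdigest()
```

A receiver recomputes the per-chunk tags during streaming decryption and checks the signed digest at the end, so each chunk is authenticated as it arrives (modulo the truncation caveat in (a)).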

One thought I've had before (my idea is to use it for FDE*) is, for
example, to use HMAC over segments (including counters), or to extract the
AEAD tags (prefixed with counters to preserve order), and build a Merkle
tree hash over that list when creating the message, then sign the Merkle
tree root. When you decrypt and recreate the tags for comparison, you can
confirm that nothing has been modified (with the level of assurance the
tags can provide). Alternatively, you could skip the standard AEAD and MAC
constructs and use a signed Merkle tree hash of the ciphertext itself as
your own custom MAC.
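To make the Merkle-tree part concrete, here is a minimal sketch (domain-separation bytes and odd-leaf duplication are my own arbitrary choices, not a proposal): fold the per-segment tags into a single root, and sign only that root.

```python
import hashlib

def merkle_root(leaves):
    # Fold a list of leaf values (e.g. per-segment AEAD tags, in order)
    # into one Merkle root. Leaves and interior nodes are prefixed with
    # distinct bytes so a leaf can never be confused with a node.
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = [hashlib.sha256(b"\x00" + leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:               # duplicate last node on odd levels
            level.append(level[-1])
        level = [hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]
```

Signing the root rather than the whole tag list is what gives the storage saving, while an inclusion path of log-many hashes still lets a verifier check any single segment independently.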

One benefit of using a hash-tree-like algorithm with a signature is the
reduction in storage overhead and memory usage, while retaining the
ability to verify each segment independently. Ideally you would use a
hash-tree-like algorithm that can also be generated efficiently in a
single pass over even large ciphertexts with reasonable memory usage (does
anybody know of one?).
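On the single-pass question: one approach I'm aware of keeps a stack of pending subtree roots and merges equal-height subtrees as leaves arrive, like carries in a binary counter, so memory stays logarithmic in the number of segments. A sketch (my own illustrative encoding, matching nothing standardized):

```python
import hashlib

def streaming_merkle_root(chunks):
    # Single-pass Merkle root: hold at most O(log n) subtree digests
    # at any time, regardless of total ciphertext size. Equal-height
    # subtrees are merged eagerly, like binary-counter carries.
    stack = []  # list of (height, digest) pairs
    for chunk in chunks:
        node = (0, hashlib.sha256(b"\x00" + chunk).digest())
        while stack and stack[-1][0] == node[0]:
            height, left = stack.pop()
            node = (height + 1,
                    hashlib.sha256(b"\x01" + left + node[1]).digest())
        stack.append(node)
    # fold leftover subtrees (non-power-of-two chunk counts)
    while len(stack) > 1:
        h_r, right = stack.pop()
        h_l, left = stack.pop()
        stack.append((max(h_l, h_r) + 1,
                      hashlib.sha256(b"\x01" + left + right).digest()))
    return stack[0][1] if stack else hashlib.sha256(b"").digest()
```
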

Now that I've also read the referenced Tahoe-LAFS link, it looks like
they're doing something very close to what I described above, but slightly
different:
They use AES-CTR over the whole file with a unique key; one plain hash
over the entire ciphertext; one Merkle tree hash over the ciphertext
blocks; and one Merkle tree hash over the erasure-coded shares of the
blocks, if I'm reading it correctly, with all three hashes stored in
plaintext alongside the shares and then hashed together (also including
the length of the file). They also have some sort of hash chain, but the
graphics don't load and I can't figure out exactly how it is applied,
beyond potentially confirming the order of blocks. Instead of HMAC they
use double SHA-256 in a particular format.

Minus the erasure coding** and with an added signature over the file
hashes/header, that's almost exactly what I imagined. I'm just worried
that the performance penalty could limit adoption. If the performance of
this method is considered acceptable, or can be improved to an acceptable
level, I definitely support using it.

* A bit off topic here, but for FDE I imagine changing the encryption key
every write session using a KDF and a session counter, and keeping an
authenticated, encrypted list of which segments use which session write
keys. This way you prevent partial ciphertext reversal and prevent
detection of when ciphertext segments repeat over time (re-zeroized or
restored files). You can also re-encrypt random segments arbitrarily to
obscure your real write patterns (and occasionally reduce the list size by
purging unused keys after re-encrypting the last segments that use them).
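The key-changing part of that FDE idea might look roughly like this (a sketch only; the HMAC-based derivation, label string, and names are all assumptions of mine, not a vetted KDF design):

```python
import hashlib
import hmac

def session_write_key(master_key, session_counter):
    # Derive a fresh per-write-session key from the long-term master
    # key and a monotonically increasing session counter, so segments
    # rewritten in a later session never reuse an earlier key.
    info = b"fde-session-key" + session_counter.to_bytes(8, "big")
    return hmac.new(master_key, info, hashlib.sha256).digest()
```

The authenticated segment-to-session map would then record, for each disk segment, the counter value whose derived key encrypted it.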

** In Tahoe-LAFS, erasure coding of blocks is used to allow you to split
files across storage nodes with minimal risk of data loss. That's not
applicable here, as it can be applied independently when considered
necessary.
_______________________________________________
openpgp mailing list
openpgp(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/openpgp