Re: [openpgp] AEAD Chunk Size

My apologies for being only an occasional participant in this thread (and
it will likely take me another week before I can reply again), but there
are a few points I would like to make.


On Sat, Mar 30, 2019 at 02:17:55AM +0000, Bart Butler wrote:

Hi Jon,

As others have noted, there is a lot of confusion on this thread, some of 
which you touched in your AEAD Conundrum message, like when we say AEAD 
should not release unauthenticated plaintext, do we mean the entire message 
or the chunk?


It's really quite something to have gone through a week's worth of messages
all in one go.  There are many people writing out careful descriptions of
how they see things, and yet we still seem to be talking past each other at
times.

I propose that we use "plaintext corresponding to non-modified ciphertext"
for the non-malleability protection that is provided by an AEAD
authentication tag on a single chunk, and "fully authenticated complete
plaintext" for the output after processing an entire message (i.e., all
chunks) with a guarantee of non-truncation.  (Are there other cases in
between that we care about?)
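
To make the two terms concrete, here is a minimal sketch of the
distinction.  It is an illustration only and not the scheme in the draft:
encryption is omitted (the chunk bytes stand in for ciphertext), HMAC-SHA256
stands in for the AEAD tag, and all of the names (seal, open_stream,
ModifiedChunk, Truncated) are hypothetical.  Each yielded chunk has the
per-chunk property; only a loop that finishes without an exception has the
whole-message property.

```python
import hashlib
import hmac


class ModifiedChunk(Exception):
    """A chunk (or its position) does not match its tag."""


class Truncated(Exception):
    """The stream ended before a chunk marked final was seen."""


def _tag(key, index, is_final, chunk):
    # Bind each chunk to its position and to whether it is the last one.
    # HMAC is only a stand-in for a real AEAD tag (assumption).
    header = index.to_bytes(8, "big") + (b"\x01" if is_final else b"\x00")
    return hmac.new(key, header + chunk, hashlib.sha256).digest()


def seal(key, chunks):
    """Return (chunk, tag) pairs; the last pair is tagged as final."""
    last = len(chunks) - 1
    return [(c, _tag(key, i, i == last, c)) for i, c in enumerate(chunks)]


def open_stream(key, sealed):
    """Yield a chunk only after its own tag verifies.

    Raises ModifiedChunk before releasing anything from a bad chunk, and
    Truncated if the input ends without a final chunk.
    """
    saw_final = False
    for i, (chunk, tag) in enumerate(sealed):
        if saw_final:
            raise ModifiedChunk("data after final chunk")
        if hmac.compare_digest(tag, _tag(key, i, False, chunk)):
            yield chunk
        elif hmac.compare_digest(tag, _tag(key, i, True, chunk)):
            saw_final = True
            yield chunk
        else:
            raise ModifiedChunk(f"chunk {i} failed authentication")
    if not saw_final:
        raise Truncated("no final chunk seen")
```

A streaming consumer that iterates over open_stream() sees unmodified
chunks as they verify; a consumer that needs the stronger guarantee
collects the whole output and uses it only if no exception was raised.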

Another piece of confusion is that Efail isn't a single vulnerability, it was 
several vulnerabilities related (at best) thematically.

So to be very specific, for the purpose of the following discussion, the 
advantage of smaller AEAD chunks is specifically to prevent Efail-style 
ciphertext malleability/gadget attacks, and the prohibition on releasing 
unauthenticated plaintext is applied to individual chunks, which is 
sufficient to foil this kind of attack in email.

The kind of attack we are talking about is fundamentally about exfiltration 
of plaintext data to an attacker-controlled endpoint. Borrowing from your 
AEAD Conundrum message, if the first chunk passes and is released, and the 
second chunk fails, that is OK, at least for email, because the part that was 
modified (the second chunk) is never released, so you get a truncated message 
and an error, but the truncated message without the modifications isn't going 
to exfiltrate itself.


One concern that I have (and it is only tangentially related to this quoted
part) is that I want to make it easy for implementations to "do the right
thing" when ciphertext is modified, i.e., return an error, and specifically
to return an error without releasing any plaintext that originates from the
modified ciphertext.  The current OpenPGP ecosystem does not seem to
conform very well to that desired behavior, and part of that may be due to
a lack of philosophical support/help from the spec.

Now if releasing ANY authenticated chunk of a message that hasn't been fully 
authenticated (in an AEAD sense) is a real problem for your application, I'd 
argue that you're trying to make AEAD do something it's not suited for and 
you should enforce this in your application if it applies to you, probably by 
not streaming.

So to recap, small-chunk AEAD provides specific value in preventing 
ciphertext malleability/gadget attacks, particularly in HTML email, which is 
a common use case.

What value does large-chunk AEAD actually provide? What I'm getting from the 
AEAD Conundrum message is that it's a way for the message encrypter to 
leverage the "don't release unauthenticated chunks" prohibition to force the 
decrypter to decrypt the whole message before releasing anything. Why do we 
want to give the message creator this kind of power? Why should the message 
creator be given the choice to force her recipient to either decrypt the 
entire message before release or be less safe than she would have been with 
smaller chunks?

Coming back to Neal's point, it's really hard to see any sort of value in 
really large AEAD chunks, because the performance overhead is negligible at 
that point and the only security 'benefit' that I can see is the encrypter 
trying to use the spec to force the decrypter to not stream, which does not 
seem like something at all desirable.


I'm still not sure I understand the point of very large chunks, since once
they get really big an implementation is choosing between streaming
plaintext from potentially modified ciphertext and returning an error
without even attempting to process the chunk.  I'm not convinced that the
second will win out in implementations if we allow very large chunks.

Some other notes, not relating to anything specifically quoted from this
message (but derived from other parts of the thread):

TLS allows for arbitrarily variable-length chunks because it is
a synchronous transport for higher-level application streams and the
application may have arbitrary message sizes.  OpenPGP is used in an
asynchronous model, where a message generator can be modelled to make all
its actions before the receiver processes anything, and there is only
one-directional communication within the OpenPGP format.  So there does not
seem to be much demand for "take all the bytes that you have so far and
send them right now", and AFAICT the message generator can just wait until
end of data arrives or enough data to make a complete chunk arrives.  So
from that point of view, there is not much argument in favor of varying
the chunk size within a single message, and possibly even across messages
(i.e., this line of reasoning would be okay with a single chunk size fixed
for everyone as a protocol constant).  There are of course other factors
that may come into play, like constrained systems and such, but we can
treat those separately.
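
As a rough sketch of what "just wait until enough data arrives" looks like
on the generating side (the function name and the way input arrives are my
own assumptions, not anything from the draft), the generator can buffer
whatever it is handed and emit only complete chunks, with a short chunk
permitted only at end of data:

```python
def fixed_chunks(pieces, chunk_size):
    """Regroup arbitrarily sized input pieces into fixed-size chunks.

    Every yielded chunk is exactly chunk_size bytes, except possibly the
    last one, which is emitted only once the input is exhausted.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    buf = bytearray()
    for piece in pieces:
        buf.extend(piece)
        # Emit as many complete chunks as the buffer currently holds.
        while len(buf) >= chunk_size:
            yield bytes(buf[:chunk_size])
            del buf[:chunk_size]
    # End of data: flush whatever is left as the (short) final chunk.
    if buf:
        yield bytes(buf)
```

Nothing here ever needs a "flush now" operation, which is the contrast
with TLS that I am trying to draw.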

I also have a use case for authentication of large chunks of data at rest:
it allows me to use a cheap bulk storage service that provides
(best-effort) replication and archiving but has poor physical security.  So
I encrypt my data to myself and put it in storage, but when I get it back
I need to know that it's valid.  I can imagine at least one case where
knowing exactly which chunk was corrupted would save effort; it may be a
toy example but perhaps it is illustrative of a broader case.  Note that
there are algorithms to compute pi to arbitrary precision, and even to
compute the Nth digit thereof without computing the previous digits.  If I
need to have random-access inquiries into the value of pi, I could
precompute using software I trust and do this self-encryption thing, and
when a chunk is bad I can recompute only that chunk and still trust that I
only ever use values generated by my trusted implementation.
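
A toy version of that access pattern might look like the following.  This
is a sketch under stated assumptions: HMAC-SHA256 over (index, chunk)
stands in for the per-chunk AEAD tag, encryption is left out, the storage
service is modelled as a plain list, and recompute is a hypothetical
callback into the trusted implementation.

```python
import hashlib
import hmac


def chunk_tag(key, index, chunk):
    # Stand-in for an AEAD tag (assumption); binds the chunk to its index.
    msg = index.to_bytes(8, "big") + chunk
    return hmac.new(key, msg, hashlib.sha256).digest()


def store(key, chunks):
    """Model of the untrusted storage: a list of (chunk, tag) pairs."""
    return [(c, chunk_tag(key, i, c)) for i, c in enumerate(chunks)]


def read_chunk(key, stored, index, recompute):
    """Return chunk number index, regenerating it locally if it is bad.

    recompute(index) is a hypothetical callback into the trusted
    implementation; it is called only when verification fails.
    """
    chunk, tag = stored[index]
    if hmac.compare_digest(tag, chunk_tag(key, index, chunk)):
        return chunk
    fresh = recompute(index)
    # Repair the stored copy so the next read verifies.
    stored[index] = (fresh, chunk_tag(key, index, fresh))
    return fresh
```

Only the failing chunk is recomputed; the other chunks are never touched,
and nothing that failed verification is ever returned to the caller.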

And finally, there is no openpgp Working Group; all we have here is a bunch
of folks interested in a topic talking amongst each other on a public
mailing list hosted at the IETF.  There are no WG chairs and no expectation
of Area Director supervision (i.e., I don't feel obligated to read the
messages here).  That said, I'm happy to see that we're staying calm and
civil, and AFAICT everyone is honestly trying to understand everyone else's
position and come to a consensus.  Let's try to keep focusing on the
technical details and what use cases we need to cover.

Thanks,

Ben
