
Re: [openpgp] AEAD Chunk Size

2019-04-16 19:06:16
Hi Jon, 


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, April 15, 2019 5:00 PM, Jon Callas <joncallas@icloud.com>
wrote:





On Mar 30, 2019, at 9:11 PM, Bart Butler 
bartbutler=40protonmail.com@dmarc.ietf.org wrote:


[...]


OpenPGP is in general the latter case rather than the former. I believe 
it’s less important to have strict semantics on failures because it’s 
usually storage.


I agree. I would say my point is that with sufficiently small chunks, the
user/decrypter can choose what kind of failure behavior is appropriate.
Large chunks rob the decrypter of that choice.


We are mostly in violent agreement, I do believe. I feel like I'm saying 
something like "a quarter is a coin with George Washington on one side and an 
eagle on the other" and you're saying "a quarter is a coin with an eagle on 
one side and George Washington on the other." We're talking about the same 
coin, with a slightly different point of view.


I wouldn't use a term like "rob" because that assigns value to the condition.
I think there are places where rejection matters and is a Good Thing. I
think there are places where it is not a good thing and is even a Bad Thing.
That's why I was using terms like "strict semantics" and a lot of
conditionals.



I said 'rob' because I think fundamentally that the release semantics should be 
something that is decided by the decrypter, not the encrypter, as only the 
decrypter knows what kind of release semantics are safe or not. For example, I 
have a 32 MB PGP/MIME message. I want to show a preview in my email client. If 
we use 8K chunks, I can read the first chunk, know that it hasn't been messed 
with, and display it safely. If the spec allows a 32 MB chunk, as an 
application developer I have some choices:

1. I can load the entire 32 MB and be really slow/bandwidth intensive
2. I can not show a preview for this message
3. I can ignore release semantics and do it anyway, risking the Problem That 
Shall Not Be Named

All of the options are terrible from a UX perspective. Meanwhile, if the chunk
size is capped, this makes it easy, and if I, as an application developer, need 
strict release semantics for the entire file/message, I can do that too.
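
To make the preview case concrete, here is a rough Go sketch of the consumer I
have in mind. It is not the OpenPGP wire format: AES-GCM, the chunk framing,
and the index-based nonce are simplified stand-ins, and decryptStreaming is a
name I made up for illustration.

    package chunked

    import (
        "crypto/aes"
        "crypto/cipher"
        "encoding/binary"
        "fmt"
        "io"
    )

    // decryptStreaming reads fixed-size AEAD chunks from r and hands each
    // one to emit as soon as its tag verifies, so a client can render the
    // first 8K of a 32 MB message without fetching or decrypting the rest.
    // The framing, nonce derivation, and final-chunk handling are simplified
    // stand-ins for the real packet format.
    func decryptStreaming(r io.Reader, key []byte, chunkSize int, emit func([]byte) error) error {
        block, err := aes.NewCipher(key)
        if err != nil {
            return err
        }
        aead, err := cipher.NewGCM(block)
        if err != nil {
            return err
        }
        buf := make([]byte, chunkSize+aead.Overhead())
        nonce := make([]byte, aead.NonceSize())
        for index := uint64(0); ; index++ {
            n, err := io.ReadFull(r, buf)
            if err == io.EOF {
                return nil // clean end of stream; the real format has an explicit final chunk
            }
            if err != nil && err != io.ErrUnexpectedEOF {
                return err
            }
            // Bind the chunk index into the nonce so chunks cannot be
            // reordered, replayed, or silently dropped.
            binary.BigEndian.PutUint64(nonce[len(nonce)-8:], index)
            pt, err := aead.Open(nil, nonce, buf[:n], nil)
            if err != nil {
                return fmt.Errorf("chunk %d failed authentication: %w", index, err)
            }
            if err := emit(pt); err != nil {
                return err
            }
        }
    }

With 8K chunks, emit fires as soon as the first 8K (plus tag) arrives; with a
single 32 MB chunk, it cannot fire until the whole message has been read.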

Now, with your proposal, the other implementations and I can come to some 
agreement that hey, we just aren't going to allow chunks meaningfully higher 
than the cap, what you call "normative" agreement. That's fine, but I'm worried 
that these norms don't tend to be well-documented (I'm not sure the MAY in the 
RFC will be sufficient), and someone somewhere is going to write an 
implementation at some point which exclusively uses big chunks. When they do, 
our implementations will reject them, and then their users will complain to the 
app developers, who will in turn complain to the implementers.

I'm certainly not so arrogant as to assume I can anticipate all future needs
here. But I think it's telling that we can come up with several negative
consequences of allowing large chunks, while the only benefit is something that
can be achieved at the application layer (or even, as an option, at the
implementation layer) anyway.

I also think that forcing no-release semantics via packet structure is
misguided because app developers/implementers are likely to ignore it if it
becomes too annoying. That is, I anticipate some implementers just allowing
some kind of unsafe mode that releases plaintext early with no integrity checks 
if this comes up (essentially streaming not along AEAD chunk boundaries), and 
I'm in general uncomfortable with choosing to build a feature whose failure 
case is "massive security hole", not to mention one that we've seen before with 
the Problem That Shall Not Be Named. Do we want to allow people to create 
messages which *cannot* be safely streamed when we have the choice not to do 
this with zero functional downside? Strict release semantics can always be 
enforced at the implementation or application level.

I don't want to bury my lede any deeper than this. What I'm saying is:


-   The more you want strict AEAD semantics of no-release, the fewer chunks 
you want.
-   It seems to me that the people who most believe in strict AEAD release 
are also the ones who are arguing for smaller packets. These seem to be in 
opposition to each other. I've been confused through this discussion because 
the rationales seem in opposition and confused. I don't get it, and I want to 
understand; you all are smart people whom I respect, so if I'm confused, 
maybe I'm not getting something.

This feeling is completely mutual. I respect everyone in this discussion and 
know that all of you are smart people. I will try to rephrase what I think is 
the fundamental question here, and it's not what release semantics should 
be--those can be enforced in lots of places, and as you said, vary by use case, 
which is very compatible with my views. I think the fundamental question here 
is this: 


*Should we allow creation of valid messages which cannot be streamed and 
attempt to force strict no-release at the protocol layer?*

I think, in the absence of a compelling reason to, the answer to this is a 
pretty clear no.

    

    We might differ in that I have a nuanced opinion about AEAD rejection. I 
think that there are places where it matters, and places where it doesn't. For
example, in networking, particularly the parts of the network stack where you 
can easily get a forged packet. You want to reject that packet as early as 
possible. Moreover, these places are always using very small packets. (I'm 
going to wave my hand and say that under a megabyte is "very small" for these 
purposes.)
    

    But in archival storage, you don't want to reject something because 
there's a media error, you want to recover as much as possible. You might 
even be required to do so by law. I have real-world anecdotes if you want to 
hear them.
    

    On a network, rejection is a good thing. You reply with a NAK to the sender
and they retransmit. In archival storage, there's no retransmitting on a 
media error. That's the case where it's a Bad Thing, and in fact, it might 
even be better to use CFB mode and an MDC than AEAD. It also might not, and 
much depends on which AEAD mode one used.
    

    Nonetheless, if you believe in strict semantics, you also likely want the
fewest chunks. If there is more than one chunk, you have to stage the output;
you have to process everything (unless you're going to say that the timing
side-channel is not important).

Why do you have to stage the output in the multi-chunk case? The only 
difference in the multi-chunk case is that I'd check AEAD tags multiple times 
instead of just at the end. There's no reason why I'd have to treat the output
any differently than with a single chunk if I embrace strict no-release. I
could buffer it the exact same way I was buffering the single chunk and the 
application/consumer doesn't have to know there is any difference.

Fundamentally, multi-chunk just gives you options. There is nothing stopping an
implementation from doing strict no-release.
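
Concretely, building on the hypothetical decryptStreaming sketch above, strict
no-release is just a buffering wrapper around the same chunked loop:

    // decryptAllOrNothing layers strict no-release on top of the same
    // chunked stream: every tag is still checked chunk by chunk, but the
    // plaintext is buffered internally and only handed back once the entire
    // message has authenticated. On any failure the caller sees nothing.
    func decryptAllOrNothing(r io.Reader, key []byte, chunkSize int) ([]byte, error) {
        var out []byte
        err := decryptStreaming(r, key, chunkSize, func(pt []byte) error {
            out = append(out, pt...)
            return nil
        })
        if err != nil {
            return nil, err
        }
        return out, nil
    }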


    

    Sometimes this is not possible. Ironically, the place where it's most
possible is in storage, where it's the least needed. In online protocols,
[...]



OK, I think this is the part that I don't understand. Why does it matter 
what chunking scheme is used here? If my app requires all-or-nothing 
semantics, I would program my app to enforce that all chunks must pass and 
not release plaintext unless that happened, with no truncation, etc. So why 
would every joint be a vulnerability?


What value does large-chunk AEAD actually provide? What I'm getting 
from the AEAD Conundrum message is that it's a way for the message 
encrypter to leverage the "don't release unauthenticated chunks" 
prohibition to force the decrypter to decrypt the whole message before 
releasing anything. Why do we want to give the message creator this 
kind of power? Why should the message creator be given the choice to 
force her recipient to either decrypt the entire message before release 
or be less safe than she would have been with smaller chunks?


Let me summarize the conundrum: If you want strict AEAD no-release
semantics, you want fewer chunks.


I guess this is my fundamental question. You can force no-release semantics 
at the application level for any chunk size scheme, right?


Yes, you can, provided that there's a way to report that back, and your 
caller checks the return value.

You (as an implementation) could just not return the plaintext until the entire 
message was read. There's nothing stopping implementations from having a strict 
no-release mode.



I suppose this really means no, you can't force it, because the library 
writer can't force the application code to check the error return.

Well, the library can always just not return the plaintext if we don't think 
it's safe. I just don't think it's the encrypter's business to be deciding what 
is safe or not for the decrypter.
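
And if we want to make that structural rather than advisory, the library can
expose the stream behind an io.Reader that only ever surfaces verified chunks.
A sketch, again reusing the hypothetical decryptStreaming from earlier:

    // newChunkReader adapts the chunked loop to io.Reader. Read only ever
    // returns bytes from chunks whose tags have already verified, so even a
    // caller that sloppily ignores the terminal error has never seen
    // unauthenticated plaintext; truncation and forgery surface as read
    // errors.
    func newChunkReader(r io.Reader, key []byte, chunkSize int) io.Reader {
        pr, pw := io.Pipe()
        go func() {
            err := decryptStreaming(r, key, chunkSize, func(pt []byte) error {
                _, werr := pw.Write(pt)
                return werr
            })
            pw.CloseWithError(err) // a nil err shows up to the reader as io.EOF
        }()
        return pr
    }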



I have heard that, among the issues we're Not Going To Talk About, improper
checking of GnuPG's report of an MDC failure was a problem in at least one
place.



Sure, but this could have been configured as a hard failure. The apps didn't 
configure it as a hard failure because that would have collided with 
UX/application concerns, and I fear that that collision will occur again if we 
allow it to, with likely the same result.

If you respond to a security request with a performance answer, you 
literally don’t know what you’re talking about. So let’s toss that aside.


I apologize; I was not trying to create a strawman here, but I am
completely at a loss for what the benefit of large chunks is.


From a standpoint of debate technique, coming up with a strawman makes your 
whole side of it weaker because attacking a strawman is attacking a strawman. 
It makes it look like you don't understand, when you actually have a 
different issue. I think it has added to the confusion I have been suffering 
from. The chunk size question is about adjusting security parameters, and 
thus when you say, "it won't help performance" I can't help but think that 
we're not discussing the same thing at all, as I'm talking security, and 
you're talking performance.


Good to put that to bed. Back to the chunk size debate.


I don't know the specific benefits, either. I heard people asking for it, and 
I'm defending the idea for them.


I believe that an underlying difference between your thinking and mine is 
that you're looking at this as an application writer, and I'm looking at it 
as a protocol / API that has many clients, some of whom (and the largest
ones) aren't written yet.


Moreover, there are a lot of people who use OpenPGP for a lot of things that 
we don't know about. As Peter Gutmann pointed out, there are a lot of EDI
systems, back ends of financial systems, and so on that internally use 
OpenPGP implementations. They're not here. I'm trying to watch out for them.


There are also people around who want to do something and for a lot of 
reasons find it difficult to speak up. I'm not the editor any more; Werner is,
and I have every faith in him. Sometimes, though, old habits die hard.



I'm sympathetic to all of this, and I don't want to put anyone on the spot. It
would be really great if anyone who has a use case for large chunks speaks up,
though, either through this thread or privately to me, Jon, or anyone else they
feel comfortable speaking with, because I do not want anyone's voice to go
unheard, and if there is a use case for large chunks I do want to hear about it
before this decision is finalized.

I tend to see the AEAD packet format as being a successor to the existing 
streaming, indefinite-length things. That allows chunking up to 2^30, and
while absurdly large, it has never been an issue.



Well, except that streaming this old stuff is unsafe if ciphertext modification 
is a threat.

In my head, I think why not allow up to that, since it would preserve 
anyone's weird thing?


On the other side, implementers need guidance. Today, the guidance is 
folklore with all the issues that go with it. It's better not to have 
folklore. But, if we basically said, "do what you're doing today" then we'd 
be looking at 8K chunks, as that's what GnuPG does today.


The clauses I suggested about MAY support larger / MAY give larger the finger
seemed to be a compromise that would work because it gives you the guidance
you need; it gives whoever these people are the ability to do what they want;
and lastly, should there be a consensus that it needs to be larger in the real
world, a consensus of implementers can change it without a new document. It
seemed to me that everyone wins.



For the record, I'm pretty much OK with this; I just think it's opening us up
to future problems that it would be best to avoid.

Yet I thought I perceived that you not only wanted to win, but you wanted to 
salt the earth in the other people's territory. Fixing an upper bound on 
memory has a long history of Famous Last Words going back to the old clichéd 
"640K is more than enough for anyone." The gods punish hubris.

I'm sorry I gave that impression or was overly strident. I consider this a rare
opportunity to fix something before it becomes a problem rather than afterward,
with a bunch of legacy baggage in tow. I have no interest in "winning" this
argument for its own sake--I would be happy to get a counter-argument for
large chunks that made me think "yes, there is a use case and that's why we
want to risk having these future problems".



Okay -- let's sort all this out. I really think we are ALMOST done here.


Here's what I stated before.


(1) MUST support up to <small-chunk-size> chunks.
(2) SHOULD support up to <larger-chunk-size> chunks, as these are common.
(3) MAY support larger up to the present very large size.
(4) MAY reject or error out on chunks larger than <small-chunk-size>, but 
repeating ourselves, SHOULD support <larger-chunk-size>.


Clauses (3) and (4) set up a sandbox for the people who want very large 
chunks. They can do whatever they want, and the rest of us can ignore 
them. Why get rid of that? It doesn’t add any complexity to the code. It
lets the people who want the huge ones do them in their own environment 
and not bother other people.


My concern is over (1) and (2), and specifically that there are both <small>
and <large> sizes.


I think that’s an issue. If there are two numbers, we are apt to end up
with skew before settling on one, so it’s better to agree on just one.
That’s the real wart in my proposal.


I'm OK with eliminating (2) and just using the MAY part to take care of any 
legacy 256K messages OpenPGP.js users might have. As I said, we don't have 
any of these messages in production yet and I'd err on the side of a 
cleaner spec.


Me too. I think saying 256K is fine. I have an intuition it ought to be at 
least as large as the largest Jumbo Frame, and that's 9K, so round to 16K. Let
me restate the proposal.


(1) MUST support up to <chunk-size> chunks.
(3) MAY support larger up to the present very large size.
(4) MAY reject or error out on chunks larger than <chunk-size>.


And it seems that 256K is the proposal for <chunk-size>. Are we agreed on all 
that?



As some respondents would like 8K or 16K, I'm fine with doing that instead of
256K. I would like to check with the maintainers of our libraries to find out
if there's any consideration I'm overlooking that would favor one or the other
before committing, though.
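
Either way, enforcing clause (4) is cheap at parse time. If I read the current
draft right, the AEAD packet carries a one-octet exponent c with chunk size
2^(c+6), so 256K is c = 12 and 8K is c = 7. A sketch, in the same hypothetical
package as above, with the cap as a placeholder for whatever we agree on:

    // maxChunkExp caps the chunk-size exponent we accept; the draft encodes
    // the chunk size as 2^(c+6) octets, so c = 12 is the 256K cap under
    // discussion (c = 7 would be 8K). Anything above it gets clause (4).
    const maxChunkExp = 12

    func chunkSizeFromOctet(c byte) (int, error) {
        if c > maxChunkExp {
            return 0, fmt.Errorf("chunk size 2^%d octets exceeds supported maximum 2^%d",
                int(c)+6, maxChunkExp+6)
        }
        return 1 << (uint(c) + 6), nil
    }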

I just really want to understand the benefit of large chunks for security 
and right now I clearly do not.


If you believe that no-release is a Good Thing, then you want fewer chunks, 
ideally only 1 chunk. That's it. That's the ONLY reason.


I think I discussed this to death above so I won't add to the word count here.

-Bart

I believe that no-release can be a Good Thing, but rarely is for OpenPGP's 
primary use case. As I said in my other missive, I don't think that it's even 
possible in the general case. Networking packets, yes -- both possible and 
desirable. Files, no -- neither possible nor desirable.


Jon

