Re: [openpgp] Message padding in OpenPGP

> On Sep 25, 2019, at 5:03 AM, Justus Winter <justuswinter(_at_)gmail(_dot_)com> wrote:
>
>> On Tue, Sep 24, 2019 at 11:00 PM Jon Callas <joncallas(_at_)icloud(_dot_)com> wrote:
>>
>> Am I correct in understanding that you're proposing adding in decoy traffic
>> to pad out compressed data to its uncompressed length?
>
> No.  I'm proposing not to compress the data at all, and then add some
> padding data according to some policy.  The compression container is
> only a means to add the padding within the constraints of the current
> ecosystem.


Okay. Thanks. I think that clarifies. (Meaning I'm still slightly puzzled, but 
okay.)

>> If I'm missing something, what problem are you trying to solve with this?
>
> There is a correlation between the size of the encrypted message and
> the size of the plaintext.  At first sight, compression helps with
> that, but that makes the size dependent on the entropy of the
> plaintext, which also leads to problems as discussed previously.
> Padding alleviates this problem, the tradeoff being an increased
> message size.


Well, if you don't compress, sure there is. The size is going to be 
<message-size> + <overhead> and the overhead is generally easily computed or 
guessed.

I understand the vague concern, but to me the proposal is security stone soup. 
You're throwing some stuff together and it vaguely meets the vague concern.

I think there's a way forward and that might be something like:

* Describe the actual threat. I can imagine threats, but I don't have a handle 
on what you're trying to do exactly.
* Describe how the solution (padding) helps and quantify how much it helps. 
There are plenty of places where padding doesn't help; it just shifts a bunch 
of things around. There are also places where padding ends up hurting. I've 
seen this in constant-traffic networks where the padding makes traffic analysis 
easier (hand-waved explanation: the padding makes it easier to recover a timing 
side channel, and that side channel allows you to statistically remove the 
padding; you end up knowing the aggregate amount of padding to a statistical 
confidence level, and on a data stream, that's good enough.)
* Look at what might be second-order effects and discuss them at the least. 
Costs in terms of networking and storage vs benefit need to be in here.

A few years ago, I was looking at a very similar problem: removing size-based 
traffic analysis from cloud storage systems. We looked at padding things out, 
and in many cases padding didn't really help. We looked at padding with 
thresholds -- e.g. round everything up to the nearest chunk size, where a chunk 
is something like a power of two in the 4K to 1M range. It turns out that a lot 
of information gets leaked anyway. For example, you can easily guess that 
something is likely to be a selfie, because they're all in a reasonably narrow 
band of sizes.
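
To make the threshold idea concrete, here is a rough sketch in Python. It is 
not the scheme we actually built; the function name, the exact bounds, and the 
rule for sizes above 1M (round up to a multiple of 1M) are assumptions made for 
illustration only.

```python
MIN_CHUNK = 4 * 1024      # 4K, the low end of the range mentioned above
MAX_CHUNK = 1024 * 1024   # 1M, the high end of the range mentioned above


def padded_size(size: int) -> int:
    """Return the padded length for an object of `size` bytes.

    Up to MAX_CHUNK, round up to the next power of two (at least MIN_CHUNK).
    Above MAX_CHUNK, round up to a multiple of MAX_CHUNK; that second rule is
    an assumption, since the text above does not say what happens there.
    """
    if size < 0:
        raise ValueError("size must be non-negative")
    if size <= MIN_CHUNK:
        return MIN_CHUNK
    if size <= MAX_CHUNK:
        # Smallest power of two that is >= size.
        return 1 << (size - 1).bit_length()
    # Ceiling division, then scale back up to a multiple of MAX_CHUNK.
    return -(-size // MAX_CHUNK) * MAX_CHUNK


# Hypothetical object sizes in a narrow band (made-up numbers, not measurements).
for size in (2_600_000, 2_900_000, 3_400_000):
    print(size, "->", padded_size(size))
```

With rules like these, objects from a narrow band of sizes still land in only 
one or two adjacent buckets, so the bucket itself remains a decent hint about 
what kind of object it is.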

The larger your chunk is, the better you blur, but the obvious downside is that 
you're now writing a lot of extra data in the vast majority of cases. That 
wastes so much networking and storage that A Reasonable Person would likely 
decline to pad. Padding small enough for the cost to be insignificant gives a 
correspondingly insignificant benefit. We ended up doing some chunking, but we 
knew that the benefits were so small that we didn't even really talk about it. 
It was far too easy for someone to think they were getting a huge benefit that 
they weren't getting. (A similar situation is the way people overestimate how 
much private browsing or even Tor help them.)
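
As a back-of-the-envelope illustration of that cost, the sketch below computes 
the fraction of extra bytes written when an object is rounded up to a multiple 
of a fixed chunk size. The function name and the 10K example object are 
hypothetical, chosen only to show how quickly the overhead grows with the chunk.

```python
def padding_overhead(size: int, chunk: int) -> float:
    """Extra bytes written, as a fraction of the original size, when `size`
    is rounded up to the next multiple of `chunk`."""
    if size <= 0 or chunk <= 0:
        raise ValueError("size and chunk must be positive")
    padded = -(-size // chunk) * chunk  # ceiling to a multiple of chunk
    return (padded - size) / size


# A hypothetical 10K object padded with 4K, 64K, and 1M chunks.
for chunk in (4 * 1024, 64 * 1024, 1024 * 1024):
    print(f"chunk={chunk:>8}  overhead={padding_overhead(10 * 1024, chunk):.1f}x")
```

For a small object, the large chunk that blurs best costs on the order of a 
hundred times the original size, while the cheap small chunk leaves the size 
nearly as visible as before.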

Summing up, it's interesting, but I think a cost-benefit discussion should 
follow, along with at least a hand wave of metric-ish things.

        Jon