Re: [openpgp] review of the SOP draft

Hi Antoine--

Thanks for the thoughtful review.  It was super long though!  I've
opened a bunch of issues from the stuff you raised here, but more
dicsussion follows inline below.

On Mon 2019-11-11 12:57:51 -0500, Antoine Beaupré wrote:

https://gitlab.com/dkg/openpgp-stateless-cli/merge_requests/9


I've reviewed and merged the changes requested there, definitely useful
to have this clarifying editorial work done as patches, thanks!

This separation should make it easier to provide interoperability testing 
for the object security work, and to allow implementations to consume and 
produce new cryptographic primitives as needed.

 
I don't understand the part after ", and allow..." what does that
actually mean? What do we mean by "new cryptographic primitives" here
exactly?


The intent is that if every OpenPGP implementation provides a `sop`
interface (or potentially a superset of it), then we can write
interoperabiity tests and the like that drive all of them in a reliable
way.

We can, for example, generate new OpenPGP objects that incorporate new
primitives, and feed them to a stable of `sop` implementations, to
determine whether those implementations can consume them.

Or, we can drive them with simple inputs, and see which cryptographic
primitives they choose to use as they produce output.

If you think this text is clearer, i can incorporate it directly in the
draft.

[...]

Obviously, the user will need to manage their secret keys (and their
peers' certificates) somehow, but the goal of this interface is to
separate out that task from the task of interacting with OpenPGP
messages.

 
It's unclear to me how or if the SOP specification takes into account
the current design of GnuPG, specifically the part where secrets are
handled by a separate process, gpg-agent, which is designed to be
separate from the other parts of gnupg. From what I understand, the
"agents" keep the secrets and do the operations on behalf of other parts
of gnupg. But here sop would do so. Are we designing an agent here?

What about OpenPGP cards like the Yubikey? How does sop interoperate
with those?


sop is not GnuPG, and it doesn't claim or intend to be.

An implementation of sop *may* choose to support other forms of secret
key access (the "@FOO:" namespace is carved out to allow for that), but
the goal of sop is that it is simple enough that every implementation
that can handle OpenPGP secret keys and certificates should be capable
of implementing the interface.

I find those examples confusing. Multiple arguments, in particular,
seems ambiguous. Is it "CERT DATA"? or "CERT DATA"?


???  i think those are the same thing, but i'll just assume you meant
"DATA CERTS" at the end.  The answer is that there must be exactly one
SIGNATURE object to verify and there may be multiple certs, so the only
possible way to do it is SIGNATURE first, then CERTS.

But that doesn't address your larger point:

There should be *mandatory* commandline *options* instead, that clearly
state the purpose of (say) the "CERT" argument. It's a common in APIs,
to rely on the order of arguments for meaning, and I think it's often a
mistake. We should explicitely use *options* instead of *arguments* (as
in `--foo=bar` instead of just `bar`) for critical parameters like
secret or verification keys.


I would welcome a MR that makes this change across the entire spec, to
see whether it looks reasonable.  If people on this list prefer that
structure, i would be fine with adopting it, though i think it makes
the CLI more verbose than i would like.

I've trimmed out a lot of your comments below that were to do with
whether positional arguments are sensible or not.  that's not because i
don't care, but i don't think arguing about it in this one message is
helpful.

i've opened https://gitlab.com/dkg/openpgp-stateless-cli/issues/7 to
record the overall concern.  if you decide to make an MR for it, please
reference that issue in the MR!

In general, I feel using the numeric error codes in the document make it
(needlessly?) harder to read. When i got to this section, my first
reaction was: "69?? why 69? and why 37? where the heck do those come
from and why do they matter?" We should at least include a reference to
the "Failure modes section" in the Introduction section. In Terminology
maybe? And maybe refer to it here.

In general, I'm worried there might be inconsistencies between the table
in the "Failure modes" section and the various hardcoded integers
peppered through the document. This practice also makes the document
more difficult to review and maintain in the future. We might instead
use constant names like `SUCCESS`, `NO_GOOD_SIG` that *then* have
integer values in the later section. This could also provide for good
constants to use in a library implementation.


I really like this idea, and i encourage someone™ who also likes it to
make a MR that implements it.

i've opened https://gitlab.com/dkg/openpgp-stateless-cli/issues/1 to
keep track of it.

For all commands that have an `--armor|--no-armor` option, it defaults
to `--armor`, meaning that any output OpenPGP material should be
ASCII-armored (section 6 of {{I-D.ietf-openpgp-rfc4880bis}})
by default.

 
Is this on input or output? or both? It's clarified later, I
think, but it should be made explicit here as well.


the text you've quoted says "any output OpenPGP material".  I'm not sure
how to make that more visible, but i welcome proposed edits.

How do we generate purpose-specific subkeys?


With `sop`, you do not ;)

If you want to do fancy OpenPGP certificate generation, you do that with
your toolkit's own fancy features.

I've opened https://gitlab.com/dkg/openpgp-stateless-cli/issues/2 to
track that maybe we do want some rough guidance about what kinds of
secret key capabilities we want any `sop` to be able to generate here
though.

sign: Create a Detached Signature {#sign}

[…]

If `--as=text` and the input `DATA` is
not valid `UTF-8`, `sop sign` fails with a return code of 53.


Why do we mandate UTF-8 here? Explain.


We don't mandate UTF-8 unless the signer claims that the thing being
signed is text.  If so, it really does need to be UTF-8.  I have no
patience for non-UTF-8-encoded text in 2019.

OpenPGP embeds UTF-8 explicitly in its User ID formatting.  Any OpenPGP
implementation must already handle UTF-8.

if anyone thinks that dealing with different character encodings is a
good idea, please consider that the character encoding is not recorded
in the signature itself, leading charset-switching attacks like those in
https://dkg.fifthhorseman.net/notes/inline-pgp-harmful/

Do you think this information belongs in this document?

In general, I find the `--as` arguments to be a little confusing and I
don't undrestand what they bring to the table.


OpenPGP has two different forms of cryptographic signature, which are
dealt with by recipients differently.  The textual form canonicalizes
line-endings and the binary does not.  When signing a document, you need
to indicate to the thing doing the signing which form you are expecting
to create.

If you have a different suggestion, i'm happy to hear it.

Example:

    $ sop sign --as=text alice.sec < message.txt >message.txt.asc
    $ head -n1 < message.txt.asc
    -----BEGIN PGP SIGNATURE-----
    $


Another good example of the "argument vs option" problem. If I would see
a `sop sign` command, the first thing I would try would be:

    sop sign document

and expect it to find the right private key to sign the document
with. Of course, we don't do this in sop, which is fine, but I'll note
that we allow implementations to do so.


We deliberately *do not* allow implementations to do so.  sop requires
that you indicate which secret keys you want to sign with.  it is one or
more KEY objects, not zero or more.

If you think that `sop` is supposed to sign with other objects, then
i've done a bad job at drafting this proposal.  I've opened
https://gitlab.com/dkg/openpgp-stateless-cli/issues/5 to try to clarify this.

By forcing the arguments here to be the signing key, we make it
difficult to let the implementation pick the right key.


The entire point of `sop` is that `sop` *does not know* anything about
the keys.  it is stateless, and you have to tell it explicitly which key
to use.

If `--as` is set to either `text` or `mime`, then `--sign-with`
will sign as a canonical text document.  In this case, if the input
`DATA` is not valid `UTF-8`, `sop encrypt` fails with a return code of
53.

 
What is `mime` here? Why is it necessary? Expand.


i've opened https://gitlab.com/dkg/openpgp-stateless-cli/issues/6 to
track this.

`--session-key-out` can be used to learn the session key on
successful decryption.


"learn"? What does that mean? It seems it means "write to a file". 
If so that should be said explicitely here.


*What* the user of sop does is that they learn of the session key that
was used for the encrypted message.  *How* they learn it is by receiving
it from the program, either via the filesystem, or via @FD:NNN.

If you think that's unclear, i'd welcome a clarifying patch.

If `sop decrypt` fails for any reason and the identified `--session-key-out`
file already exists in the filesystem, the file will be unlinked.

 
This seems dangerous! Why do we delete a file we haven't created?
Explain.


We don't want the user to run `sop`, and then inspect a file that was
already in the filesystem thinking that it is `sop`s output.  If you
think that's a bad decision, please suggest what we should do
differently.

[`--with-session-key`] enables decryption of the `CIPHERTEXT` using the 
session key directly against the `SEIPD` packet.
This option can be used multiple times if several possible session keys 
should be tried.


What happens if both "in" and "out" are provided? I can venture a guess,
but it would be important to make that explicit as there can be horrible
bugs there.


Please do venture a guess, in the form of proposed text! I'd also love
to hear what the horrible bugs are.  I don't see them.

`--with-password` enables decryption based on any `SKESK` packets in the 
`CIPHERTEXT`.
This option can be used multiple times if the user wants to try more than 
one password.

 
We should include SKESK in terminology, because it's the first time we
encounter it here and I have close to no idea what it means.


https://gitlab.com/dkg/openpgp-stateless-cli/issues/12

I'm not sure we should put it in "terminology", but we surely should
document it better the first time it appears (it is documented the
second time it appears in the text.

If `sop decrypt` tries and fails to use a supplied `PASSWORD`, and it
observes that there is trailing `UTF-8` whitespace at the end of the
`PASSWORD`, it will retry with the trailing whitespace stripped.

 
Explain why we do magic things with whitespace. Consider not doing magic
at all as magic can be evil.


I expect the following use case to be common:

    echo correct horse battery staple > password.txt
    sop decrypt --with-password=password.txt < ciphertext > cleartext

If we don't strip whitespace, it will fail on one side or the other.

sop tries to define what sensible magic should be here, and i think i've
gotten it right.

as for magic being evil: if we don't try to do magic, then we will be
failing in ways that users don't understand, and implementers will get
bug reports that they respond to with "have you tried trimming trailing
whitespace and then trying again?"  that kind of round trip is pointless
when the tool could just avoid the problem in the first place by being
reasonable about what kinds of things people tend to do.

    https://gitlab.com/dkg/openpgp-stateless-cli/issues/13

`--verify-out` produces signature verification status to the
designated file.

`sop decrypt` does not fail (that is, the return code is not modified)
based on the results of signature verification.  The caller MUST check
the returned `VERIFICATIONS` to confirm signature status.  An empty
`VERIFICATIONS` output indicates that no valid signatures were found.
If `sop decrypt` itself fails for any reason, and the identified
`VERIFICATIONS` file already exists in the filesystem, the file will
be unlinked.

`--verify-with` identifies a set of certificates whose signatures would be
acceptable for signatures over this message.


Not failing explicitely on verification seems very dangerous. It relies
on callers properly reading the spec and realizing this is the only
exception where exit codes don't suffice in providing a general state of
the program. I would strongly recommend failing here, just like regular
verify.


if the person invokes `sop decrypt` with no arguments, should it fail?

if the person invokes `sop decrypt` with `--verify-out` and
`--verify-with`, should `sop decrypt` produce any output?

I'm trying to imagine using this in a MUA.  In my MUA i have what i
think are the keys for my peer, and i see a message from them.

I don't yet know whether it's signed.

I want to decrypt the message to read it, and in the process, i want to
find out whether it has been signed.  I'd prefer to avoid two passes in
this common use case.

If i supply the --verify-* arguments, and sop fails, then i don't get
the cleartext of the message (not in any reliable way, anyhow).  If i
don't supply the --verify-* arguments, and sop succeeds, i've lost any
signature data.

The subcommand is "decrypt", so `sop` treats successful decryption as a
success.  If you want to understand some additional state, you have to
inspect that state somewhere else, and that's in the contents of
`--verify-out`.

does that make sense?  do you think any of that explanation belongs in
the document itself?

As an aside, why can't we compose verify and decrypt here and just keep
"verify" out of "decrypt" altogether? I would guess that's (a
limitation?) part of the OpenPGP standard, but maybe it would be nice to
explicitely expand on this here as well.


Do you want to propose a way to compose them?  I suppose we could have
`sop decrypt` offer a `--signatures-to` argument, which the sender could
then use for `sop verify`, if anything comes out.  That's kind of an
interesting suggestion, and it might reduce the overall complexity of
`sop`.

However:

 * It means that a caller would need to handle the data twice (a minor
   inefficiency)

 * the signatures might not be verifiable if the thing that is signed is
   not a simple literal data packet, but is, say, a compressed data
   packet wrapped around a literal data packet.

So i don't see how to do this safely (or efficiently, but the
verifiability of the signature is more important than the efficiency
argument)

If the caller is interested in signature verification, both
`--verify-out` and at least one `--verify-with` must be supplied.  If
only one of these arguments is supplied, `sop decrypt` fails with a
return code of 23.

 
Another argument for failing on bad signatures: if we fail on bad
arguments of --verify, why don't we fail on bad signatures?


failing on bad arguments is "you've asked an ill-formed question".
That's legitimate and useful to let the operator know that things are
not what they seem.

Failing on a bad signature would be "the answer is no".

If the primary operation you're asking for is verification, then failing
on bad signatures is a reasonable outcome -- only succeed if you succeed
in verifying.

But if the primary operation is decryption, i don't think we should fail
on signature validity for reasons outlined above.

armor: armor: Add ASCII Armor
-----------------------------


[...]

If the incoming data is already armored, and the `--allow-nested` flag
is not specified, the data MUST be output with no modifications.
Data is considered ASCII armored iff the first 14 bytes are exactly
`-----BEGIN PGP`. This operation is thus idempotent by default.

 
Explain why we want idempotent and why we want to do this guessing game.


I'm at a bit of a loss here, because these seem obvious to me.

We want a guessing game because users who don't know what they want are
going to guess anyway, and they're likely to guess wrong.  We might as
well guess on their behalf if they don't know.

We want idempotent because "ensure this thing is armored" is a
reasonable semantic request -- indeed, it's probably the *only*
reasonable semantic request.  Perhaps we could drop --allow-nested?

This @ENV: and @FD: stuff really makes me uncomfortable. It's a neat
hack for commandline applications, but it would break down when
designing a library API, as the "type" of data passed around the API
would be ambiguous, or at least with possible side effects. That feels
like "design smell" here and I would like this to be changed.


FWIW, the @ENV: and @FD: conventions are specifically CLI conventions
and i don't expect them to be translated to any programmatic API.

for example, in the pythonic variant of this framework that i'm working
on, i've treated these objects as labeled bytestreams. ("labeled"
meaning -- if you get a set of these objects, each one has a bytes-like
thing, and a textual name that you can use in error reporting)

I admit that i'm struggling a bit with whether i want to pass around
bytes objects directly (simple, what i've got now) or some sort of
asyncio handle that the bytes are readable from (fancier, probably nicer
to use in the kinds of programs that i will eventually want to write
with this). https://gitlab.com/dkg/python-sop/issues/1 But that has
nothing to do with @ENV and @FD, as those aren't exposed to the python
functions at all.

I would recommend using equivalent environment variables to the
parameters instead, for example SIGN_WITH for --sign-with and so
on. This would, of course, require switching positional arguments to
options but I already explain why that would be a good idea anyways
earlier.


environment variables won't work for arguments that can be supplied
multiple times, because then we have to invent a new delimiting scheme.
I definitely don't want to do that.

File descriptors could be passable as distinct options, like
--sign-with-fd for --sign-with.


This is an interesting proposal, though i don't see how --sign-with=@FD:3
is much different from --sign-with-fd=3  -- i guess it lets you use
files that are literally named @FD:3 ?  Is that important?

If you could open a merge request that proposes this change i'd be happy
to consider it.

I've opened https://gitlab.com/dkg/openpgp-stateless-cli/issues/14 to
keep track of it.

I've dealt with commandline applications that have special meanings with
@, and in retrospect, it was a bad idea. In particular, Python's
argparse module supports using a prefix argument to mean "read options
from this file" and I've used it to implement crude configuration file
support for monkeysign and other programs. It's confusing for users and
does not work very well.


afaict, this is not mandatory in argparse, and i wouldn't recommend it
for any sop implementation:
https://docs.python.org/3/library/argparse.html#fromfile-prefix-chars

I think the way it's specified in `sop` right now is pretty pincipled
and hard to screw up, but i could be wrong.

Specifically in this case, I would also worry about security
vulnerabilities with untrusted filenames being passed to the program.


can you explain this more?

CERTS {#certs}
-----

One or more OpenPGP certificates (section 11.1 of 
{{I-D.ietf-openpgp-rfc4880bis}}), aka "Transferable Public Key".
May be armored.

Although some existing workflows may prefer to use one `CERTS` object with 
multiple certificates in it (a "keyring"), supplying exactly one certificate 
per `CERTS` input will make error reporting clearer and easier.


This last bit is in contradiction with `extract-cert` command
documentation which says it will "only contain one cert". Maybe we
should just pick one and stick with it here?


I have carefully considered this, and i do not think these are in
contradiction.

extract-cert explicitly says it will contain only one cert *because* the
other places where `CERTS` might be supplied could contain more than
one.

The fact is that people use "keyrings" today, including some that
contain hundreds or thousands of keys.  If a distro, for example, wants
to use `sop` to verify that a package is signed by one of their
developers, i don't want the distro to need to put each developer's key
in a separate file.

That last part doesn't *look* like "arbitrary text" to me. It looks like
some explanatory message of the operation. If that's the case, we should
make that explicit and say why the text is present at all. Calling it a
"note" or "message" would already be an improvement.


patches welcome, particularly for this kind of editorial cleanup :)

A `sop` implementation MAY return other error codes than those listed
above.

 
This sounds like a bad idea. I interpret that as meaning that I can
return an error code 2 instead of error code 3 if i fancy. If we're
going to pick numbers, we should either enforce them or not, but don't
dance around the issue and encourage people to diverge from the spec.

Or at least, if you allow divergence, explain why it can be allowed.


I've documented this concern as
https://gitlab.com/dkg/openpgp-stateless-cli/issues/10

It would also be great if we could explain where those magic numbers
come from in the first place. I suspect they were chosen to not overlap
with existing error codes, but that's just a guess.


Justus picked 69 in his OpenPGP Interoperability Test Suite.  I chose
the others as "reasonable-sized primes" just for fun.  I don't think
this information belongs in this document, as it doesn't matter.

Detached Signatures {#detached-signatures}
-------------------

`sop` deals with detached signatures as the baseline form of OpenPGP 
signatures.

The main problem this avoids is the trickiness of handling a signature that 
is mixed inline into the data that it is signing.


Should we expand on "trickiness" here?
 
Also: how *do* we deal with inline signatures? Are those deprecated now?


From my POV, yes, inline signatures are recipes for disaster, because
the difficulty of figuring out what specifically was signed when you
have an inline signature is not at all obvious.  I've noted that this
needs to be reflected in the spec at:

  https://gitlab.com/dkg/openpgp-stateless-cli/issues/9

That said, i've heard from some of the folks handling package managers
who have specific reasons for wanting inline signatures (also, Neal
Walfield appears to think they're useful in his
https://gitlab.com/sequoia-pgp/pgpcat).  so we probably need to tackle
them somehow here.  (it's even in "future work"!)  I've recorded that
concern here:

   https://gitlab.com/dkg/openpgp-stateless-cli/issues/11

FIXME: if an encrypted OpenPGP message arrives without metadata, it is 
difficult to know which signers to consider when decrypting.
How do we do this efficiently without invoking `sop decrypt` twice, once 
without `--verify-*` and again with the expected identity material?

 
Maybe we could use a "sop probe" command for this and other things?


I don't understand what you think a "sop probe" command would do.  if
you'd like to propose it as an MR, i'd consider it, though.

Compression {#compression}

[…]

How about decryption? Do we attempt decompression during decrypt?


It will be interesting to see what implementers do!  I've left `sop`
deliberately agnostic there, and i would like to learn from test suites
what the answer is.

     --dkg

signature.asc
Description: PGP signature

_______________________________________________
openpgp mailing list
openpgp(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/openpgp