[openpgp] review of the SOP draft

Hi,

I've taken some time to do a a short review of dkg's "SOP" interface
specification, version 01, as provided here:

https://tools.ietf.org/html/draft-dkg-openpgp-stateless-cli-01

First, I want to say this is excellent and much-needed work. OpenPGP
interfaces have traditionnally been hard to use and a major obstacle to
adoption of strong cryptography in our communities. Having a standard
and *simple* way of interoperating with the underlying OpenPGP
primitives would go a *long* way in significantly improving OpenPGP
adoption and online privacy in general. So I salute this (first) effort
in improving this situation.

Second, I have proposed a series of changes to the document here:

https://gitlab.com/dkg/openpgp-stateless-cli/merge_requests/9

Patch set:

https://gitlab.com/dkg/openpgp-stateless-cli/merge_requests/9.patch

Those are mostly small fixes that might not require a full discussion
here, although I of course welcome feedback on those here as well. It
might be preferable to comment on the issue directly if you have a
GitLab account, obviously.

Finally, there are some things about the document I wanted to comment on
and that I don't have an easy patch for, so I will comment on the
document, inline, here instead. I hope that is proper form, let me know
if there is a better way to do this.

Introduction
============


[...]

This separation should make it easier to provide interoperability testing for 
the object security work, and to allow implementations to consume and produce 
new cryptographic primitives as needed.

 
I don't understand the part after ", and allow..." what does that
actually mean? What do we mean by "new cryptographic primitives" here
exactly?

[...]

Obviously, the user will need to manage their secret keys (and their
peers' certificates) somehow, but the goal of this interface is to
separate out that task from the task of interacting with OpenPGP
messages.

 
It's unclear to me how or if the SOP specification takes into account
the current design of GnuPG, specifically the part where secrets are
handled by a separate process, gpg-agent, which is designed to be
separate from the other parts of gnupg. From what I understand, the
"agents" keep the secrets and do the operations on behalf of other parts
of gnupg. But here sop would do so. Are we designing an agent here?

What about OpenPGP cards like the Yubikey? How does sop interoperate
with those?

[...]

Examples
========

These examples show no error checking, but give a flavor of how `sop` might 
be used in practice from a shell.

The key and certificate files described in them (e.g. `alice.sec`) could be 
for example those found in {{I-D.draft-bre-openpgp-samples-00}}.

~~~
sop generate-key "Alice Lovelace <alice@openpgp.example>" > alice.sec
sop extract-cert < alice.sec > alice.pgp

sop sign --as=text alice.sec < announcement.txt > announcement.txt.asc
sop verify announcement.txt.asc alice.pgp < announcement.txt

sop encrypt --sign-with=alice.sec --as=mime bob.pgp < msg.eml > encrypted.asc
sop decrypt alice.sec < ciphertext.asc > cleartext.out
~~~


I find those examples confusing. Multiple arguments, in particular,
seems ambiguous. Is it "CERT DATA"? or "CERT DATA"?

There should be *mandatory* commandline *options* instead, that clearly
state the purpose of (say) the "CERT" argument. It's a common in APIs,
to rely on the order of arguments for meaning, and I think it's often a
mistake. We should explicitely use *options* instead of *arguments* (as
in `--foo=bar` instead of just `bar`) for critical parameters like
secret or verification keys.

Then "arguments" are left for more regular file parameters, the primary
purpose of the command (e.g. "sign, encrypt, decrypt this file", with
"verify" being of course the tricky bit).

So instead of:

sop sign --as=text alice.sec < announcement.txt > announcement.txt.asc


I would suggest:

sop sign --as=text --sign-with=alice.sec < announcement.txt

announcement.txt.asc


For example. The idea would be to use `--PURPOSE-with` pattern (where
PURPOSE is `sign`, `verify`, etc) that is already used in `encrypt
--sign-with`. Maybe it could be shortened to just --with in some cases
(like decrypt, sign and verify). The above example would become:

sop generate-key "Alice Lovelace <alice@openpgp.example>" > alice.sec
sop extract-cert < alice.sec > alice.pgp

sop sign --as=text --sign-with=alice.sec < announcement.txt > 
announcement.txt.asc
sop verify announcement.txt.asc --verify-with=alice.pgp < announcement.txt

sop encrypt --sign-with=alice.sec --as=mime bob.pgp < msg.eml > encrypted.asc
sop decrypt --decrypt-with=alice.sec < ciphertext.asc > cleartext.out

Or, as a diff:

@@ -99,11 +103,11 @@ The key and certificate files described in them (e.g. 
`alice.sec`) could be for
 sop generate-key "Alice Lovelace <alice@openpgp.example>" > alice.sec
 sop extract-cert < alice.sec > alice.pgp
 
-sop sign --as=text alice.sec < announcement.txt > announcement.txt.asc
-sop verify announcement.txt.asc alice.pgp < announcement.txt
+sop sign --as=text --sign-with=alice.sec < announcement.txt > 
announcement.txt.asc
+sop verify announcement.txt.asc --verify-with=alice.pgp < announcement.txt
 
 sop encrypt --sign-with=alice.sec --as=mime bob.pgp < msg.eml > encrypted.asc
-sop decrypt alice.sec < ciphertext.asc > cleartext.out
+sop decrypt --decrypt-with=alice.sec < ciphertext.asc > cleartext.out
 ~~~
 
 Subcommands

Subcommands
===========

`sop` uses a subcommand interface, similar to those popularized by systems 
like `git` and `svn`.

If the user supplies a subcommand that `sop` does not implement, it
fails with a return code of 69.  If a `sop` implementation does not
handle a supplied option for a given subcommand, it fails with a
return code of 37.

 
In general, I feel using the numeric error codes in the document make it
(needlessly?) harder to read. When i got to this section, my first
reaction was: "69?? why 69? and why 37? where the heck do those come
from and why do they matter?" We should at least include a reference to
the "Failure modes section" in the Introduction section. In Terminology
maybe? And maybe refer to it here.

In general, I'm worried there might be inconsistencies between the table
in the "Failure modes" section and the various hardcoded integers
peppered through the document. This practice also makes the document
more difficult to review and maintain in the future. We might instead
use constant names like `SUCCESS`, `NO_GOOD_SIG` that *then* have
integer values in the later section. This could also provide for good
constants to use in a library implementation.

For all commands that have an `--armor|--no-armor` option, it defaults
to `--armor`, meaning that any output OpenPGP material should be
ASCII-armored (section 6 of {{I-D.ietf-openpgp-rfc4880bis}})
by default.

 
Is this on input or output? or both? It's clarified later, I
think, but it should be made explicit here as well.

[...]

generate-key: Generate a Secret Key {#generate-key}
-----------------------------------

    sop generate-key [--armor|--no-armor] [--] [USERID…]

 - Standard Input: ignored
 - Standard Output: `KEY` ({{key}})

Generate a single default OpenPGP certificate with zero or more User
IDs.


[...]

How do we generate purpose-specific subkeys?

[...]

sign: Create a Detached Signature {#sign}
---------------------------------

    sop sign [--armor|--no-armor]
         [--as={binary|text}] [--] KEY [KEY...]

 - Standard Input: `DATA` ({{data}})
 - Standard Output: `SIGNATURE` ({{signature}})

`--as` defaults to `binary`.  If `--as=text` and the input `DATA` is
not valid `UTF-8`, `sop sign` fails with a return code of 53.

 
Why do we mandate UTF-8 here? Explain.

In general, I find the `--as` arguments to be a little confusing and I
don't undrestand what they bring to the table.

Example:

    $ sop sign --as=text alice.sec < message.txt >message.txt.asc
    $ head -n1 < message.txt.asc
    -----BEGIN PGP SIGNATURE-----
    $


Another good example of the "argument vs option" problem. If I would see
a `sop sign` command, the first thing I would try would be:

    sop sign document

and expect it to find the right private key to sign the document
with. Of course, we don't do this in sop, which is fine, but I'll note
that we allow implementations to do so. By forcing the arguments here to
be the signing key, we make it difficult to let the implementation pick
the right key.

We should follow POLA (Principle Of Least Astonishment) here and allow
users to provide the document as an argument, and use something like
`--sign-with=KEY` (possibly multiple times) to provide the private key
material.

[...]

encrypt: Encrypt a Message {#encrypt}
--------------------------

    sop encrypt [--as={binary|text|mime}]
        [--armor|--no-armor]
        [--with-password=PASSWORD...]
        [--sign-with=KEY...]
        [--] [CERTS...]


[...]

`--as` defaults to `binary`.


[...]

If `--as` is set to either `text` or `mime`, then `--sign-with`
will sign as a canonical text document.  In this case, if the input
`DATA` is not valid `UTF-8`, `sop encrypt` fails with a return code of
53.

 
What is `mime` here? Why is it necessary? Expand.

[...]

Example:

(In this example, `bob.bin` is a file containing Bob's binary-formatted 
OpenPGP certificate.
Alice is encrypting a message to both herself and Bob.)

    $ sop encrypt --as=mime --sign-with=alice.key alice.asc bob.bin < 
message.eml >encrypted.asc
    $ head -n1 encrypted.asc
    -----BEGIN PGP MESSAGE-----
    $


This is another good example of how the arguments and options can become
confusing. Looking at the above commandline, I can't tell what alice.asc
or bob.bin are. Is bob.bin a private key? Maybe not, because it's a
`.bin`. But what if it was also named `.asc`? What if I reverse the two
arguments by mistake? I could encrypt to bob instead of alice! Or wait,
those are *both* keys I encrypt to! Phew, I'm safe... But wait, what if
I forgot the `<`!!

See where I'm going? :)

Having an explicit --encrypt-with=alice.asc (or --encrypt-to?) would be
much better here. It would also make much more sense in an API.

decrypt: Decrypt a Message {#decrypt}
--------------------------

    sop decrypt [--session-key-out=SESSIONKEY]
        [--with-session-key=SESSIONKEY...]
        [--with-password=PASSWORD...]
        [--verify-out=VERIFICATIONS
         [--verify-with=CERTS...]
         [--verify-not-before=DATE]
         [--verify-not-after=DATE] ]
        [--] [KEY...]

 - Standard Input: `CIPHERTEXT` ({{ciphertext}})
 - Standard Output: `DATA` ({{data}})

`--session-key-out` can be used to learn the session key on
successful decryption.


"learn"? What does that mean? It seems it means "write to a file". 
If so that should be said explicitely here.

If `sop decrypt` fails for any reason and the identified `--session-key-out`
file already exists in the filesystem, the file will be unlinked.

 
This seems dangerous! Why do we delete a file we haven't created?
Explain.

`--session-key-in` enables decryption of the `CIPHERTEXT` using the session 
key directly against the `SEIPD` packet.
This option can be used multiple times if several possible session keys 
should be tried.


What happens if both "in" and "out" are provided? I can venture a guess,
but it would be important to make that explicit as there can be horrible
bugs there.

`--with-password` enables decryption based on any `SKESK` packets in the 
`CIPHERTEXT`.
This option can be used multiple times if the user wants to try more than one 
password.

 
We should include SKESK in terminology, because it's the first time we
encounter it here and I have close to no idea what it means.

If `sop decrypt` tries and fails to use a supplied `PASSWORD`, and it
observes that there is trailing `UTF-8` whitespace at the end of the
`PASSWORD`, it will retry with the trailing whitespace stripped.

 
Explain why we do magic things with whitespace. Consider not doing magic
at all as magic can be evil.

`--verify-out` produces signature verification status to the
designated file.

`sop decrypt` does not fail (that is, the return code is not modified)
based on the results of signature verification.  The caller MUST check
the returned `VERIFICATIONS` to confirm signature status.  An empty
`VERIFICATIONS` output indicates that no valid signatures were found.
If `sop decrypt` itself fails for any reason, and the identified
`VERIFICATIONS` file already exists in the filesystem, the file will
be unlinked.

`--verify-with` identifies a set of certificates whose signatures would be
acceptable for signatures over this message.


Not failing explicitely on verification seems very dangerous. It relies
on callers properly reading the spec and realizing this is the only
exception where exit codes don't suffice in providing a general state of
the program. I would strongly recommend failing here, just like regular
verify.

As an aside, why can't we compose verify and decrypt here and just keep
"verify" out of "decrypt" altogether? I would guess that's (a
limitation?) part of the OpenPGP standard, but maybe it would be nice to
explicitely expand on this here as well.

If the caller is interested in signature verification, both
`--verify-out` and at least one `--verify-with` must be supplied.  If
only one of these arguments is supplied, `sop decrypt` fails with a
return code of 23.

 
Another argument for failing on bad signatures: if we fail on bad
arguments of --verify, why don't we fail on bad signatures?

armor: armor: Add ASCII Armor
-----------------------------


[...]

If the incoming data is already armored, and the `--allow-nested` flag
is not specified, the data MUST be output with no modifications.
Data is considered ASCII armored iff the first 14 bytes are exactly
`-----BEGIN PGP`. This operation is thus idempotent by default.

 
Explain why we want idempotent and why we want to do this guessing game.

[...]

Input/Output Indirect Types
===========================

Some material is passed to `sop` indirectly, typically by referring to
a filename containing the data in question.  This type of data may
also be passed to `sop` on Standard Input, or delivered by `sop` to
Standard Output.

If the filename for any indirect material used as input has the
special form `@ENV:xxx`, then contents of environment variable `$xxx`
is used instead of looking in the filesystem.

If the filename for any indirect material used as either input or
output has the special form `@FD:nnn` where `nnn` is a decimal
integer, then the associated data is read from file descriptor `nnn`.


This @ENV: and @FD: stuff really makes me uncomfortable. It's a neat
hack for commandline applications, but it would break down when
designing a library API, as the "type" of data passed around the API
would be ambiguous, or at least with possible side effects. That feels
like "design smell" here and I would like this to be changed.

I would recommend using equivalent environment variables to the
parameters instead, for example SIGN_WITH for --sign-with and so
on. This would, of course, require switching positional arguments to
options but I already explain why that would be a good idea anyways
earlier.

File descriptors could be passable as distinct options, like
--sign-with-fd for --sign-with.

I've dealt with commandline applications that have special meanings with
@, and in retrospect, it was a bad idea. In particular, Python's
argparse module supports using a prefix argument to mean "read options
from this file" and I've used it to implement crude configuration file
support for monkeysign and other programs. It's confusing for users and
does not work very well.

Specifically in this case, I would also worry about security
vulnerabilities with untrusted filenames being passed to the program.

[...]

CERTS {#certs}
-----

One or more OpenPGP certificates (section 11.1 of 
{{I-D.ietf-openpgp-rfc4880bis}}), aka "Transferable Public Key".
May be armored.

Although some existing workflows may prefer to use one `CERTS` object with 
multiple certificates in it (a "keyring"), supplying exactly one certificate 
per `CERTS` input will make error reporting clearer and easier.


This last bit is in contradiction with `extract-cert` command
documentation which says it will "only contain one cert". Maybe we
should just pick one and stick with it here?
 

[...]

VERIFICATIONS {#verifications}
-------------

One line per successful signature verification.  Each line has three
structured fields delimited by a single space, followed by arbitrary
text to the end of the line.

 - ISO-8601 UTC datestamp
 - Fingerprint of the signing key (may be a subkey)
 - Fingerprint of primary key of signing certificate (if signed by primary 
key, same as the previous field)
 - arbitrary text

 
That last part doesn't *look* like "arbitrary text" to me. It looks like
some explanatory message of the operation. If that's the case, we should
make that explicit and say why the text is present at all. Calling it a
"note" or "message" would already be an improvement.

Example:

    2019-10-24T23:48:29Z C90E6D36200A1B922A1509E77618196529AE5FF8 
C4BC2DDB38CCE96485EBE9C2F20691179038E5C6 certificate from dkg.asc


For the record, arbitrary text looks like:

This is some garbage I just jklfa bldasjkl ajklblablabla.


I can provide more samples or arbitrary text as required. I'm an
experience freelance writer and large samples can be found on
https://anarc.at/ ;)

[...]

Failure Modes
=============

When `sop` succeeds, it will return 0 and emit nothing to Standard
Error.  When `sop` fails, it fails with a non-zero return code, and
emits one or more warning messages on Standard Error.  Known return
codes include:

Return | Meaning
---:|--------------------------------------------------
 0 | Success
 3 | No acceptable signatures found (`sop verify`)
13 | Asymmetric algorithm unsupported (`sop encrypt`)
17 | Certificate not encryption-capable (e.g., expired, revoked, unacceptable 
usage flags) (`sop encrypt`)
19 | Missing required argument
23 | Incomplete verification instructions (`sop decrypt`)
29 | Unable to decrypt (`sop decrypt`)
31 | Non-`UTF-8` password (`sop encrypt`)
37 | Unsupported option
41 | Invalid data type (no secret key where `KEY` expected, etc)
53 | Non-text input where text expected
69 | Unsupported subcommand


I already mentioned some problems with those failure codes, but I will
repeat here the suggestion that symbolic names instead of integers
should be used for primary referencing in the document here again.

A `sop` implementation MAY return other error codes than those listed
above.

 
This sounds like a bad idea. I interpret that as meaning that I can
return an error code 2 instead of error code 3 if i fancy. If we're
going to pick numbers, we should either enforce them or not, but don't
dance around the issue and encourage people to diverge from the spec.

Or at least, if you allow divergence, explain why it can be allowed.
 
It would also be great if we could explain where those magic numbers
come from in the first place. I suspect they were chosen to not overlap
with existing error codes, but that's just a guess.

[...]

Detached Signatures {#detached-signatures}
-------------------

`sop` deals with detached signatures as the baseline form of OpenPGP 
signatures.

The main problem this avoids is the trickiness of handling a signature that 
is mixed inline into the data that it is signing.


Should we expand on "trickiness" here?
 
Also: how *do* we deal with inline signatures? Are those deprecated now?

[...]

Guidance for Consumers
======================

While `sop` is originally conceived of as an interface for interoperability 
testing, it's conceivable that an application that uses OpenPGP for object 
security would want to use it.

FIXME: more guidance for how to use such a tool safely and efficiently goes 
here.

FIXME: if an encrypted OpenPGP message arrives without metadata, it is 
difficult to know which signers to consider when decrypting.
How do we do this efficiently without invoking `sop decrypt` twice, once 
without `--verify-*` and again with the expected identity material?

 
Maybe we could use a "sop probe" command for this and other things?

Security Considerations
=======================

The OpenPGP object security model is typically used for confidentiality and 
authenticity purposes.

Signature Verification {#signature-verification}
----------------------

In many contexts, an OpenPGP signature is verified to prove the origin and 
integrity of an underlying object.

When `sop` checks a signature (e.g. via `sop verify` or `sop decrypt 
--verify-with`), it MUST NOT consider it to be verified unless all of these 
conditions are met:

 * The signature must be made by a signing-capable public key that is present 
in one of the supplied certificates
 * The certificate and signing subkey must have been created before or at the 
signature time
 * The certificate and signing subkey must not have been expired at the 
signature time
 * The certificate and signing subkey must not be revoked with a "hard" 
revocation
 * If the certificate or signing subkey is revoked with a "soft" revocation, 
then the signature time must predate the revocation
 * The signing subkey must be properly bound to the primary key, and 
cross-signed
 * The signature (and any dependent signature, such as the cross-sig or 
subkey binding signatures) must be made with strong cryptographic algorithms 
(e.g., not `MD5` or a 1024-bit `RSA` key)

 
Latter seems to contradict section 7.5, which says:

   For performance reasons, an implementation may choose to ignore
   validation on certificate and key material supplied to it.  The
   security implications are of doing so depend on how the certs and
   keys are managed outside of "sop".


So should we check supplied keys or not? but I guess that's covered by...

[...]

Signature validity is a complex topic, and this documentation cannot list all 
possible details.


Is there a reference we could add here to cover that topic?

Compression {#compression}
-----------

The interface as currently specified does not allow for control of 
compression.
Compressing and encrypting data that may contain both attacker-supplied 
material and sensitive material could leak information about the sensitive 
material (see the CRIME attack).

Unless an application knows for sure that no attacker-supplied material is 
present in the input, it should not compress during encryption.


How about decryption? Do we attempt decompression during decrypt?

That's about it! My comments might be a little dry and are the results
of notes I jotted down on "paper" (a PDF really), a copy of which is
available here:

https://paste.anarc.at/publish/2019-11-10-sxrDKhJL9R0/sop-Exported.pdf

I hope the comments are still useful and please don't interpret them as
me being upset or too critical. I love this work and it's only in a
spirit of collaboration that I bring up those concerns. :)

Thank you for your work!

A.

PS: I mistakenly took the modified version of the document, as available
on the HEAD of the merge request, to comment on the document. I have
tried, as much as possible, to revert changes to the original in the
examples, but some other changes I suggested might have crept up in the
quoted text. That shouldn't affect the comments I have brought up in any
other way.

-- 
Antoine Beaupré
torproject.org system administration

_______________________________________________
openpgp mailing list
openpgp(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/openpgp