Hi OpenPGP folks--
I spent a good part of today trying to write a patch for including
Simple Octet Strings in the crypto-refresh draft, pursuant to gniibe's
presentation at IETF 110 and his public memo at:
https://www.gniibe.org/memo/standard/openpgp/ecc-in-openpgp-sos.html
This message doesn't include the patch i wanted to make because I think
i ran into a problem with it. I'm trying to explain the problem here,
and how my thinking has shifted in the course of trying to write the
thing down. Please correct me where I've erred.
IIUC, SOS is basically intended as a genericization of MPI; it
represents its length in a two-octet scalar as (8*octet-count) instead
of as (bit-count). It "fits" where MPI fits, but doesn't require
leading-zero removal, or require interpretation as an integer at all.
The problem comes with the different length representations, in
particular for existing ECC mechanisms.
Because existing ECC points are currently represented with a leading
prefix byte (0x04 for NIST curves, 0x40 for 25519), and that byte has
the leading bit cleared, the MPI version of the length scalar is
*different* on the wire from the SOS version of the length scalar.
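To make the two length scalars concrete, here's a minimal Python sketch
(the function names are mine, not from any draft or library; it assumes
the MPI input has already had leading zero octets stripped):

```python
def mpi_bit_count(data: bytes) -> int:
    # MPI length scalar: position of the highest set bit of the leading
    # octet, plus 8 bits for each remaining octet
    return data[0].bit_length() + 8 * (len(data) - 1)

def sos_bit_count(data: bytes) -> int:
    # SOS length scalar: simply 8 * octet-count, regardless of leading bits
    return 8 * len(data)

# A SEC1-encoded NIST point starts with 0x04; its top bits are clear,
# so the two scalars disagree on the wire:
point = b"\x04" + b"\xff" * 64   # stand-in for a 65-octet P-256 point
assert mpi_bit_count(point) == 515   # 0x0203 on the wire
assert sos_bit_count(point) == 520   # 0x0208 on the wire
```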
As a concrete example, a NIST p256 pubkey looks today like this (using
visualization from "sq packet dump --hex"):
```
Public-Key Packet, old CTB, 2 header bytes + 82 bytes
Version: 4
Creation time: 2021-03-24 22:06:14 UTC
Pk algo: ECDSA public key algorithm
Pk size: 256 bits
Fingerprint: D4732B93D4E82D068EE197A63A73ED20925926A6
KeyID: 3A73ED20925926A6
00000000 98 CTB
00000001 52 length
00000002 04 version
00000003 60 5b b7 d6 creation_time
00000007 13 pk_algo
00000008 08 curve_len
00000009 2a 86 48 ce 3d 03 01 curve
00000010 07
00000011 02 03 ecdsa_public_len
00000013 04 29 75 bb 53 55 49 31 80 ab 03 54 15 ecdsa_public
00000020 a0 6c 29 10 a8 85 b7 2f e5 93 81 90 48 08 b8 0e
00000030 c7 02 55 aa 51 b2 b7 6a 00 7a a1 2b cf 25 a7 e7
00000040 9c 95 b1 fa 79 87 c2 cb 94 53 a8 da c2 00 58 45
00000050 79 a0 96 8e
```
but if it was represented in SOS form, we'd see this change:
```text/x-diff
 00000010  07
-00000011  02 03                                    ecdsa_public_len
+00000011  02 08                                    ecdsa_public_len
 00000013  04 29 75 bb 53 55 49 31 80 ab 03 54 15   ecdsa_public
```
The result is that the fingerprint would also change (to
C9EAB3BE6B20CAEA568BE93F9BA3D29CBB873842), since the bytestream that
feeds into the fingerprint calculation changes. This means that we can
represent a single EC public key (with the exact same creation date) in
at least two different ways just by varying whether the public point is
seen as an SOS or an MPI.
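To double-check that claim, here's a short Python sketch (scratch code of
my own, not from any OpenPGP implementation) that rebuilds the 82-octet
packet body from the dump above and computes the v4 fingerprint (SHA-1
over 0x99, the two-octet body length, and the body, per RFC 4880,
section 12.2) for both length encodings:

```python
import hashlib

# Field bytes transcribed from the "sq packet dump --hex" output above.
prefix = bytes.fromhex(
    "04"                      # version 4
    "605bb7d6"                # creation time
    "13"                      # pk_algo: ECDSA
    "08" "2a8648ce3d030107"   # curve OID length + NIST P-256 OID
)
point = bytes.fromhex(        # 0x04 prefix + 64 octets of coordinates
    "042975bb5355493180ab035415"
    "a06c2910a885b72fe59381904808b80e"
    "c70255aa51b2b76a007aa12bcf25a7e7"
    "9c95b1fa7987c2cb9453a8dac2005845"
    "79a0968e"
)

def v4_fingerprint(body: bytes) -> str:
    # RFC 4880, section 12.2: SHA-1 over 0x99, two-octet body length, body
    return hashlib.sha1(
        b"\x99" + len(body).to_bytes(2, "big") + body
    ).hexdigest().upper()

mpi_body = prefix + (515).to_bytes(2, "big") + point  # length scalar 02 03
sos_body = prefix + (520).to_bytes(2, "big") + point  # length scalar 02 08

assert v4_fingerprint(mpi_body) == "D4732B93D4E82D068EE197A63A73ED20925926A6"
assert v4_fingerprint(sos_body) == "C9EAB3BE6B20CAEA568BE93F9BA3D29CBB873842"
```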
(note: this concern is not necessarily just for ECC; it's also relevant
for traditional RSA, DSA, and ElGamal key MPI components whose highest
set bit is not the top bit of their leading octet. For example, if we
were to "retcon" RSA's e component as an SOS instead of an MPI, the most
common value for e (65537) would change on the wire from
[00 11 01 00 01] to [00 18 01 00 01], resulting in a similar change to
the fingerprint)
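A sketch of that hypothetical retcon (nothing here is in any draft;
`sos_encode` is my own name for the imagined wire form):

```python
def mpi_encode(n: int) -> bytes:
    # MPI wire form: two-octet bit count, then the minimal big-endian octets
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return n.bit_length().to_bytes(2, "big") + body

def sos_encode(data: bytes) -> bytes:
    # hypothetical SOS retcon: two-octet scalar holding 8 * octet-count
    return (8 * len(data)).to_bytes(2, "big") + data

e = 65537
assert mpi_encode(e).hex() == "0011010001"                     # 00 11 01 00 01
assert sos_encode(e.to_bytes(3, "big")).hex() == "0018010001"  # 00 18 01 00 01
```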
This leaves me wondering about the utility of the SOS abstraction. I
can imagine a few distinct options:
a) Introduce SOS and apply it to existing ECC pubkeys. This is what I
understand to be gniibe's proposal. It looks like this means an
implementation will have to cope with multiple distinct wire
representations and fingerprints of the exact same public key.
b) Introduce the SOS abstraction, but only use it for *new* pubkey
algorithms. All existing ECC (including NIST and 25519) would still
be this weird pseudo-MPI, representing the count based on the
highest set bit, not 8*octet-count, and requiring clearing of
leading zeros.
c) Introduce a variant of SOS that indicates count based on highest set
bit (not 8*octet-count) and use that for everything, but without
requiring removal of leading all-zero octets.
d) Not introduce the SOS abstraction at all, but instead explicitly say
that new algorithms can define their own fixed algorithm-specific
bytestream for public key material, secret key material, and
signatures -- it does not need to be MPIs (gniibe's memo calls this
approach "Each data format definition by each curve").
e) No SOS abstraction, and just keep on doing the kind of pseudo-MPI
we've been doing for any new curves (gniibe's memo calls this
"Practically easiest").
f) Introduce a new distinct data type that uses an actual octet count
for each bytestring (gniibe's memo calls this "Non-strange but Just
an Octet String")
Of these choices, I'm leaning toward (d) but i'd love to hear what other
people think.
(a) Seems problematic because of the multiple acceptable representations
outlined above. It's already pretty tricky that the fingerprint of
a given piece of public key material can vary depending on the
creation timestamp. Some implementations might try to normalize an
SOS into a "compliant" MPI, thereby affecting their ability to
calculate the correct fingerprint. I don't think we currently have
any tests that demonstrate interoperability in the face of this kind
of confusion.
(b) Seems sort of pointless because it doesn't let us clean up any of
the existing mess. If we're going to keep all the existing mess, we
might as well go with (d).
(c) Introduces a new problem: When encoding an SOS string, how many
leading zero-octets should be used? If the answer is
"implementation gets to decide" then we share the
multiple-representations problem with (a); And if the answer is
"each algorithm has a mandated size" then we might as well go with
(d). Also, when consuming such a string, an implementation would
need to first scan through all the leading zero-octets before it
could use the length count to jump to the next field. That's a new
piece of ugliness for OpenPGP parsing.
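To illustrate that parsing wrinkle, here's a sketch of what a parser for
the hypothetical (c) variant would have to do (this format exists in no
draft; the function name is mine):

```python
def parse_c_variant(buf: bytes, offset: int):
    """Parse one field under hypothetical option (c): a two-octet scalar
    giving the highest-set-bit count, with leading zero octets permitted."""
    bits = int.from_bytes(buf[offset:offset + 2], "big")
    offset += 2
    # The scalar only bounds the *significant* octets; we must scan past
    # an unknown run of leading zeros before we know where the field ends.
    while offset < len(buf) and buf[offset] == 0:
        offset += 1
    field_end = offset + (bits + 7) // 8
    return buf[offset:field_end], field_end

# Two wire encodings of the same 17-bit value (65537):
a = bytes.fromhex("0011010001")       # no leading zeros
b = bytes.fromhex("001100010001")     # one leading zero octet
va, _ = parse_c_variant(a, 0)
vb, _ = parse_c_variant(b, 0)
assert va == vb == bytes.fromhex("010001")
```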
(e) Bothers me because it means that each new algorithm introduced has
to declare not only its data formats, but how to transform those
data formats into the semblance of an OpenPGP "MPI". This just
seems like extra work and new ways for both implementers and
specifiers to get things wrong.
(f) Seems excessive: why declare a standard for encoding lengths of
fields in those cases where the lengths are pre-determined by the
algorithm in question (i expect that to be most cases for new
algorithms). For specific algorithms which have variable length
parameters, *those algorithms* can declare how they want to delimit
the fields. If the goal is to be able to skip over
algorithm-specific details of an algorithm that we don't know, what
are we skipping *to*? For public key packets and signature packets,
the only thing that comes next is the end of the packet, whose size
we already know. For secret key packets, we *might* try to skip
from the public key material to the secret key material, but again,
if we don't know how many MPIs (or SOSes or JOSes) to skip for
public keys we might not even be able to tell when the s2k usage
octet starts, so we can't do that today anyway.
I will probably try to draw up a concrete patch for (d) unless other
folks in the WG think it would be useful for consideration. The patch
will likely ask IANA to add new columns to the "public key algorithm"
registry (e.g. "public key representation", "secret key representation"
and "signature representation") and to formally include an Elliptic
Curves registry with similar columns, so that we have a compact way to
view these distinctions, and a place for new proposals to include
updates.
If any of my analysis above is wrong, please help me understand it
better!
--dkg