
Re: [openpgp] Fingerprints

2015-04-17 12:00:04
On Fri, Apr 17, 2015 at 10:24 AM, ianG <iang@iang.org> wrote:
On 16/04/2015 18:46, Phillip Hallam-Baker wrote:

<Fingerprint-ID>

At the moment the consensus proposal seems to be that Fingerprint-ID
is a numeric code that has exactly two entries.

I don't know why we'd do both.  I suppose it's because hashes are like
mountains and seeing them, we have to walk up them.  If there are two, we
have to walk up and down twice...

It is very much the settled consensus now that proliferating crypto
algorithms is a bad thing. The security of a system is determined by
its weakest algorithm, not the strongest. So we need the number of
algorithms to be as small as possible.

The number of algorithms has to be greater than zero.

Systems that have a single algorithm that cannot be changed have also
resulted in real problems. People tend to hard-code code paths in the
expectation that they will never change.

So for all those reasons, the approach that seems to work best as the
general rule is exactly one mandatory to implement and exactly one
recommended alternative as a backup.

Given that we only have two hash algorithms that are likely
candidates, the only real discussion is which of the two should be
mandatory to implement for use in OpenPGP. Which is a discussion we
can and should defer to later. Right now I can make a fairly good
argument that SHA-2 is the better pick based on the facts as they
stand today. I do not expect those facts to be the same in 18 months
time.

But right now I don't have SHA-3 implementations on my platform and
nor do many others. And I really don't want to spend time writing a
library or using code from other sources when I am certain to have a
BSD/MIT-licensed alternative from a well-resourced vendor in a short
space of time.


I suggest:

96: SHA-2-512
144: SHA-3-512

In the unfortunate event that we allocate multiple hashes + numbers, then I
suggest we also allocate an X that is to be used for closed, internal
trials.  This way, people are less likely to homestead spots and then come
to us with arguments about how they're using ABC and they don't want people
to change and bla bla.

I don't think this is a problem. I very much doubt anyone will want to
argue for an algorithm that is not SHA-2 or SHA-3, and one of the main
reasons to choose the 512-bit versions and truncate is that it removes
the incentive to argue for one of the shorter versions.

The reason to pre-allocate the two spots now is precisely to remove
the incentive for homesteading. Right now I don't have SHA-3 but I
need something to test. We are definitely going to see homesteading if
we don't declare a spot for SHA-2.


These numbers are not completely random. While the codes themselves
don't matter, using 0x60 and 0x90 has the pleasing and convenient
effect that SHA-2-512 fingerprints will always start with the letter M
(for Merkle-Damgard) and SHA-3-512 fingerprints will always start with
the letter S (for Spongeworthy).

OK, cautious nod to the letters - although it would be pleasing if you could
point to a web calculator that could lay out the conversions for those of us
who've forgotten how to do hex-b32-ascii-dec in our head ;)

OK, using the Windows calculator in programmer mode:

96 decimal = 0x60 = 0110,0000
144 decimal = 0x90 = 1001,0000

Base32 requires us to start with the most significant bits. So the
first characters are

01100 = 12 = 'M'
10010 = 18 = 'S'

The base32 encoding table is taken from: https://tools.ietf.org/html/rfc3548

This is not just the IETF encoding; it is, I believe, the same one that Phil Z.
proposed, and likely for the same reason: take the Latin alphabet first, then fill
out with the digits 2-7, discarding 0 and 1, which are easily confused with O and I.

I verified the analysis using these tools:
http://www.binaryhexconverter.com/decimal-to-binary-converter
http://tomeko.net/online_tools/hex_to_base32.php?lang=en
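
For anyone who would rather check it in code than with the web tools, here
is a minimal sketch in Python (the alphabet is the RFC 3548 one; the
function name is just illustrative):

    BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

    def first_base32_char(leading_byte):
        # Base32 encodes the most significant bits first, so the first
        # output character depends only on the top five bits of the first byte.
        return BASE32_ALPHABET[leading_byte >> 3]

    print(first_base32_char(96))    # 0x60 -> 01100 -> 12 -> 'M'
    print(first_base32_char(144))   # 0x90 -> 10010 -> 18 -> 'S'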


Note that I am not certain yet that we want to use Base32 encoding without
modification. We might well want to use a parity scheme so that the
fingerprint can be verified as it is typed.

Let's say we are using a fingerprint with six five-character blocks:

aaaaa-bbbbb-ccccc-ddddd-eeeee-fffff

We could make the least significant bit of block A a parity check on the
other bits of block A, the least significant bit of block B a parity check
on A and B, the least significant bit of C a check on A, B and C, and so on.

This adds quite a bit of robustness and allows the value to be checked
as it is typed. It is not a perfect check of course. But it is something. And
it might just be that six character blocks with parity checking has higher
user acceptability than five blocks without. So this might well strengthen
the system net.
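
Something along these lines, purely as a sketch (Python again; the block
size and function names are my own choices, not a proposal):

    def add_chained_parity(data_bits, block_len=25):
        # Each output block carries block_len - 1 data bits plus one trailing
        # bit holding the running parity over every data bit emitted so far.
        out, parity = [], 0
        step = block_len - 1
        for i in range(0, len(data_bits), step):
            chunk = data_bits[i:i + step]
            for b in chunk:
                parity ^= b
            out.extend(chunk)
            out.append(parity)
        return out

    def check_chained_parity(bits, block_len=25):
        # Verify each block's trailing bit against the running parity; a
        # single flipped data bit in block k makes block k's check fail.
        parity, ok = 0, True
        for i in range(0, len(bits), block_len):
            block = bits[i:i + block_len]
            for b in block[:-1]:
                parity ^= b
            ok = ok and (block[-1] == parity)
        return ok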


What is the extension strategy for when we've exhausted the 256
possibilities in a byte?

(Yes I realise you didn't specify a byte, but I guess that's part of the
question.)

The general answer is 'reserve half the code points in the initial registry
for extension schemes'. So I see two options:

1) Prepend the identifier to the hash value; this obviously requires that
identifiers be issued in byte-aligned increments.

Fingerprint = ID + Hash

If Fingerprint[0] < 128, the first byte is the algorithm identifier.
Otherwise, if Fingerprint[0] < 192, the lower 14 bits of the first two
   bytes are the algorithm identifier.

This gives 128 possible single-byte identifiers and 16384 two-byte
values. I do not feel the need to specify additional expansion capability
since we never came close to exhausting a 16-bit algorithm registry for
a single algorithm type, even when we encouraged multiple algorithms
(suites are a different issue).
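
As a sketch of how a parser might read option 1 (Python; the split points
follow the 128/192 boundaries above, and the function name and the
reservation of the remaining values are my own illustration):

    def split_fingerprint(fingerprint):
        first = fingerprint[0]
        if first < 128:
            return first, fingerprint[1:]                 # 128 one-byte IDs
        if first < 192:
            alg = ((first & 0x3F) << 8) | fingerprint[1]  # 16384 two-byte IDs
            return alg, fingerprint[2:]
        raise ValueError("reserved for a future extension scheme")

    # e.g. with the suggested SHA-2-512 code point:
    alg, digest = split_fingerprint(bytes([96]) + b"\x00" * 64)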

2) Make the topmost 5 bits the identifier.

This is actually simpler to implement on many platforms, as it is easier
to overwrite the first byte of a buffer than to prepend another buffer.
This discards data from the hash value, of course, but that is inevitable
when a fingerprint is used.
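
In the same sketchy style, option 2 is little more than a mask (again the
names are mine, and I am assuming a 5-bit identifier value here):

    def tag_fingerprint(alg_id, digest):
        fp = bytearray(digest)
        fp[0] = ((alg_id & 0x1F) << 3) | (fp[0] & 0x07)   # top 5 bits = ID
        return bytes(fp)

    def read_algorithm(fingerprint):
        return fingerprint[0] >> 3                        # recover the ID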


We ourselves don't want more than a handful.  But if we open up the
fingerprint standard to a wider audience, then austerity will be out the
window.  What's our approach if the TLS group decides they want to add a few
hundred?

The TLS group have run into problems because suites are a bad idea.

Choosing the strongest versions of the best-of-class Merkle-Damgard and
Spongeworthy algorithms should remove the incentive to proliferate.
I have never been in a situation where someone has been saying 'we
need a weaker algorithm'.

The COAP folk might want to use a rubbish algorithm 'for speed' of
course. But the only algorithm they are likely to agree on is SHA-1,
which isn't a lot faster, and there are all sorts of reasons why they
are going to be unable to make it their only algorithm.

