Re: email hashes in PGP keys as protection against spam


On Monday, 5 October 2009, Daniel Kahn Gillmor wrote:

Hello,

Interesting proposal (about digesting User IDs), but i suspect that the
ietf's openpgp working group is a better place to discuss this kind of
change than the tool-specific gpg-devel list.


Of course, this is a general approach, not a tool-specific one. I didn't
know that this working group exists. On the other hand, the technical part
seems quite easy to me, so it might be helpful if somebody just implemented
it for one tool (not officially, rather for testing purposes) so that one
could play around with it and discuss the general implementation based on
some experience.

For that reason, i'm sending my reply there, and i've set Reply-To there
as well.  i hope that's OK with you.


Sure, I have subscribed to that list now.

some questions your proposal raises for me:

 0) you only talk about digesting the e-mail part of the address.  what
about the human-specific name?  Would this need to be digested also?
Why or Why not?


From a technical point of view this hardly matters, so one could leave it
up to each user.

I think that the spam protection effect of hiding names is quite limited.
It only prevents a spammer from guessing addresses of the form
firstname.lastname@popularmailservice.com. Would any spammer do that?
Names can also be fetched from other sources, though those are less clean
(from a parser's perspective).

But there is a second argument: privacy. It applies to the email hashing,
too. In an ideal world, key server data could not be used for anything its
"owner" does not want it to be used for.

Thus I suggest letting the user decide whether he wants his name hashed or
not. The problem with names is that their notation is less well defined
than that of email addresses, at least for names which contain "strange"
characters and for people with several first names (are they mentioned at
all, written out in full, or given just as initials?).

When talking about privacy, the same question arises for the comment
field. Hashing it does not make much sense to me, as this information
cannot easily be obtained from another source. So I would leave it empty
or in cleartext.

PGP usage might then change to fetching the raw (hashed) key from a server
and receiving the full key from its owner with the first reply.

 1) your proposal lacks a concrete example case; What would the User ID
for 'Jane Doe <jane@example.org>' look like under this policy?  The
devil is often in the details, and an explicit example would help sort
out the details.


"Jane Doe" -> cac7bbb6b67b44ea0ab997d34a88e4ea9b4d3d62
jane@example.org -> 77baeb8633437c80bc3f06a7bcfbad66185ca14b

I see three possible variants:

- the email hash only
<sha1:77baeb8633437c80bc3f06a7bcfbad66185ca14b>

- cleartext name and email hash
'Jane Doe <sha1:77baeb8633437c80bc3f06a7bcfbad66185ca14b>'

- name and email hashed
'sha1:cac7bbb6b67b44ea0ab997d34a88e4ea9b4d3d62 <sha1:77baeb8633437c80bc3f06a7bcfbad66185ca14b>'
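
The three variants above could be generated roughly as follows. This is
only a sketch for experimenting: it assumes the cleartext is hashed as
UTF-8 bytes without any normalization, which the proposal does not specify,
so the digests it prints are not guaranteed to match the ones quoted above.
The function names are made up for illustration.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Hex SHA-1 of the text. Assumption: UTF-8 bytes, no normalization."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def hashed_uid_variants(name: str, email: str) -> dict:
    """Build the three User ID variants discussed above."""
    email_part = f"<sha1:{sha1_hex(email)}>"
    return {
        # the email hash only
        "email_hash_only": email_part,
        # cleartext name and email hash
        "clear_name_email_hash": f"{name} {email_part}",
        # name and email hashed
        "name_and_email_hashed": f"sha1:{sha1_hex(name)} {email_part}",
    }


variants = hashed_uid_variants("Jane Doe", "jane@example.org")
for label, uid in variants.items():
    print(f"{label}: {uid}")
```

Whether case folding or whitespace trimming should happen before hashing is
an open detail; both sides have to agree on it, or lookups will miss.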

I just noticed a problem: you obviously have to know which hash function
the key owner has used. So if several functions are to be allowed, you
have to accept the additional traffic at the key servers caused by trying
all of them.
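
To illustrate the cost: a client that does not know the owner's hash
function would need one search string per allowed function. A small sketch,
assuming a hypothetical allowed set of SHA-1, SHA-256 and SHA-512 and the
same "<algo:hexdigest>" notation as in the example above:

```python
import hashlib

# Hypothetical set of permitted functions; the proposal does not fix one.
ALLOWED_ALGORITHMS = ("sha1", "sha256", "sha512")


def lookup_candidates(email: str, algorithms=ALLOWED_ALGORITHMS) -> list:
    """Return one key server search string per allowed hash function."""
    candidates = []
    for algo in algorithms:
        # Assumption: the address is hashed as UTF-8 bytes, unnormalized.
        digest = hashlib.new(algo, email.encode("utf-8")).hexdigest()
        candidates.append(f"<{algo}:{digest}>")
    return candidates


candidates = lookup_candidates("jane@example.org")
for candidate in candidates:
    print(candidate)
```

The number of queries grows linearly with the number of allowed functions,
which is an argument for keeping that set very small.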

 2) Would the act of keysigning need to change under your proposal?  If
so, what would keysigners need to do differently than they currently do?


This act would not have to change at all. Everything could be done within
the gpg software. I would mainly change the key listing format, as it does
not make sense to show all UIDs twice (thus suppress hashed UIDs by
default).

But it makes sense to ask for the hashed string. As my description points
out, you will hardly ever meet a raw hashed key, i.e. one for which you
don't have the cleartext UID (or at least the email). So if the PGP
software is to sign a hashed key, it should check that the hash matches the
additional information (from outside the key). But this is just an
organizational aid to guard against malicious keys, not a technical
requirement.
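
Such a pre-signing check could look roughly like this. It is a sketch, not
gpg code: the UID notation "<algo:hexdigest>", the allowed algorithm set and
the function name are assumptions carried over from the example earlier in
this mail.

```python
import hashlib
import re

# Hypothetical notation "<algo:hexdigest>" for the hashed email part.
HASHED_EMAIL = re.compile(r"<(?P<algo>[a-z0-9]+):(?P<digest>[0-9a-f]+)>")
# Hypothetical set of functions the software is willing to verify.
SUPPORTED = {"sha1", "sha256"}


def email_matches_hashed_uid(uid: str, claimed_email: str) -> bool:
    """Check that the cleartext address obtained outside the key
    hashes to the digest stored in the hashed User ID."""
    match = HASHED_EMAIL.search(uid)
    if match is None or match.group("algo") not in SUPPORTED:
        return False
    # Assumption: the address is hashed as UTF-8 bytes, unnormalized.
    digest = hashlib.new(
        match.group("algo"), claimed_email.encode("utf-8")
    ).hexdigest()
    return digest == match.group("digest")


uid = "Jane Doe <sha1:" + hashlib.sha1(b"jane@example.org").hexdigest() + ">"
print(email_matches_hashed_uid(uid, "jane@example.org"))
print(email_matches_hashed_uid(uid, "mallory@example.org"))
```

The signing tool would refuse (or at least warn) when the function returns
False for the address the signer believes belongs to the key.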

I do not know the OpenPGP key format. Would it be easily possible to add
signed information stating whether the UID of this key may or must not be
uploaded to a key server in cleartext, ideally distinguishing between name
and email?


One small additional point: this hashing approach would be used for all
published keys, not only for those on key servers. I guess that most PGP
users have their public key on their web site.


Hauke