ietf-openpgp

Re: [openpgp] User ID conventions (it's not really a RFC2822 name-addr)

2019-11-06 11:52:56
Hi Neal--

Thanks for this thoughtful writeup!

On Tue 2019-11-05 23:35:11 +0100, Neal H. Walfield wrote:

> Beyond being more fleshed out, this grammar is different from the
> grammar in dkg's second proposal in a few ways.
>
> First, it matches comments.  dkg made this a non-goal.  Given that
> people who add comments intend them as comments and not as part of
> their name, it seems reasonable to me to not display comments in
> places where only the user's name is desired.  And, since it turns out
> that matching non-nested comments is relatively straightforward, why
> not?  Note: doing this might actually help deprecate comments, because
> they won't be shown as often.

User IDs are full UTF-8 strings.  The idea that any part of that string
would be hidden from the user is pretty disturbing to me.  Consider the
situation where someone is certifying a user ID on an OpenPGP
certificate.  If the comment is hidden, do they know what identity
assertion they're making?

I'd much rather have comments be deprecated *because they are weird and
show up in places where you'd think your name should go* rather than
have them be some vestigial thing that people don't even notice any
longer.

I would recommend dropping the comment from your grammar and letting the
"name" part subsume it, when you're splitting out e-mail address from
the rest of the user ID.

Furthermore, because you've allowed "(" and ")" in atext-specials, it
looks to me like your proposed grammar is ambiguous:

    bob (joe) <bob@example.net>

is either:

    name: "bob (joe)"
    comment: None
    addr-spec: "bob@example.net"

or:

    name: "bob"
    comment: "joe"
    addr-spec: "bob@example.net"

I don't think this is helpful to anyone.
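
The ambiguity can be reproduced with a minimal Python sketch.  The two
regular expressions below are hypothetical simplifications, not the
proposed grammar itself: one treats parentheses as ordinary name
characters, the other splits out a trailing "(comment)".  Both accept
the example user ID, with the address written out plainly:

```python
import re

# Hypothetical simplifications of the two readings; neither is the
# actual proposed grammar.
ADDR = r"<(?P<addr>[^<>@ ]+@[^<>@ ]+)>"

# Reading 1: "(" and ")" are ordinary characters in the name.
NAME_ONLY = re.compile(r"^(?P<name>[^<>]+) " + ADDR + r"$")

# Reading 2: a non-nested "(comment)" sits between name and address.
NAME_AND_COMMENT = re.compile(
    r"^(?P<name>[^<>]+?) \((?P<comment>[^()<>]*)\) " + ADDR + r"$"
)


def parses(uid):
    """Return every (name, comment, addr-spec) reading that matches."""
    results = []
    m = NAME_ONLY.match(uid)
    if m:
        results.append((m["name"], None, m["addr"]))
    m = NAME_AND_COMMENT.match(uid)
    if m:
        results.append((m["name"], m["comment"], m["addr"]))
    return results


for reading in parses("bob (joe) <bob@example.net>"):
    print(reading)
```

A user ID with more than one reading is exactly what a parser (and a
person certifying the user ID) should not have to deal with.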

> The grammar more carefully handles whitespace.  It ignores whitespace
> at the beginning of the User ID (this is what motivates the
> name-char-start production) and between the individual components in
> the pgp-uid-convention production.  As is, the grammar only ignores
> the 0x20 space character.  We may also want to include the tab
> character, unicode's NO-BREAK SPACE (U+00A0) character and its
> IDEOGRAPHIC SPACE (U+3000) character for thoroughness.  But, since
> software will normally concatenate the individual components, just
> recognizing the ASCII space character here is probably fine.  Whatever
> the case, I think we can safely ignore the rest of unicode's
> whitespace characters:
>
>   https://en.wikipedia.org/wiki/Whitespace_character

I'm fine with being judicious about selecting whitespace characters.  In
addition to tab (U+0009, ascii "HT"), i note that you've declined to
include U+000A and U+000D (ascii "LF" and "CR") in the grammar at all.

I like that kind of opinionated decision, as unprintable symbols like
this are likely to be problematic in many ways (hard for users to
distinguish at least!)

I also think that whitespace at the beginning of a user ID is asking for
trouble, and would be happy with a grammar that considers that user ID
non-conventional.  Is there a use case for leading whitespace in a user
ID?
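
A minimal Python sketch of such a check follows.  The function name is
hypothetical, and the character sets are assumptions drawn from the
characters mentioned in this thread (space, tab, NO-BREAK SPACE and
IDEOGRAPHIC SPACE as leading whitespace; LF and CR rejected anywhere),
not something the grammar settles:

```python
# Assumed character sets, taken from the characters named in this thread.
LEADING_WHITESPACE = (" ", "\t", "\u00a0", "\u3000")
FORBIDDEN_ANYWHERE = ("\n", "\r")


def has_conventional_whitespace(uid: str) -> bool:
    """Return False for leading whitespace or any LF/CR in the user ID."""
    if uid.startswith(LEADING_WHITESPACE):
        return False
    return not any(ch in uid for ch in FORBIDDEN_ANYWHERE)


print(has_conventional_whitespace("Daniel Kahn Gillmor"))
print(has_conventional_whitespace(" bob <bob@example.net>"))
```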

> My pgp-uid-convention production also matches user ids without email
> addresses, e.g., "Daniel Kahn Gillmor".  This is convenient.  Instead
> of having to figure out why parsing failed (is it not valid UTF-8? is
> it just missing an addr-spec?), we explicitly cover this common
> pattern in the grammar.  I think this will significantly simplify code
> that uses this interface: if there is an error, then the code can just
> assume the User ID is trash and can be ignored.

I should be clear that i intended my earlier proposal specifically to
match OpenPGP User ID conventions *that have an e-mail address in
them*.  There are indeed other User ID conventions (like "Daniel Kahn
Gillmor", or "ssh://foo.example") that aren't covered by this, and i
thought i would be doing folks a favor by focusing on the e-mail address
side of things specifically.  My thought was that common interfaces
would allow for matching against a User ID that has an e-mail address,
and then they would have other matchers for other common conventions
that they could try applying if this convention didn't match.

This is probably an implementation detail, though.
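
One way to picture the "try each convention in turn" approach is the
Python sketch below.  All names and patterns are hypothetical and far
looser than any real grammar; the point is only the structure: an
e-mail matcher first, then other matchers as fallbacks.

```python
import re

# Deliberately loose, hypothetical patterns for illustration only.
EMAIL_UID = re.compile(r"^(?:(?P<name>[^<>]+) )?<(?P<addr>[^<>@\s]+@[^<>@\s]+)>$")
URI_UID = re.compile(r"^(?P<scheme>[a-z][a-z0-9+.-]*)://(?P<rest>\S+)$")


def match_email(uid):
    m = EMAIL_UID.match(uid)
    return {"kind": "email", "name": m["name"], "addr": m["addr"]} if m else None


def match_uri(uid):
    m = URI_UID.match(uid)
    return {"kind": "uri", "scheme": m["scheme"], "rest": m["rest"]} if m else None


def match_bare_name(uid):
    # A name with no address: no angle brackets, no "@", no outer whitespace.
    if uid and uid == uid.strip() and not any(ch in uid for ch in "<>@"):
        return {"kind": "name", "name": uid}
    return None


# Order matters: the bare-name matcher would also accept a URI.
MATCHERS = [match_email, match_uri, match_bare_name]


def classify(uid):
    """Return the first convention that matches, or None if none does."""
    for matcher in MATCHERS:
        result = matcher(uid)
        if result is not None:
            return result
    return None


for uid in ["Lucy Hernandez, MD <lucy@example.com>",
            "Daniel Kahn Gillmor",
            "ssh://foo.example"]:
    print(classify(uid))
```

Whether the fallbacks live in one grammar or in separate matchers, the
caller sees the same thing: a classified user ID, or nothing.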

> In RFC 2822, "specials" are only allowed in a display name if they are
> quoted.  dkg removes this requirement.  I think this is mostly
> sensible, but it means that we can have User IDs like:
> "<foo@example.org> <foo@example.org>" where the first
> <foo@example.org> is the display name and the second is the
> addr-spec.  I think we should exclude angle brackets from the display
> name.  In my grammar, I have an "atext-specials" which is just RFC
> 2822 specials without the angle brackets.

I totally agree with this constraint.  If you're doing away with
comments (as i recommend above) then you would have to prohibit angle
brackets in comments too, which seems fine to me.

Even if you decide to go ahead with splitting out comments, I would go
so far as to ban them in comments too.  is there any plausible reason
for including angle brackets in a comment?  Simplify simplify :)

> I'm a bit concerned about allowing the backslash character: with this
> grammar, it is just a normal character, but for an RFC 2822 parser,
> it's an escape character.  Since User IDs may be used in contexts
> where RFC 2822 things are expected, we should be careful.  But, I fear
> that if we reject it, we'll end up gratuitously rejecting some
> emojis.  ¯\_(ツ)_/¯.

There are all kinds of things that will break if implementations
casually stick OpenPGP user IDs into an e-mail header, not just
backslashes.  for example, commas are likely to cause a problem.
consider trying to mail two people whose OpenPGP certificates have these
User IDs:

    Lucy Hernandez, MD <lucy@example.com>
    Chuck Wilson, Jr. <chuck@example.net>

A simple concatenation with commas yields the disastrous:

    To: Lucy Hernandez, MD <lucy@example.com>, Chuck Wilson, Jr. <chuck@example.net>

and DQUOTE is just as bad if not worse :)
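
The comma problem can be seen with Python's standard email.utils
module.  This is only a sketch: the helper names are hypothetical, and
it assumes the name and address have already been split out of each
user ID.  The naive join does not parse back into two mailboxes, while
formataddr() quotes display names that contain specials:

```python
from email.utils import formataddr, getaddresses

# Assumed to be already split into (display name, addr-spec) pairs.
UIDS = [
    ("Lucy Hernandez, MD", "lucy@example.com"),
    ("Chuck Wilson, Jr.", "chuck@example.net"),
]


def naive_header(uids):
    """Join raw 'name <addr>' strings with commas, without any quoting."""
    return ", ".join(f"{name} <{addr}>" for name, addr in uids)


def quoted_header(uids):
    """Let formataddr() quote display names that contain specials."""
    return ", ".join(formataddr(pair) for pair in uids)


# The unquoted commas are read as mailbox separators.
print(getaddresses([naive_header(UIDS)]))

# With quoting, the two mailboxes round-trip intact.
print(quoted_header(UIDS))
print(getaddresses([quoted_header(UIDS)]))
```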

So i have no problem with including backslash in the display name area.

    --dkg
