Re: [openpgp] Possible ambiguity in description of regular expressions: [^][]

On 05/01/2021 15:33, Daniel Kahn Gillmor wrote:

> But beyond the wordsmithing, if anyone thinks that Neal's interpretation
> (or my proposed clarification) is actually wrong or problematic, please
> speak up!

The original proposed change: "A range is a non-empty sequence of characters enclosed in []" is clear and (mostly) effective, so IMO should be adopted despite the fact that it is insufficient in itself.


Now, consider the remaining corner cases:

[]]     : matches a closing square bracket
[^]]    : matches anything other than a closing square bracket
[[]     : matches an opening square bracket
[^[]    : matches anything other than an opening square bracket
[][]    : matches either square bracket
[^][]   : matches anything other than either square bracket
[^]     : is an incorrectly nested sequence
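
As a rough illustration of the readings listed above, here is a minimal Python sketch of a parser for a single bracketed range. It is not taken from RFC4880 or from any OpenPGP implementation: the name parse_range is hypothetical, and treating a '-' with no right-hand endpoint as a literal is an assumption made only to keep the sketch total.

```python
def parse_range(expr):
    """Parse one bracketed range such as "[^][]" into (negated, members).

    Hypothetical helper that follows the readings discussed in this
    thread; it is not the behaviour of any particular implementation.
    """
    if not expr.startswith("["):
        raise ValueError("range must start with '[': " + expr)
    i = 1
    negated = expr[i:i + 1] == "^"
    if negated:
        i += 1
    start = i          # position of the first character of the sequence
    members = set()
    while True:
        if i >= len(expr):
            raise ValueError("unterminated range: " + expr)
        ch = expr[i]
        # A ']' closes the range unless it is the first character
        # (following a possible '^'), where it is taken literally.
        if ch == "]" and i > start:
            break
        # "x-y" form: both endpoints are included. A '-' that has no
        # right-hand endpoint is treated as a literal (an assumption).
        if expr[i + 1:i + 2] == "-" and expr[i + 2:i + 3] not in ("", "]"):
            last = expr[i + 2]
            members.update(chr(c) for c in range(ord(ch), ord(last) + 1))
            i += 3
        else:
            members.add(ch)
            i += 1
    if i != len(expr) - 1:
        raise ValueError("trailing characters after range: " + expr)
    return negated, members


for expr in ["[]]", "[^]]", "[[]", "[^[]", "[][]", "[^][]", "[^]"]:
    try:
        print(expr, parse_range(expr))
    except ValueError as err:
        print(expr, "rejected:", err)
```

Under these assumptions the first six expressions parse, and "[^]" is rejected because the ']' is consumed as a literal and nothing closes the range.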

To tackle the insufficiency, I propose an additional change:

-If the sequence begins with '^', it matches any single character not from the rest of the sequence.
+If the sequence begins with '^', it matches any single character not from the rest of the sequence, which must then contain at least one further character following the '^'.

When read alongside "To include a literal ']' in the sequence, make it the first character (following a possible '^')", this should be sufficient to cover all corner cases.

(I considered "... the rest of the sequence, which must be non-empty", but it is unclear whether "which" refers to "the sequence" or "the rest of the sequence".)


However...

We should probably also explicitly note how to negate the special meaning of '^':

+To include a literal '^', locate it somewhere other than the first character of the sequence.
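
For comparison only, Python's re module already treats '^' this way inside a character class. The short sketch below uses it as a stand-in engine; it says nothing about what RFC4880 implementations do.

```python
import re

literal_caret = re.compile(r"[a^]")   # '^' is not first: an ordinary member
negated = re.compile(r"[^a]")         # '^' is first: negates the class

assert literal_caret.fullmatch("^")
assert literal_caret.fullmatch("a")
assert negated.fullmatch("^")
assert not negated.fullmatch("a")
```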

Now, this doesn't cover the (contrived) case where we may want to use a literal '^' as the beginning of an ASCII range:


[^-~]

But if absolutely necessary, one could refactor:

[_-~^]
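
A quick sanity check of that refactoring, as a Python sketch that assumes inclusive ranges over ASCII code points (the variable names are illustrative only):

```python
# Intended meaning of the problematic "[^-~]": the inclusive range '^'..'~'.
intended = {chr(c) for c in range(ord("^"), ord("~") + 1)}

# Refactored form "[_-~^]": the inclusive range '_'..'~' plus a literal '^'.
refactored = {chr(c) for c in range(ord("_"), ord("~") + 1)} | {"^"}

assert ord("_") == ord("^") + 1   # '_' immediately follows '^' in ASCII
assert intended == refactored
```

The refactoring works only because '_' is the character immediately after '^', so moving the range start up by one and listing '^' separately loses nothing.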

While we're at it, we should also clarify range inclusivity:

-this is shorthand for the full list of ASCII characters between them

+this is shorthand for them and the full list of ASCII characters between them

Also, many regex engines support backslash-escaping within a character class. Does RFC4880 support this or not? My reading is that it doesn't, but it may be worth explicitly clarifying this also (even though backslash escaping would be a more elegant solution to [^-~]).
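
To show what the backslash form looks like in an engine that does allow it, here is a sketch using Python's re module. This is Python's behaviour only and is not a claim about RFC4880.

```python
import re

# With the escape, "\^" is a literal caret, so this is the inclusive
# range '^'..'~' with no negation.
escaped = re.compile(r"[\^-~]")

assert escaped.fullmatch("^")
assert escaped.fullmatch("a")
assert escaped.fullmatch("~")
assert not escaped.fullmatch("]")   # ']' (0x5D) lies just below '^' (0x5E)

# Without the escape, the same text is a negated class excluding '-' and '~'.
unescaped = re.compile(r"[^-~]")

assert unescaped.fullmatch("]")
assert not unescaped.fullmatch("~")
assert not unescaped.fullmatch("-")
```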

Is there anything to be said for referring out to an external regex definition instead of reinventing the wheel? :-)


--
Andrew Gallagher

_______________________________________________
openpgp mailing list
openpgp(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/openpgp
