ietf-openpgp

Re: [openpgp] Possible ambiguity in description of regular expressions: [^][]

2021-01-05 11:11:57
On 05/01/2021 15:33, Daniel Kahn Gillmor wrote:
> But beyond the wordsmithing, if anyone thinks that Neal's interpretation
> (or my proposed clarification) is actually wrong or problematic, please
> speak up!

The original proposed change: "A range is a non-empty sequence of characters enclosed in []" is clear and (mostly) effective, so IMO should be adopted despite the fact that it is insufficient in itself.

Now, consider the remaining corner cases:

[]]     : matches a closing square bracket
[^]]    : matches anything other than a closing square bracket
[[]     : matches an opening square bracket
[^[]    : matches anything other than an opening square bracket
[][]    : matches either square bracket
[^][]   : matches anything other than either square bracket
[^]     : is an incorrectly nested sequence

To tackle the insufficiency, I propose an additional change:

-If the sequence begins with '^', it matches any single character not from the rest of the sequence.
+If the sequence begins with '^', it matches any single character not from the rest of the sequence, which must then contain at least one further character following the '^'.

When read alongside "To include a literal ']' in the sequence, make it the first character (following a possible '^')", this should be sufficient to cover all corner cases.

(I considered "... the rest of the sequence, which must be non-empty", but it is unclear whether "which" refers to "the sequence" or "the rest of the sequence")
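To sanity-check the corner cases listed earlier against the proposed wording, here is a rough sketch of a range parser. It is only my reading of the rules as discussed in this thread, not the behaviour of any existing OpenPGP implementation; the name parse_range and its return shape are made up for illustration, and it assumes no backslash-escaping.

```python
def parse_range(expr):
    """Parse one bracketed range; return (negated, frozenset of member characters).

    Hypothetical helper: it encodes one reading of the proposed wording,
    not any real implementation.
    """
    if not expr.startswith("["):
        raise ValueError("range must start with '['")
    i = 1
    negated = False
    if expr[i:i + 1] == "^":
        # A leading '^' negates the range.
        negated = True
        i += 1
    members = set()
    first = True
    while True:
        if i >= len(expr):
            raise ValueError("unterminated or empty range")
        ch = expr[i]
        if ch == "]" and not first:
            break  # closing bracket
        # A ']' in first position (after a possible '^') is a literal.
        first = False
        is_span = (
            expr[i + 1:i + 2] == "-"
            and i + 2 < len(expr)
            and expr[i + 2] != "]"
        )
        if is_span:
            lo, hi = ord(ch), ord(expr[i + 2])
            if lo > hi:
                raise ValueError("range endpoints out of order")
            # Both endpoints are included.
            members.update(chr(c) for c in range(lo, hi + 1))
            i += 3
        else:
            members.add(ch)
            i += 1
    if i != len(expr) - 1:
        raise ValueError("trailing characters after range")
    return negated, frozenset(members)


for case in ["[]]", "[^]]", "[[]", "[^[]", "[][]", "[^][]", "[^]"]:
    try:
        negated, members = parse_range(case)
        print(case, "negated" if negated else "plain", sorted(members))
    except ValueError as err:
        print(case, "rejected:", err)
```

Under this reading, the first six cases parse as listed and "[^]" is rejected, because the ']' is consumed as a literal and nothing is left to close the range.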

However...

We should probably also explicitly note how to negate the special meaning of '^':

+To include a literal '^', locate it somewhere other than the first character of the sequence.

Now, this doesn't cover the (contrived) case where we may want to use a literal '^' as the beginning of an ASCII range:

[^-~]

But if absolutely necessary, one could refactor:

[_-~^]
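The refactoring works because '^' (0x5E) sits immediately before '_' (0x5F) in ASCII, so the range can start one character later and the '^' can be appended as a literal. A quick check of that arithmetic, assuming inclusive endpoints (expand is a hypothetical helper, not part of any specification):

```python
def expand(lo, hi):
    """Return every character from lo to hi, both endpoints included."""
    return {chr(c) for c in range(ord(lo), ord(hi) + 1)}


intended = expand("^", "~")            # what a literal '^'-to-'~' range would mean
refactored = expand("_", "~") | {"^"}  # what [_-~^] contains

assert ord("_") == ord("^") + 1
assert refactored == intended
print(len(intended))  # 33
```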

While we're at it, we should also clarify range inclusivity:

-this is shorthand for the full list of ASCII characters between them
+this is shorthand for them and the full list of ASCII characters between them

Also, many regex engines support backslash-escaping within a character class. Does RFC 4880 support this or not? My reading is that it doesn't, but it may be worth stating this explicitly (even though backslash-escaping would be a more elegant solution to [^-~]).
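For comparison only, this is how one engine that does support escaping inside a character class behaves. The sketch uses Python's re module, which is not the RFC 4880 dialect, so it says nothing about what an OpenPGP implementation should accept.

```python
import re

# With escaping, the leading '^' is a literal and starts an ordinary range.
escaped = re.compile(r"[\^-~]")
assert escaped.fullmatch("^")
assert escaped.fullmatch("_")
assert escaped.fullmatch("~")
assert escaped.fullmatch("]") is None  # 0x5D is just below the range

# Without escaping, the leading '^' negates the set {'-', '~'}.
unescaped = re.compile(r"[^-~]")
assert unescaped.fullmatch("^")
assert unescaped.fullmatch("-") is None
assert unescaped.fullmatch("~") is None
```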

Is there anything to be said for referring out to an external regex definition instead of reinventing the wheel? :-)

--
Andrew Gallagher

_______________________________________________
openpgp mailing list
openpgp(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/openpgp