On 05/01/2021 15:33, Daniel Kahn Gillmor wrote:
> But beyond the wordsmithing, if anyone thinks that Neal's interpretation
> (or my proposed clarification) is actually wrong or problematic, please
> speak up!
The original proposed change: "A range is a non-empty sequence of
characters enclosed in []" is clear and (mostly) effective, so IMO it
should be adopted even though it is insufficient on its own.
Now, consider the remaining corner cases:
[]] : matches a closing square bracket
[^]] : matches anything other than a closing square bracket
[[] : matches an opening square bracket
[^[] : matches anything other than an opening square bracket
[][] : matches either square bracket
[^][] : matches anything other than either square bracket
[^] : is an incorrectly nested sequence
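
To make the cases above concrete, here is a rough Python sketch of how a
parser might read a single range under the rules being discussed. The
function names parse_range and range_matches are my own invention for
illustration and do not come from RFC 4880 or any implementation; the
sketch assumes no backslash escaping and treats a leading ']' (after an
optional '^') as a literal.

```python
def parse_range(pattern):
    """Parse one bracketed range; return (negated, members).

    Hypothetical helper: assumes no backslash escaping, and that a ']'
    in first position (after an optional '^') is a literal.
    Raises ValueError if the range is empty or never terminated.
    """
    if not pattern.startswith("["):
        raise ValueError("range must start with '['")
    i = 1
    negated = False
    if i < len(pattern) and pattern[i] == "^":
        negated = True
        i += 1
    members = set()
    first = True
    while i < len(pattern):
        ch = pattern[i]
        if ch == "]" and not first:
            if i != len(pattern) - 1:
                raise ValueError("trailing characters after range")
            return negated, members
        # 'a-b' shorthand; a '-' directly before the closing ']' is literal.
        if (i + 2 < len(pattern) and pattern[i + 1] == "-"
                and pattern[i + 2] != "]"):
            lo, hi = ord(ch), ord(pattern[i + 2])
            if lo > hi:
                raise ValueError("reversed range")
            members.update(chr(c) for c in range(lo, hi + 1))
            i += 3
        else:
            members.add(ch)
            i += 1
        first = False
    raise ValueError("empty or unterminated range")


def range_matches(pattern, ch):
    """True if the single character ch matches the range."""
    negated, members = parse_range(pattern)
    return (ch in members) != negated
```

With this reading, both "[]" and "[^]" are rejected because the ']' is
consumed as a literal and no closing bracket follows.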
To tackle the insufficiency, I propose an additional change:
-If the sequence begins with '^', it matches any single character not
from the rest of the sequence.
+If the sequence begins with '^', it matches any single character not
from the rest of the sequence, which must then contain at least one
further character following the '^'.
When read alongside "To include a literal ']' in the sequence, make it
the first character (following a possible '^')", this should be
sufficient to cover all corner cases.
(I considered "... the rest of the sequence, which must be non-empty",
but it is unclear whether "which" would refer to "the sequence" or to
"the rest of the sequence".)
However...
We should probably also explicitly note how to suppress the special
meaning of '^':
+To include a literal '^', locate it somewhere other than the first
character of the sequence.
Now, this doesn't cover the (contrived) case where we may want to use a
literal '^' as the beginning of an ASCII range:
[^-~]
But if absolutely necessary, one could refactor:
[_-~^]
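
As a quick sanity check of that refactoring, the following Python
fragment compares the two character sets directly. The helper name span
is hypothetical; it only enumerates ASCII code points and assumes the
inclusive reading of a range.

```python
def span(lo, hi):
    """All characters from lo to hi, both endpoints included."""
    return {chr(c) for c in range(ord(lo), ord(hi) + 1)}


# What [^-~] would mean if the leading '^' were a literal range start.
intended = span("^", "~")

# The refactored form [_-~^]: the range '_' to '~', plus a literal '^'.
refactored = span("_", "~") | {"^"}

same = intended == refactored
```

This works because '^' (0x5E) immediately precedes '_' (0x5F) in ASCII.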
While we're at it, we should also clarify range inclusivity:
-this is shorthand for the full list of ASCII characters between them
+this is shorthand for them and the full list of ASCII characters
between them
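
To illustrate why the wording matters, here is a small Python sketch of
the two possible readings of '[0-9]'. The function expand and its
inclusive flag are made up for this example.

```python
def expand(lo, hi, inclusive=True):
    """Expand the shorthand lo-hi into a list of ASCII characters.

    inclusive=True follows the proposed wording (endpoints included);
    inclusive=False is the overly literal reading of "between them".
    """
    chars = [chr(c) for c in range(ord(lo), ord(hi) + 1)]
    return chars if inclusive else chars[1:-1]


proposed = "".join(expand("0", "9"))
literal = "".join(expand("0", "9", inclusive=False))
```

Only the inclusive reading agrees with the existing example that '[0-9]'
matches any decimal digit.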
Also, many regex engines support backslash-escaping within a character
class. Does RFC 4880 support this or not? My reading is that it doesn't,
but it may be worth explicitly clarifying this also (even though
backslash escaping would be a more elegant solution to [^-~]).
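
For comparison, this is how one engine that does support escaping
behaves. The sketch uses Python's standard re module purely as an
example of such an engine; it says nothing about what RFC 4880
implementations do.

```python
import re

# In Python's re, a backslash escapes '^' even at the start of a class,
# so this is the range '^' through '~' rather than a negated set.
escaped = re.compile(r"[\^-~]")

caret_ok = escaped.fullmatch("^") is not None
tilde_ok = escaped.fullmatch("~") is not None
bracket_ok = escaped.fullmatch("]") is not None    # ']' is 0x5D, below '^'
backslash_ok = escaped.fullmatch("\\") is not None  # backslash itself is not a member
```

If RFC 4880 has no escaping, the same string would presumably be read
differently there, with the backslash as just another literal member.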
Is there anything to be said for referring out to an external regex
definition instead of reinventing the wheel? :-)
--
Andrew Gallagher
_______________________________________________
openpgp mailing list
openpgp(_at_)ietf(_dot_)org
https://www.ietf.org/mailman/listinfo/openpgp