base-spec issue #1: character escapes



Per the discussion during yesterday's meeting, I'm fleshing out here on
the list the options for providing character escapes:

0) do nothing; just repeat that \x means x, period.
   PRO: - no changes to the spec
        - breaks no implementations...unless they don't conform already
   CON: - scripts have no way to get strings that contain NUL or
          invalid UTF-8
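
For concreteness, a minimal sketch of the rule in (0) as a decoder for
the inside of a quoted string.  The function name and the choice to
reject a trailing lone backslash are illustrative assumptions, not
taken from the spec text.

```python
def unescape_base(body: str) -> str:
    """Decode the inside of a quoted string: backslash + x yields x."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body):
                # Assumption: a lone trailing backslash is an error.
                raise ValueError("dangling backslash at end of string")
            i += 1
            ch = body[i]  # whatever follows the backslash, taken literally
        out.append(ch)
        i += 1
    return "".join(out)
```

Under this rule \x00 decodes to the three characters "x00", so nothing
the script author can type yields NUL or an invalid UTF-8 octet.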

1) change the base spec to define \xFF, \uXXXX, etc.
   PRO: - simplest option to specify that actually adds the support
        - support is easy and consistent; no need to change string
          interpreters in mid-stream
   CON: - breaks all implementations
        - breaks scripts that use superfluous backslashes
        - escapes only usable in quoted strings
        - scripts that need escapes can't guarantee they're getting them
          (scripts would not be portable between versions)
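
To gauge how many existing scripts (1) would break, one could scan
quoted strings for superfluous backslashes.  A rough sketch follows;
the function name is hypothetical and it assumes the caller has
already extracted the inside of a quoted string.

```python
def superfluous_escapes(body: str) -> list:
    """Offsets of backslashes that precede anything other than a
    backslash or a double quote."""
    hits = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            if body[i + 1] not in ("\\", '"'):
                hits.append(i)
            i += 2  # skip the escaped character as well
        else:
            i += 1
    return hits
```

Any string for which this returns a non-empty list would change
meaning if its escapes were later given a new definition.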

2) change the base spec to say that \x maps to x unless overridden
   by an extension; extensions may redefine any \x except \\ and
   \".  Scripts SHOULD NOT contain extraneous escapes.  Then, create
   an extension which defines \xFF, \uXXXX, etc.
   PRO: - neither implementations nor scripts broken by the change
        - script that needs escapes is guaranteed they're getting them
          if they're supported
        - implementation similar to that of variables (or is that a CON?)
   CON: - more complicated to specify
        - another extension has to be defined and used when needed
        - escapes only usable in quoted strings
        - does there need to be a registry for the redefinitions
          to prevent conflicts between such extensions?
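
A rough sketch of how (2) might look in a decoder.  The exact escape
syntax is not settled above, so this assumes \x takes exactly two hex
digits (one raw octet) and \u exactly four (a code point, emitted as
UTF-8); the escapes_enabled flag stands in for "the script required
the extension".  All names are illustrative.

```python
import string

HEX = set(string.hexdigits)

def _hex(body: str, start: int, count: int) -> int:
    """Parse exactly `count` hex digits starting at `start`."""
    digits = body[start:start + count]
    if len(digits) != count or any(d not in HEX for d in digits):
        raise ValueError(f"expected {count} hex digits at offset {start}")
    return int(digits, 16)

def unescape(body: str, escapes_enabled: bool) -> bytes:
    """Decode a quoted string body to octets."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of string")
        nxt = body[i + 1]
        if escapes_enabled and nxt == "x":
            out.append(_hex(body, i + 2, 2))  # one raw octet
            i += 4
        elif escapes_enabled and nxt == "u":
            cp = _hex(body, i + 2, 4)
            if 0xD800 <= cp <= 0xDFFF:
                # Assumption: surrogates are rejected, not encoded.
                raise ValueError("surrogate code point in \\u escape")
            out += chr(cp).encode("utf-8")
            i += 6
        else:
            out += nxt.encode("utf-8")  # base rule: \x means x
            i += 2
    return bytes(out)
```

With the flag off, "\x41" still decodes to the octets of "x41", so
existing scripts are unaffected; with it on, the same text yields "A".
The result is octets rather than characters because \xFF has no valid
UTF-8 representation.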

3) define an extension to variables that implicitly creates variables
   (in a namespace) for each Unicode code point and octet value, whose
   values are the named code points/octets (e.g., ${unicode.00bf}
   would contain the UTF-8 representation of U+00BF (inverted
   question mark); ${octet.ff} would be the octet with value
   255, which is not valid UTF-8)
   PRO: - neither implementations nor scripts broken by the change
        - script that needs escapes is guaranteed they're getting them
          if they're supported
        - usable in both quoted strings and multiline literals
        - avoids introducing another area of extension (cf. last
          CON of (2))
   CON: - more complicated to specify
        - more annoying/noisy to use
        - another extension has to be defined and used when needed
        - requires support for and use of variables
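
A rough sketch of the expansion in (3), using the ${unicode.XXXX} and
${octet.XX} forms from the example.  Treating the namespace names as
lowercase only, accepting any number of hex digits, and leaving
out-of-range references unexpanded are assumptions made for
illustration; the function name is hypothetical.

```python
import re

# Matches ${unicode.<hex>} and ${octet.<hex>} in UTF-8 encoded text.
_REF = re.compile(rb"\$\{(unicode|octet)\.([0-9A-Fa-f]+)\}")

def expand(text: str) -> bytes:
    """Expand the implicit code point/octet variables to octets."""
    def repl(m):
        ns, value = m.group(1), int(m.group(2), 16)
        if ns == b"octet":
            if value > 0xFF:
                return m.group(0)  # not an octet; leave reference as-is
            return bytes([value])
        if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            return m.group(0)      # not an encodable code point
        return chr(value).encode("utf-8")
    return _REF.sub(repl, text.encode("utf-8"))
```

Because the expansion keys on ${...} rather than on backslashes, the
same routine applies unchanged to quoted strings and to multiline
literals.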

4) define an extension that covers all the changes in the base spec
   that are incompatible with RFC 3028.  Option (1) would be done
   under that extension.  If there are no other incompatible changes,
   then this reduces to (2).
   PRO: - neither implementations nor scripts broken by the change
        - script that needs escapes is guaranteed they're getting them
          if they're supported
   CON: - medium complexity?
        - another extension has to be defined and used when needed
        - escapes only usable in quoted strings
        - negative experience with version numbers in IMAP
        - no longer revising Sieve base spec but rather defining Sieve v2

Are there other options?  Did I miss (or misstate) any PROs or CONs?
Which of these PROs and CONs should be considered important and why?


Philip Guenther
editor