Re: Proposal for escaping on non-UTF-8 sequences in Sieve

Oh.  I just realised my suggestion has the same shortcoming as the
variables syntax:  it can't raise syntax errors for malformed input,
since it only matches well-formed character sequences.


Here's my amended 2.4.2.4, which attempts to fix it.  The wording is a
bit weaselly, but I think it will work in practice.  If anyone can pin it
down more formally in ABNF, please help out.

I hope it addresses the issues brought up (6 digit Unicode, single pass,
NUL, extension name).  Thanks for the feedback, everyone!
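
To make the intended behaviour concrete, here is a rough Python sketch of
the decoding step described in the amended 2.4.2.4 below.  It is only one
possible reading, not a reference implementation.  Assumptions that are not
in the draft text: the names decode_encoded_characters and
SieveSyntaxError are made up for illustration; anything shaped like
"${" method ":" argument "}" is treated as an attempted encoded-seq and
validated strictly; enc-method is matched case-insensitively (as ABNF
literals are); and surrogates or values above 10FFFF are rejected for
"unicode".  The input is assumed to be the string's octets after escape
sequences and dot-unstuffing have been handled.

```python
import re

# Loose candidate pattern (assumption of this sketch): "${" method ":" argument "}".
_CANDIDATE = re.compile(rb"\$\{([^:}]*):([^}]*)\}")
# Strict hex-list from the ABNF: hex-group *(WSP hex-group), hex-group = 1*6HEXDIG.
_HEX_LIST = re.compile(rb"[0-9A-Fa-f]{1,6}(?:[ \t][0-9A-Fa-f]{1,6})*")


class SieveSyntaxError(ValueError):
    """Hypothetical error type standing in for a Sieve syntax error."""


def _decode_one(match):
    method = match.group(1).lower()
    argument = match.group(2)
    if method not in (b"hex", b"unicode") or not _HEX_LIST.fullmatch(argument):
        raise SieveSyntaxError("malformed encoded-seq: %r" % match.group(0))
    values = [int(group.decode("ascii"), 16) for group in argument.split()]
    if method == b"hex":
        # Each hex-group names one octet; values above ff are a syntax error.
        if any(value > 0xFF for value in values):
            raise SieveSyntaxError("hex value above ff: %r" % match.group(0))
        return bytes(values)
    # method == b"unicode": each hex-group names one code point.
    for value in values:
        if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            raise SieveSyntaxError("invalid code point: %r" % match.group(0))
    return "".join(chr(value) for value in values).encode("utf-8")


def decode_encoded_characters(data: bytes) -> bytes:
    # re.sub never rescans the replacement text, which gives the single pass.
    return _CANDIDATE.sub(_decode_one, data)
```

With this reading, the test string from the example in the patch,
"$${hex:24 24}", decodes to "$$$", and "${hex:24}{hex:41}" decodes to the
literal text "${hex:41}" because the output is not scanned again.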

-- 
Kjetil T.

--- draft-ietf-sieve-3028bis-09.txt     2006-10-06 01:47:33.989869000 +0200
+++ draft-ietf-sieve-3028bis-kjetilho.txt       2006-10-21 21:52:27.461172000 +0200
@@ -393,6 +393,10 @@
    invalid data and in arguments containing raw MIME parts for extension
    actions that generate outgoing messages.
 
+   The extension "encoded-character" may be used to encode arbitrary
+   characters as a sequence of US-ASCII characters (see 2.4.2.4 for
+   details).
+
    For entering larger amounts of text, such as an email message, a
    multi-line form is allowed.  It starts with the keyword "text:",
    followed by a CRLF, and ends with the sequence of a CRLF, a single
@@ -470,6 +474,46 @@
    valid, but need not ensure that they actually identify an email
    recipient.
 
+2.4.2.4. Encoding characters using "encoded-character"
+
+   When the "encoded-character" extension is in effect, character
+   sequences in strings which match the encoded-seq syntax are
+   replaced by the decoded value.  This matching happens after escape
+   sequences are interpreted and dot-unstuffing has been done.  A
+   single pass is done.
+
+   encoded-seq         = "${" enc-method ":" enc-argument "}"
+   enc-method          = "hex" / "unicode"
+   enc-argument        = hex-list
+   hex-list            = hex-group *(WSP hex-group)
+   hex-group           = 1*6HEXDIG
+
+   Arbitrary octets can be embedded in strings by using the encoding
+   method "hex".  The sequence is replaced by the octets with the
+   hexadecimal values given by each hex-group.  Values greater than
+   255 ("ff") are a syntax error.
+
+   It may be inconvenient or undesirable to enter Unicode characters
+   verbatim, and in these cases the method "unicode" can be used. The
+   sequence is replaced by the UTF-8 encoding of the specified Unicode
+   characters, whose code points are identified by the hexadecimal
+   value of each hex-group.
+
+   Values for enc-method or enc-argument which don't match the above
+   syntax SHOULD cause a syntax error.  Implementations SHOULD support
+   encoded NUL octets.
+
+   The capability string for use with the require command is
+   "encoded-character".
+
+   In the following script, message A is discarded, since the
+   specified test string is equivalent to "$$$".
+
+   Example:   require "encoded-character";
+              if header :contains "Subject" "$${hex:24 24}" {
+                    discard;
+              }
+
 2.5.     Tests
 
    Tests are given as arguments to commands in order to control their
@@ -1075,7 +1119,7 @@
    Implementations MUST support the "keep", "discard", and "redirect"
    actions.
 
-   Implementations SHOULD support "fileinto".
+   Implementations SHOULD support "fileinto" and "encoded-character".
 
    Implementations MAY limit the number of certain actions taken (see
    section 2.10.4).
@@ -1561,6 +1605,12 @@
    RFC number:      this RFC (Sieve base spec)
    Contact address: The Sieve discussion list <ietf-mta-filters(_at_)imc(_dot_)org>
 
+   Capability name: encoded-character
+   Description:     changes the parsing of strings to allow arbitrary
+                    characters to be embedded
+   RFC number:      this RFC (Sieve base spec)
+   Contact address: The Sieve discussion list <ietf-mta-filters(_at_)imc(_dot_)org>
+
    Capability name: comparator-* (anything starting with "comparator-")
    Description:     adds the indicated comparator for use with the
                     :comparator argument