oh. I just realised my suggestion has the same shortcoming as the
variables syntax: it won't raise syntax errors, since it only matches
well-formed character sequences.
here's my amended 2.4.2.4, which attempts to fix it. the wording is a
bit weaselly, but I think it will work in practice. if anyone can
express it more formally in ABNF, please help out.
I hope it addresses the issues brought up (6-digit Unicode, single pass,
NUL, extension name). thanks for the feedback, everyone!
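to make the intended behaviour concrete, here's a rough Python sketch of
the decoding step as I read the amended text. it is only an illustration,
not part of the draft: the function name, the loose "candidate" pattern
used to spot malformed sequences, and the choice to signal a syntax error
with ValueError are all my own assumptions. it works on a string that has
already had escape sequences and dot-stuffing handled, and returns octets.

```python
import re

# Assumption: anything shaped like "${" method ":" argument "}" is a
# candidate, so that a malformed method or argument can be reported
# as an error instead of being silently left in the string.
CANDIDATE = re.compile(r"\$\{([^:{}]*):([^{}]*)\}")

# hex-list = hex-group *(WSP hex-group), hex-group = 1*6HEXDIG
HEX_LIST = re.compile(r"[0-9A-Fa-f]{1,6}(?:[ \t][0-9A-Fa-f]{1,6})*")


def decode_encoded_chars(s: str) -> bytes:
    """Replace encoded-seq occurrences in s in a single left-to-right pass."""
    out = bytearray()
    pos = 0
    for m in CANDIDATE.finditer(s):
        # Copy the literal text before this sequence unchanged.
        out += s[pos:m.start()].encode("utf-8")
        # ABNF string literals are case-insensitive, so fold the method.
        method, arg = m.group(1).lower(), m.group(2)
        if method not in ("hex", "unicode") or not HEX_LIST.fullmatch(arg):
            raise ValueError(f"bad encoded-character sequence: {m.group(0)!r}")
        for group in re.split(r"[ \t]", arg):
            value = int(group, 16)
            if method == "hex":
                if value > 0xFF:
                    raise ValueError(f"hex value out of range: {group}")
                out.append(value)
            else:
                # chr() rejects values above 10FFFF, and encoding a
                # surrogate fails; both surface as ValueError.
                out += chr(value).encode("utf-8")
        pos = m.end()
    # Decoded output is never rescanned, which gives the single pass.
    out += s[pos:].encode("utf-8")
    return bytes(out)
```

with this sketch, the test string from the example in the patch,
"$${hex:24 24}", decodes to the three octets "$$$", while a plain
"${name}" with no colon is left untouched.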
--
Kjetil T.
--- draft-ietf-sieve-3028bis-09.txt 2006-10-06 01:47:33.989869000 +0200
+++ draft-ietf-sieve-3028bis-kjetilho.txt 2006-10-21 21:52:27.461172000 +0200
@@ -393,6 +393,10 @@
invalid data and in arguments containing raw MIME parts for extension
actions that generate outgoing messages.
+ The extension "encoded-character" may be used to encode arbitrary
+ characters as a sequence of US-ASCII characters (see 2.4.2.4 for
+ details).
+
For entering larger amounts of text, such as an email message, a
multi-line form is allowed. It starts with the keyword "text:",
followed by a CRLF, and ends with the sequence of a CRLF, a single
@@ -470,6 +474,46 @@
valid, but need not ensure that they actually identify an email
recipient.
+2.4.2.4. Encoding characters using "encoded-character"
+
+ When the "encoded-character" extension is in effect, character
+ sequences in strings which match the encoded-seq syntax are
+ replaced by the decoded value. This matching happens after escape
+ sequences are interpreted and dot-unstuffing has been done. A
+ single pass is done.
+
+ encoded-seq = "${" enc-method ":" enc-argument "}"
+ enc-method = "hex" / "unicode"
+ enc-argument = hex-list
+ hex-list = hex-group *(WSP hex-group)
+ hex-group = 1*6HEXDIG
+
+ Arbitrary octets can be embedded in strings by using the encoding
+ method "hex". The sequence is replaced by the octets with the
+ hexadecimal values given by each hex-group. Values greater than
+ 255 ("ff") are a syntax error.
+
+ It may be inconvenient or undesirable to enter Unicode characters
+ verbatim, and in these cases the method "unicode" can be used. The
+ sequence is replaced by the UTF-8 encoding of the specified Unicode
+ characters, whose code points are identified by the hexadecimal
+ value of each hex-group.
+
+ Values for enc-method or enc-argument which don't match the above
+ syntax SHOULD cause a syntax error. Implementations SHOULD support
+ encoded NUL octets.
+
+ The capability string for use with the require command is
+ "encoded-character".
+
+ In the following script, message B is discarded, since the
+ specified test string is equivalent to "$$$".
+
+ Example: require "encoded-character";
+ if header :contains "Subject" "$${hex:24 24}" {
+ discard;
+ }
+
2.5. Tests
Tests are given as arguments to commands in order to control their
@@ -1075,7 +1119,7 @@
Implementations MUST support the "keep", "discard", and "redirect"
actions.
- Implementations SHOULD support "fileinto".
+ Implementations SHOULD support "fileinto" and "encoded-character".
Implementations MAY limit the number of certain actions taken (see
section 2.10.4).
@@ -1561,6 +1605,12 @@
RFC number: this RFC (Sieve base spec)
Contact address: The Sieve discussion list
<ietf-mta-filters(_at_)imc(_dot_)org>
+ Capability name: encoded-character
+ Description: changes the parsing of strings to allow arbitrary
+ characters to be embedded
+ RFC number: this RFC (Sieve base spec)
+ Contact address: The Sieve discussion list
<ietf-mta-filters(_at_)imc(_dot_)org>
+
Capability name: comparator-* (anything starting with "comparator-")
Description: adds the indicated comparator for use with the
:comparator argument