ietf-mta-filters

Re: Last Call: Sieve -- Subaddress Extension - NUL

2003-04-04 17:12:21

On Fri, Apr 04, 2003 at 08:39:55AM -0000, michael(_at_)freenet-ag(_dot_)de 
wrote:

Michael, I believe that we are not talking about the same thing, and 
there's nothing to worry about, and no change to the RFC needed.

I think you misunderstood me.  I just jumped into the list and I am
talking about how to match this:

  Subject: =?iso-8859-1?q?=00text?=

Sieve can't do that right now other than by using a pattern match.

I'm not convinced Sieve needs to address this.

From RFC 2047, section 5, use of encoded-words in message headers:

#  Only printable and white space character data should be encoded using
#  this scheme.  However, since these encoding schemes allow the
#  encoding of arbitrary octet values, mail readers that implement this
#  decoding should also ensure that display of the decoded data on the
#  recipient's terminal will not cause unwanted side-effects.
#
#  Use of these methods to encode non-textual data (e.g., pictures or
#  sounds) is not defined by this memo. [...]

I'd conclude that by using RFC 2047 to encode NUL bytes, you're
sending bogus data.  (It isn't printable or white space.)
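
To make the case concrete, here is a small Python sketch (an
illustration only, not part of Sieve or of any draft). It decodes the
example header with the standard library's email.header.decode_header
and checks whether the result holds anything other than printable or
white space characters. The helper name is_printable_text is
hypothetical.

```python
from email.header import decode_header


def is_printable_text(raw: bytes, charset: str) -> bool:
    """Return True if every decoded character is printable or white space."""
    text = raw.decode(charset)
    return all(ch.isprintable() or ch.isspace() for ch in text)


# The encoded-word from the example above: a NUL octet followed by "text".
subject = "=?iso-8859-1?q?=00text?="

# decode_header returns a list of (bytes, charset) pairs; this header
# consists of a single encoded-word, so only the first pair is needed.
raw, charset = decode_header(subject)[0]

print(repr(raw), charset)
print("printable or white space only:", is_printable_text(raw, charset))
```

The decoded payload starts with a NUL octet, so the check fails, which
is the sense in which the header goes beyond what RFC 2047 says should
be encoded.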

I don't think the Sieve language needs to be extended to address
the specific details of bogosity, any more than it should be extended
to deal with, e.g., octet sequences that never occur in valid UTF-8
yet could be encoded in a header.

Sieve has made a deliberate design choice to deal with pieces of
text as logical UTF-8 strings (and not as physical sequences of
octets).  I think you're unhappy with that design choice, but I
don't think you've identified a flaw in its implementation.

--

While we're talking about binary data:

Over lunch at the last IETF, I got the sense that people wanted
a way of executing binary comparisons on MIME part contents, mainly
to roll their own virus checking (in spite of the overlap
with a "virustest" command).

I said then that I'd like to have a "hex" comparator that can
compare binary strings as part of the "body" command.  Because
Sieve works with UTF-8 strings, that didn't work out as a
comparator in the sense of "i;octet"; instead, the next version
of "body" will have a new ":binary" body transform, a version of
":content" that works in the hex domain.

Here's what that text looks like so far:

| 4.4 Body Transform ":binary"
| 
|    If the body transform is ":binary", the rules for selecting MIME
|    body parts for matching are the same as with the ":content"
|    body transform.
| 
|    MIME parts encoded in "quoted-printable" or "base64" content
|    transfer encodings MUST be decoded prior to the match.
|    MIME parts in other transfer encodings MAY be decoded, omitted
|    from the test, or processed as raw data.
| 
|    Unlike in :content, the charset of the :binary MIME content is
|    disregarded.  Instead, the match against the keys provided in
|    the "body" statement proceeds as if the MIME part's content data
|    had been translated into space-separated hex bytes of the form
|    [0-9a-f][0-9a-f] prior to matching.
| 
|    Search expressions MUST NOT match across MIME part boundaries.
|    MIME headers of the containing text MUST NOT be included in the
|    data.
| 
|    If the optional ":offset <start: number>" is provided, the
|    binary match is executed after skipping <number> octets of
|    the binary data.  (Note that the offset counts bytes of the
|    internal data, not characters of the hexadecimal representation.)
| 
|    Example:
|       require ["body", "fileinto"];
| 
|       # Save any message with any application MIME part that contains
|       # a NUL-terminated ASCII C string "Hello, World!" into the
|       # "helloworld" folder.
|       
|       if body :binary ["application"]
|               :contains "48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 21 00"
|       {
|               fileinto "helloworld";
|               stop;
|       }
| 
|       # Check if the bytes at offsets 1000...1003 match some fictitious
|       # signature 44 3c 00 01 (hex); if yes, reject the message.
| 
|       if body :binary ["application"] :offset 1000 :matches "44 3c 00 01 *"
|       {
|               reject "example virus detected";
|               stop;
|       }
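
The matching model in the draft text above can be sketched in Python as
follows. This is an illustration under stated assumptions, not an
implementation of the draft: the function names to_hex, binary_contains
and binary_matches are hypothetical, MIME part selection and
transfer-decoding are assumed to have happened already, and only the
"*" and "?" wildcards of ":matches" are handled (no backslash escapes).

```python
import re


def to_hex(data: bytes, offset: int = 0) -> str:
    """Skip `offset` octets, then render the rest as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data[offset:])


def binary_contains(data: bytes, key: str, offset: int = 0) -> bool:
    """Rough analogue of `body :binary :contains key`."""
    return key in to_hex(data, offset)


def binary_matches(data: bytes, key: str, offset: int = 0) -> bool:
    """Rough analogue of `body :binary :matches key` ("*" and "?" only)."""
    pattern = "".join(
        ".*" if c == "*" else "." if c == "?" else re.escape(c) for c in key
    )
    return re.fullmatch(pattern, to_hex(data, offset), flags=re.DOTALL) is not None


# First example: a NUL-terminated "Hello, World!" somewhere in the part.
part = b"\x7fELF" + b"Hello, World!\x00" + b"\x01\x02"
hello_key = "48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 21 00"
found_hello = binary_contains(part, hello_key)

# Second example: the fictitious signature at octet offsets 1000..1003.
blob = bytes(1000) + bytes([0x44, 0x3C, 0x00, 0x01]) + b"payload"
found_signature = binary_matches(blob, "44 3c 00 01 *", offset=1000)

# Edge case: the part ends right after the signature, so the hex form
# has no trailing space and the key "44 3c 00 01 *" does not match.
short_blob = bytes(1000) + bytes([0x44, 0x3C, 0x00, 0x01])
found_at_end = binary_matches(short_blob, "44 3c 00 01 *", offset=1000)
```

Note that the offset is applied to the raw octets before the hex
rendering, as the draft's parenthetical requires. Under this sketch's
reading, a key ending in " *" also expects at least one octet after the
signature; whether that is intended is a question for the draft, not
something the sketch settles.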

Originally, I didn't have spaces between the hex values, but
that meant that "c2" would match in "0c20" unless I made up a
new strictly pairwise comparator as well -- and some things are
just too clumsy to describe.  Besides, I think it actually looks
better this way.
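
The false match is easy to reproduce. The following Python lines are
only an illustration of the substring problem, using the two octets
0c 20 from the sentence above:

```python
data = bytes([0x0C, 0x20])

packed = data.hex()                           # "0c20", no separators
spaced = " ".join(f"{b:02x}" for b in data)   # "0c 20"

# Without separators, the key "c2" straddles the two octets and matches.
false_hit = "c2" in packed

# With a space between octets, the same key can no longer straddle them.
spaced_hit = "c2" in spaced
```

Here false_hit is True and spaced_hit is False, even though the data
contains no c2 octet in either case.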

There was also no mention of :offset over lunch, and it may be
that I'm going too far and should leave that to :regex instead;
but it did seem common enough to warrant an explicit mechanism.

Jutta Degener <jutta(_at_)sendmail(_dot_)com>
