Re: "body" extension


On Fri, 2002-06-14 at 17:33, Jutta Degener wrote:


On Fri, Jun 14, 2002 at 03:09:18PM -0600, Tim Showalter wrote:

Doing this right (charset decoding with comparators) is hard, but is the
right thing.


I think that's the right thing for some applications,
but not for others.


I want to address this but I want to answer everything else first.

I'm thinking about the application you describe in the
form of another extension I'm calling "text".  (You're not
the only person who wants this to be part of "body" and
triggered with a flag or by its absence; I'm resisting this
mostly because I want to get "body" out to closely resemble
existing practice, and I haven't yet heard that this kind
of behavior is existing practice.  Please set me straight if
you know differently!)
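To make the "text" idea concrete, here is a minimal Python sketch. It is not taken from any Sieve implementation. It searches text parts after undoing both the transfer encoding and the declared charset. The name `text_search` and the sample message are hypothetical, and `str.casefold()` only stands in for a comparator; Sieve comparators are defined differently.

```python
from email import message_from_bytes, policy

# Hypothetical sample: a Latin-1 body sent quoted-printable.
# On the wire, "Größe" appears only as "Gr=F6=DFe".
QP_MESSAGE = (
    b"Subject: order\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Bitte die Gr=F6=DFe angeben.\r\n"
)


def text_search(message: bytes, needle: str) -> bool:
    """Search text/* parts after undoing the CTE and the declared charset.

    Assumption: str.casefold() is used as a stand-in for a real comparator.
    """
    msg = message_from_bytes(message, policy=policy.default)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and non-text leaves
        # get_content() decodes the transfer encoding, then the charset.
        if needle.casefold() in part.get_content().casefold():
            return True
    return False
```

A raw byte search for either the UTF-8 or the Latin-1 spelling of "Größe" misses this message, while `text_search(QP_MESSAGE, "Größe")` finds it. This is the part that is hard to get right in general.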


Codifying existing practice would be fine with me, but my problem is
that the "body" keyword is valuable real estate.  What we're describing
here is the uncommon case where you want to go mucking with the raw data
of the message.  It's easy to implement, but when users try to use it in
practice, they'll find it can't see through content-transfer encodings
(CTEs) such as base64 or quoted-printable.

We have a "header" keyword; it searches decoded headers.  The keyword to
search CTE-decoded bodies ought to be "body".
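A minimal Python sketch of why the distinction matters, assuming a single-part message whose body is base64-encoded. `raw_body_search` and `decoded_body_search` are hypothetical names for illustration, not any implementation's API.

```python
import base64
from email import message_from_bytes, policy

# Hypothetical single-part message whose text body is base64-encoded,
# as many mailers send it.
BODY = "Please find the quarterly report attached."
B64_MESSAGE = (
    b"From: alice@example.com\r\n"
    b"Subject: report\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=us-ascii\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + base64.b64encode(BODY.encode("ascii"))
    + b"\r\n"
)


def raw_body_search(message: bytes, needle: bytes) -> bool:
    """Search the undecoded bytes after the header/body separator."""
    _, _, body_bytes = message.partition(b"\r\n\r\n")
    return needle in body_bytes


def decoded_body_search(message: bytes, needle: bytes) -> bool:
    """Search each leaf part after undoing its Content-Transfer-Encoding."""
    msg = message_from_bytes(message, policy=policy.default)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        payload = part.get_payload(decode=True) or b""
        if needle in payload:
            return True
    return False
```

With this message, the raw search for b"quarterly" fails because only the base64 text is on the wire. The CTE-decoded search finds it, which is what a user writing a "body" test would expect.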

I don't know what anyone's x_body extension does, unfortunately, but I'd
be a lot more comfortable if we called this rawbody or body :raw.

(There could be a third in the canon, working title "content",
which undoes the transfer encoding but doesn't go any further.  That's
probably what you want if you are trying to implement a virus
scanner that looks for signature strings in binary files.)
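A rough Python sketch of the "content" idea, under stated assumptions. Each leaf part's transfer encoding is undone, but no charset decoding is attempted, so matching happens on bytes. The name `content_search`, the sample message, and the example signature (the "MZ" header that begins DOS/Windows executables) are all illustrative choices, not part of any proposal.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Illustrative byte signature: "MZ" followed by typical header bytes.
SIGNATURE = b"MZ\x90\x00"

# Hypothetical message: a short text part plus a binary attachment.
# add_attachment() base64-encodes bytes payloads by default.
outer = EmailMessage()
outer["Subject"] = "invoice"
outer.set_content("See attached.")
outer.add_attachment(SIGNATURE + b"\x00" * 60, maintype="application",
                     subtype="octet-stream", filename="invoice.exe")
WIRE = outer.as_bytes()


def content_search(message: bytes, signature: bytes) -> bool:
    """Undo each leaf part's transfer encoding, then match raw bytes.

    Deliberately stops there: the payload stays bytes, no charset step.
    """
    msg = message_from_bytes(message, policy=policy.default)
    for part in msg.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True) or b""
        if signature in payload:
            return True
    return False
```

The signature never appears in `WIRE` itself, because it is base64-encoded there, but `content_search(WIRE, SIGNATURE)` finds it in the decoded attachment.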


That's an interesting idea as well, though I'm not convinced Sieve is
the right place to do virus scanning, especially given the prevalence
of Microsoft Word macro viruses and deliberately obfuscated code.

I do not believe this is the right form for "body", unless there's a
required tagged argument ("body :raw").  If someone asks for "body",
this probably isn't what they want.


What I would like to know (and I think you have that information,
but you haven't put it into your reply) is what's _actually implemented
right now_.  What are the syntax and semantics of your existing
"body" command?


I don't have a Sieve implementation, so that's not all that important. 
But what I implemented has very similar syntax to what you described. 
It's really easy to implement.  I got away with it for a few releases. 
But it will be difficult to change the semantics of "body" from "search
the raw, still-encoded body, MIME headers included" to "search the
CTE-decoded body and skip MIME headers", because there won't be a
suitable replacement for the old behavior.

Actually I don't even know which implementations do x_body, but I have
seen it used in example scripts.

I think that's the right thing for some applications,
but not for others.


I agree a raw body search is useful.  It's even easy to implement in a
lot of environments (mmap makes it trivial), and I think many
implementations would start supporting it right away for those reasons.

I think there are three applications for a body search: (1) searching
text in text parts; (2) searching the raw MIME structure, because Sieve
doesn't provide tools for inspecting it; (3) searching non-text parts for
non-text data, what you're calling "content".  I hadn't thought about (3)
before.

A raw body search is insufficient for all of these: (1) and (3) because
of CTE issues, and (2) because it is too easily fooled.

There are places where a raw body search will be useful, but I'd like to
see it called something else.  Supporting uses (1) and (3) well would be
valuable.  Supporting (2) would be good too, but I think the right way to
do that is to provide MIME-matching functions.
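One way to picture why grepping raw MIME is easily fooled, and what a MIME-aware match buys you, is the hypothetical Python sketch below. The function names are invented, and this is not a proposed Sieve syntax. It only contrasts a raw header-line grep with a walk over the parsed MIME tree.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def raw_has_type(message: bytes, content_type: str) -> bool:
    """Use (2) done naively: grep the raw message for a header line."""
    needle = f"content-type: {content_type}".lower().encode("ascii")
    return needle in message.lower()


def mime_has_type(message: bytes, content_type: str) -> bool:
    """Walk the parsed MIME tree and compare each part's declared type."""
    msg = message_from_bytes(message, policy=policy.default)
    return any(part.get_content_type() == content_type.lower()
               for part in msg.walk())


# A plain-text message that merely mentions the header line in its body.
decoy = EmailMessage()
decoy["Subject"] = "filter question"
decoy.set_content("Should we block parts with\nContent-Type: application/zip\n?\n")

# A message that really carries a zip attachment.
real = EmailMessage()
real["Subject"] = "files"
real.set_content("Archive attached.")
real.add_attachment(b"PK\x03\x04", maintype="application", subtype="zip",
                    filename="files.zip")
```

The raw grep reports a zip part in `decoy`, which has none. The MIME walk correctly says no for `decoy` and yes for `real`.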

If there are other specific applications, I'd like to know about them.
I'm not sure there's a good way to generalize all of this at the
interface level.

Tim