ietf-mta-filters

Strawman for filtering language

1997-01-16 00:21:00
This is a strawman for a filtering language that I hope to implement in
the near future for the Cyrus IMAP server.  It is not Turing complete, but
I believe it fits most of the requirements in the previous language.

I am aware of deficiencies in this, and I intend to correct them in the
near future.  A partial list:

- this is poorly written, and it's just a grammar with comments on what
  stuff does.  I will replace this with something more solid in the very
  near future.

- the grammar is bad.  Whitespace is implied in this draft; this will be
  fixed very soon.

- There is no negotiation method for an extension mechanism.  This needs
  to be implemented in some sort of syntax checker for the language, as
  I would expect filtering scripts to sit in the user's home directory,
  or on a client machine, or an ACAP server, none of which provide an
  obvious way to negotiate extensions.  I suspect that in order to check
  the syntax of the language, with extensions, there will be a submit
  and/or verify program that will check the syntax of a language and
  place it wherever the filtering computer (client or server) will expect
  it.

- This is designed for server filtering, but does not preclude client
  filtering.  Client filtering can probably be more detailed, as it's not
  as dependent on the server's CPU time.

Any comments will be appreciated.  I hope to rewrite the document into
some better form in the near future.

Tim


Rough notes on filtering language

This is intended to be torn apart.  Please feel free to do so.  Before this
is implemented, more complete documentation will be done on it --
including, but not limited to, a more readable description.

This language is not Turing complete.  There are no variables to bind, and
there is no construct to loop.

This language is a collection of keywords, all of which take fixed
numbers of arguments.

Whitespace is not important (as long as atoms, keywords, and strings are
all separated by at least one space) except in multi-line stuff, and
keywords are not case sensitive.  Some special character might be useful
at the end of cases, but I don't think it's necessary.

Unrecognized conditions evaluate to false; unrecognized actions file
into INBOX.  A syntax error in the script causes an error message to be
"replied" to the user and everything to be filed into INBOX.

Comments are of the form "#" ... LF, and can occur anywhere.

The notation resembles that in the IMAP spec, sort of... (I know I've
omitted "SPACE" many, many times.)

There is no notion of variables, and I have deliberately worked around it.
There are substitutions available inside mail messages.

Very open issues:
* Regular expressions are used, but not defined.  While many casual users
  are very mystified by regular expressions (as used by the existing
  filtering language in CMU's legacy system), more experienced users
  very much want them.
* There is no mechanism for specifying extensions.  There is no way to
  ensure that site-defined values will be enforced; the user can't be
  warned in advance.  
* If anything goes wrong, file into INBOX.
* This document needs to be fleshed out; if the general concept is
  sufficient, I'll do so immediately.

-----------------------------------------------------------------------------
example:

when (to -regexp "tjs@.*andrew\.cmu\.edu" or to -regexp "breakout.*@cmu\.edu")
     header subject "MONEY" header ("to" "cc") "list"
        then fileinto foo fileinto INBOX.bar fileinto INBOX.baz reply
        -days 1 message
Your message has been logged and recorded.
.
when header -case "to" "humor@mit.edu"
        then fileinto INBOX.humor
when header -case "subject" "foo"
        then forward "tjs@club.cc.cmu.edu"
otherwise fileinto INBOX

-----------------------------------------------------------------------------

script ::= [1*case] [default]

        ;; A script consists of a number of cases, followed by a default case.
        ;; Vaguely similar to a set of if-else if-else in C, or maybe a cond
        ;; form in LISP.

case ::= "when" condition "then" action

        ;; If the condition is true, action happens, and the processing ends
        ;; right there.  If not, continue to the next condition.

        ;; I like the word "if" better than "when", but that might suggest
        ;; that one or more of the cases below the current case might also
        ;; happen.  Perhaps change "then" to "do"?

default ::= "otherwise" action

        ;; If no case fits, we reach the end of the script.
        ;; Default action, if none supplied, is always to file into INBOX.

action ::= 1*(fileinto / reply / forward / toss / extension-action)

        ;; There are only four actions.  A user can file the message into some
        ;; number of mailboxes, he can reply to the message, he can forward
        ;; the message, or he can toss the message.  A user can only have
        ;; one automated reply or one forward, maybe both, but he can file the
        ;; message into as many of his mailboxes as he likes.

        ;; The end of the action can be found by looking for the next
        ;; case or the default case.  (Maybe it would be better to end this
        ;; with an LF)

atom ::= *char

        ;; where char is any character other than " LF CR ( ) etc...
        ;; same as IMAP?

condition ::= and / or / header / body / sizegt / sizelt / to / from /
                not / extension-condition / "(" condition ")"

        ;; A condition can consist of one or a few cases.

address ::= string

        ;; Any email address.

and ::= condition ["and"] condition

        ;; And binds tighter than OR.  A long list of possibilities can be
        ;; given with AND.
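
To illustrate the precedence rule, here is a rough Python sketch that
evaluates a flat sequence of already-computed condition results.  The
evaluate name and the token representation are assumptions made for the
example; parentheses and "not" are deliberately left out.

```python
def evaluate(tokens):
    """Evaluate booleans joined by optional "and" and explicit "or".

    AND binds tighter than OR, so the sequence is split into groups at
    each "or"; every group is an AND, and the result is the OR of groups.
    """
    groups = [[]]
    for tok in tokens:
        if tok == "or":
            groups.append([])          # "or" starts a new AND-group
        elif tok == "and":
            continue                   # "and" is optional and adds nothing
        else:
            groups[-1].append(bool(tok))
    if any(len(g) == 0 for g in groups):
        raise ValueError("missing operand next to 'or'")
    return any(all(g) for g in groups)
```

With this reading, two adjacent conditions with no keyword between them
are ANDed, as in the example script.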

body ::= "body" [grepopts] key

        ;; Body is one of the conditionals; it searches the textual body of
        ;; the message for key.

continue ::= "continue"

        ;; continue processing after this case (fall-through).
        ;; by request only.

extension-condition ::= atom *(atom / string / multi-line)
        ;; atom not permitted to be "and", "or", or "then"

extension-action ::= atom *(atom / string / multi-line)
        ;; atom not permitted to be "and", "or",
        ;; "when", "otherwise", "fileinto", etc.

fileinto ::= "fileinto" <foldername>

        ;; Files a message into foldername.
        ;; (User MUST own foldername.  If name not prefixed with INBOX,
        ;; defaults to INBOX.name) [right?]

        ;; Temporary failures -- over quota problem --?
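
The INBOX defaulting rule, which is itself marked as uncertain above,
might look like this in Python.  The normalize_folder name and the
case-insensitive treatment of the INBOX prefix are assumptions.

```python
def normalize_folder(name: str) -> str:
    """Prefix a folder name with "INBOX." unless it is already under INBOX."""
    upper = name.upper()
    if upper == "INBOX" or upper.startswith("INBOX."):
        return name
    return "INBOX." + name
```

Under this sketch "fileinto foo" and "fileinto INBOX.foo" name the same
mailbox.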

forward ::= "forward" address

        ;; Forward the message to address.

from ::= "from" [grepopts] <key>

        ;; Match envelope "from" field.  (Match individual envelope elements?)

from-file ::= "from-file" whitespace string

        ;; for reading messages from files.
        ;; some servers may not share a filesystem with the client.
        ;; this should probably be a URL that the server can get to.
        ;; (i.e., ACAP)

grepopts ::= "-regexp" / "-case" / "--"

        ;; Grepopts is a set of options for search commands (header, body,
        ;; to, from) that do things vaguely similar to what grep does.

        ;; -regexp specifies that the string supplied as the key should be
        ;; treated as a regular expression and not as a plain string.

        ;; -case forces the search to be case sensitive.

        ;; -- indicates no more options.
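
A minimal Python sketch of how these options could change a match.  The
regular expression dialect is not defined in this draft, so Python's re
module stands in for it here; the key_matches name, the substring
semantics for plain keys, and the case-insensitive default (implied by
-case) are assumptions.

```python
import re

def key_matches(key: str, text: str,
                regexp: bool = False, case: bool = False) -> bool:
    """Return True if key is found in text under the given grepopts."""
    flags = 0 if case else re.IGNORECASE          # -case: case sensitive
    pattern = key if regexp else re.escape(key)   # -regexp: key is a pattern
    return re.search(pattern, text, flags) is not None
```

For example, key_matches("MONEY", "make money fast") is true, but it
becomes false once case=True is passed.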

header ::= "header" [grepopts] <headername> <key>

        ;; Header searches headername for key, and is true if it finds it.
        ;; If a header is missing, some default value is assumed.

headername ::= "(" string 1#( SPACE string ) ")" / string

        ;; Possible to specify list of headers.

include ::= "include" string

        ;; include some file, denoted by the string.
        ;; file needs to be some server-defined value.

key ::= "(" string 1#( SPACE string ) ")" / string

mailbox ::= atom / string

        ;; Any valid IMAP mailbox name.

message ::= "message" LF *char "." LF

        ;; I have stuck with LF here because UNIX is more comfortable dealing
        ;; with that.
        ;;
        ;; Additionally, the following sequences are defined to
        ;; be expanded when a message is sent:
        ;;
        ;; %envelope-to% expands to the "to" address from the envelope.
        ;; %envelope-from% expands to the "from" address from the envelope.
        ;; Any value matching %header=FOO% is expanded out to the
        ;; FOO header.
        ;; Any value matching %allheaders=FOO% expands to all
        ;; headers matching FOO.  FOO must be exact.
        ;; %size% is defined to be the message size
        ;; %% expands to %
        ;;
        ;; this may need to be changed to something else...
        ;; Additionally, there needs to be a real extension
        ;; mechanism for adding in new substitution values.
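
The substitutions could be expanded roughly as follows.  This Python
sketch makes several assumptions that the draft does not settle: the
expand name and its parameters are hypothetical, %header=FOO% yields the
first matching header value, %allheaders=FOO% yields all values joined
by newlines, header names are compared case-insensitively, and unknown
sequences are left untouched.

```python
import re

def expand(template: str, env_to: str, env_from: str,
           headers: list, size: int) -> str:
    """Expand %...% sequences; headers is a list of (name, value) pairs."""
    def lookup(match):
        token = match.group(1)
        if token == "":
            return "%"                        # "%%" expands to "%"
        if token == "envelope-to":
            return env_to
        if token == "envelope-from":
            return env_from
        if token == "size":
            return str(size)
        kind, sep, name = token.partition("=")
        values = [v for n, v in headers if n.lower() == name.lower()]
        if sep and kind == "header":
            return values[0] if values else ""
        if sep and kind == "allheaders":
            return "\n".join(values)
        return match.group(0)                 # unknown: leave as written
    return re.sub(r"%([^%\n]*)%", lookup, template)
```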

multi-line ::= message / from-file

not ::= "not" condition

notify ::= "notify" mailbox multi-line

        ;; This is for Chris's suggestion on things like
        ;; > when sizegt 2000
        ;; >        then notify INBOX
        ;; > Huge message (%size%) from %env-from% discarded on %date%:
        ;; > 
        ;; > %headers%
        ;; > .
        ;; although the variable names have been changed.
        ;; 
        ;; notify drops a message into the user's primary mail box.

number ::= nonzero-digit [1*digit] ["k" / "m"]
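
A short Python sketch of parsing this rule.  The draft does not say what
"k" and "m" multiply by; binary multiples (1024 and 1024*1024) are
assumed here, as is accepting the suffix in either case.  The
parse_number name is hypothetical.

```python
import re

_NUMBER = re.compile(r"([1-9][0-9]*)([km]?)", re.IGNORECASE)
_MULTIPLIER = {"": 1, "k": 1024, "m": 1024 * 1024}  # assumed binary multiples

def parse_number(text: str) -> int:
    """Parse a number such as "2000", "2k" or "1M" into an integer."""
    m = _NUMBER.fullmatch(text)
    if m is None:
        raise ValueError(f"not a valid number: {text!r}")
    return int(m.group(1)) * _MULTIPLIER[m.group(2).lower()]
```

Note that the grammar as written rejects "0" and numbers with leading
zeros, and so does this sketch.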

or ::= condition "or" condition

        ;; And binds tighter than OR.  OR could be used to generate some
        ;; long list of possibilities.

reply ::= "reply" ["-days" number] ["-headers" multi-line] multi-line

        ;; -days specifies minimum repeat interval.
        ;; There should be minimum and maximums on -days.
        ;; -days 0 should be illegal, as should -days 1M
        ;; (site-defined)

        ;; Headers may be supplied, but defaults are assumed.
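
The -days limits and the repeat interval might be checked along these
lines.  This is a Python sketch only: MIN_DAYS and MAX_DAYS stand in for
site-defined values the draft does not specify, and the function names
and the idea of tracking the time of the last reply are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional

MIN_DAYS, MAX_DAYS = 1, 30   # hypothetical site-defined limits

def check_days(days: int) -> int:
    """Reject -days values outside the site-defined range."""
    if not MIN_DAYS <= days <= MAX_DAYS:
        raise ValueError(f"-days must be between {MIN_DAYS} and {MAX_DAYS}")
    return days

def should_reply(last_reply: Optional[datetime], now: datetime,
                 days: int) -> bool:
    """Reply only if no reply was sent within the last `days` days."""
    return last_reply is None or now - last_reply >= timedelta(days=days)
```

With these limits both "-days 0" and "-days 1M" are rejected by
check_days.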

sizegt ::= "sizegt" <number>

        ;; True if size is larger than number.

sizelt ::= "sizelt" <number>

        ;; True if size is less than number.

        ;; There are no numerical variables, so this has to evaluate a
        ;; parameter.  These are the only things that look at numbers.

string ::= \" *char \"

        ;; A string is a sequence of characters delimited by quotes.
        ;; * C escape sequences?  (Quote in strings might be useful)
        ;; * CR/LF/TAB in strings?

to ::= "to" [grepopts] <key>

        ;; Matches envelope to field.  (Match individual envelope elements?)
        ;; In the absence of a to clause, it implicitly uses to ""

toss ::= "toss"

        ;; Kills the message.







  • Strawman for filtering language, Timothy J Showalter <=