[Asrg] 4d. Consent Framework - Protocols and Formats

Recently I posted a message on the topic of tests and actions for
XML-CPDL inspired by those defined by RFC 3028 for Sieve.

Here I suggest a possible syntax for these which could become
part of the XML-CPDL. I would welcome any comments and suggestions
from those interested in defining a consent policy definition 
language.

When referring to existing XML-CPDL syntax, I mean the example
syntax as shown at:

  http://www.sortmonster.com/ASRG/ConsentPolicyExample.xml

The elements in the header-matching syntax I propose below are
effectively predicates which return true or false for a given
message depending on whether they match any value specified
for such a header.

(i.e. they are true if and only if there exists at least one
  matching value in any header of that name)

Each value would be delimited by the usual RFC 2822 separators (for
example, the commas in an address list) and matched separately. So,
to match a value in the header XYZ:

  <HEADER name="XYZ">
     <EXPRESSION>abc*</EXPRESSION>
  </HEADER>

Where "abc*" is some pattern-matching syntax. I'm currently
agnostic as to what pattern-matching system should be used.
Some people like regular expressions, others don't.

I think we should settle on a basic pattern-matching syntax
and then anyone who finds it to be inadequate can still use
external libraries as discussed earlier.

We will need a certain minimum level of matching to be widely
supported in order for policy sharing to be truly useful. That is
why Regex syntax is a good candidate: Regex libraries are widely
used and readily available.

To give a more concrete example, one can define tests like:

  <TEST id="IsAllegedlyFromAlice" method="StandardHeaderMatch()">
    <HEADER name="From">
      <EXPRESSION>alice@example.com</EXPRESSION>
    </HEADER>
  </TEST>

I would suggest that address fields are treated as if they contain
just the raw addresses and not the user-readable name parts.
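
To illustrate what "just the raw addresses" could mean in practice, here
is a minimal Python sketch. The function name address_matches is made up
for this example, and the choices of Python regular expressions,
whole-value matching and case-insensitive comparison are my assumptions
rather than part of the proposal.

```python
import re
from email.utils import getaddresses

def address_matches(header_values, pattern):
    """Return True if any bare address in the header values matches pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    # getaddresses() yields (display name, address) pairs.
    # Only the address part is tested; the display name is ignored.
    return any(regex.fullmatch(addr) for _name, addr in getaddresses(header_values))

from_values = ['"Alice Example" <alice@example.com>, Bob <bob@example.org>']
alice_found = address_matches(from_values, r"alice@example\.com")

# A display name of "Alice" does not count, because only addresses are matched.
spoofed_name = address_matches(["Alice <mallory@example.net>"], r"alice.*")
```

Ignoring the display name means a sender cannot satisfy the test simply
by putting a trusted name in front of a different address.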

The absence of a specified EXPRESSION element could be taken to mean
that the existence of the header is sufficient for the predicate
to be true, e.g.

  <TEST id="IsCRI" method="StandardHeaderMatch()">
    <HEADER name="CRI-Sender-Exempt" />
  </TEST>

...would match a message with a CRI-Sender-Exempt header, regardless
of the value therein. This could be useful for CRI folks.
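
The predicate semantics described above (true if and only if at least
one value of any header of that name matches, or if the header merely
exists when no EXPRESSION is given) could be sketched in Python roughly
as follows. The name header_matches, the comma splitting and the use of
Python regular expressions with whole-value matching are all assumptions
made for this sketch, since the pattern syntax is still undecided.

```python
import re
from email import message_from_string

def header_matches(msg, name, pattern=None):
    """True iff some value of some header called `name` matches `pattern`.

    With pattern=None, the presence of the header is enough.
    """
    values = msg.get_all(name, [])
    if pattern is None:
        return bool(values)
    regex = re.compile(pattern, re.IGNORECASE)
    for raw in values:
        # Assumption: values within one header line are separated by commas.
        for value in raw.split(","):
            if regex.fullmatch(value.strip()):
                return True
    return False

msg = message_from_string(
    "XYZ: def, abcdef\n"
    "CRI-Sender-Exempt: yes\n"
    "Content-Type: text/html; charset=us-ascii\n"
    "\n"
    "body\n"
)

xyz_hit = header_matches(msg, "XYZ", r"abc.*")
is_cri = header_matches(msg, "CRI-Sender-Exempt")
is_html = header_matches(msg, "Content-Type", r"text/html.*")
```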

People like Gordon who might prefer SPITBOL matching (or any other
matching language of their choice) could specify tests like:

  <TEST id="IsSPITBOLMatch" method="DoSPITBOLRules()" />

...for some suitably-defined implementation on their local system.
Thus if you find Regex isn't enough for your needs then you're not
forced to rely on it.
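
One way a local system might resolve the method attribute to an
implementation is a simple lookup table. This is only a sketch: the
registry, the register decorator and the stand-in body of
do_spitbol_rules are hypothetical, and nothing here is defined by the
existing XML-CPDL.

```python
from typing import Callable, Dict

# Hypothetical registry: value of the method attribute -> local implementation.
METHODS: Dict[str, Callable[[dict], bool]] = {}

def register(name):
    """Decorator that records a function under a CPDL method name."""
    def decorator(func):
        METHODS[name] = func
        return func
    return decorator

@register("DoSPITBOLRules()")
def do_spitbol_rules(message):
    # Stand-in for whatever site-specific matcher is installed locally.
    return "spitbol" in message.get("Subject", "").lower()

def run_test(method, message):
    """Run the locally registered implementation of a test method."""
    if method not in METHODS:
        raise LookupError(f"no local implementation for method {method!r}")
    return METHODS[method](message)

matched = run_test("DoSPITBOLRules()", {"Subject": "SPITBOL rules"})
```

A policy that names a method with no local implementation fails loudly
here, which seems safer than silently treating the test as false.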

Another example: people who want to block HTML messages could use a
test something like...

   <TEST id="IsHTMLMessage" method="StandardHeaderMatch()">
     <HEADER name="Content-Type">
       <EXPRESSION>text/html.*</EXPRESSION>
     </HEADER>
   </TEST>

You get the idea. Of course one test by itself isn't much use. We 
need to logically combine tests.

Under such a scheme one could combine multiple expressions within
the <TEST> definition above. This might be valuable to create large
complex test expressions which could be re-used in many policies.
The existing XML-CPDL doesn't mention this. Pete, could you comment
on this issue?

Another place to combine tests is under the <CONDITIONS> element of 
a policy. In this area, the example XML-CPDL policy comment states:

  "A CONDITIONS block in a policy definition combines a set of
   tests that describe the desired conditions surrounding a 
   given message. In general the individual TEST elements are
   combined with a logical AND unless otherwise defined. As a
   result, ALL of the tests must be true for the conditions to
   be true (by default)."

Now, to allow rules which can use a logical OR, I would propose a
new kind of element: ANYOF (as in "any of"). This is inspired by
the Sieve construct of the same name.

For an <ANYOF> block to be true, at least one of its tests must
evaluate to true for the message under consideration.

Consider the following example. (Note: I have omitted the enclosing
<CPDL>, <POLICIES>, <GROUP> and <POLICY> elements to aid 
readability)

 <CONDITIONS>
   <ANYOF>
     <TEST id="A" />
     <TEST id="B" />
   </ANYOF>

   <ANYOF>
     <TEST id="C" />
     <TEST id="D" />
     <TEST id="E" />
   </ANYOF>
 </CONDITIONS>

... would be equivalent to (A OR B) AND (C OR D OR E), where A to E
are tests defined previously. The AND occurs because of the earlier
assumption that elements are combined with a logical AND unless
otherwise defined.
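
As a rough check of that reading, the CONDITIONS block above can be
parsed and evaluated with a few lines of Python. The evaluate function
and the results dictionary (test id mapped to an already-computed
true/false value) are assumptions for this sketch only.

```python
import xml.etree.ElementTree as ET

CONDITIONS_XML = """
<CONDITIONS>
  <ANYOF>
    <TEST id="A" />
    <TEST id="B" />
  </ANYOF>
  <ANYOF>
    <TEST id="C" />
    <TEST id="D" />
    <TEST id="E" />
  </ANYOF>
</CONDITIONS>
"""

def evaluate(element, results):
    """Evaluate a CONDITIONS tree against precomputed test results."""
    if element.tag == "TEST":
        return results[element.get("id")]
    children = (evaluate(child, results) for child in element)
    if element.tag == "ANYOF":
        return any(children)          # logical OR
    if element.tag == "CONDITIONS":
        return all(children)          # implicit logical AND
    raise ValueError(f"unexpected element <{element.tag}>")

tree = ET.fromstring(CONDITIONS_XML)

# (A OR B) holds via B, and (C OR D OR E) holds via E.
both_groups = evaluate(tree, {"A": False, "B": True, "C": False, "D": False, "E": True})

# (A OR B) holds, but none of C, D, E do, so the AND fails.
one_group = evaluate(tree, {"A": True, "B": True, "C": False, "D": False, "E": False})
```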

This kind of structure allows the expression of rules in 
conjunctive normal form. In practice, to make the XML schema rather
more rigid and hence easier to parse (not to mention less
ambiguous for human readers and writers), I would also propose an
ALLOF element, which is true only if all of the tests it encloses
hold true for the current message.

Thus the above example would be unambiguously written as:

 <CONDITIONS>
   <ALLOF>
     <ANYOF>
       <TEST id="A" />
       <TEST id="B" />
     </ANYOF>

     <ANYOF>
       <TEST id="C" />
       <TEST id="D" />
       <TEST id="E" />
     </ANYOF>
   </ALLOF>
 </CONDITIONS>

It is possible that a "not" element may be required for negation.
For consistency it might be better to name it NONEOF ("none of").

<NONEOF>...</NONEOF> would be the same as "NOT <ANYOF>...</ANYOF>"

It might add to the computational complexity slightly if such
elements were allowed to contain ANYOF and ALLOF elements. However,
short-circuit evaluation could help speed things up.
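
To show what short-circuit evaluation buys, here is a small Python
sketch of an evaluator for ALLOF, ANYOF and NONEOF. The tests are
modelled as callables so that we can record which ones actually run;
the names evaluate, make_test and calls are invented for this example.

```python
import xml.etree.ElementTree as ET

def evaluate(element, tests):
    """Evaluate a condition tree lazily, running tests only when needed."""
    if element.tag == "TEST":
        return tests[element.get("id")]()
    # A generator, so any() and all() stop as soon as the answer is known.
    children = (evaluate(child, tests) for child in element)
    if element.tag in ("ALLOF", "CONDITIONS"):
        return all(children)
    if element.tag == "ANYOF":
        return any(children)
    if element.tag == "NONEOF":
        return not any(children)
    raise ValueError(f"unexpected element <{element.tag}>")

calls = []

def make_test(name, value):
    """Build a fake test that logs its own execution."""
    def test():
        calls.append(name)
        return value
    return test

tests = {
    "A": make_test("A", True),
    "B": make_test("B", False),
    "C": make_test("C", True),
}

tree = ET.fromstring("""
<CONDITIONS>
  <ALLOF>
    <ANYOF><TEST id="A" /><TEST id="B" /></ANYOF>
    <NONEOF><TEST id="C" /><TEST id="B" /></NONEOF>
  </ALLOF>
</CONDITIONS>
""")

result = evaluate(tree, tests)
```

In this run the ANYOF is settled by A and the NONEOF by C, so test B is
never executed in either place.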

There is a trade-off to be made between policy complexity and
execution speed. However, processing spam wastes plenty of CPU cycles
already, so some short-term pain should pay off in the end. (I'd hate
to see us define a hopelessly weak language just for fear of actually
doing some up-front work.)

OK, now onto ACTIONs. In Sieve there are options to:
 - bounce the message back with an error e-mail ("reject")

 - store the message in a specific mailbox folder - not
   really meaningful outside of a MUA ("fileinto")

 - forward the message elsewhere, subject to loop detection
   ("redirect")

 - allow the message to be delivered ("keep")

 - silently delete the message with no bounce ("discard")

Aside from "fileinto" (which is aimed just at MUAs) all of these
actions could be implemented in either MTAs or MUAs. I would
therefore propose that all of those actions are required to be
implemented for compliance. Thus all of the following actions
could be perfectly valid in the <RESPONSES> as "well known"
actions:

  <ACTION id="Keep" />
  <ACTION id="Discard" />
  <ACTION id="Bounce" />
  <ACTION id="Redirect">targetaddress@example.com</ACTION>
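
A consumer of such a RESPONSES block might read it along the following
lines. This is a sketch under assumptions: the function parse_responses,
the WELL_KNOWN set and the decision to reject unknown action ids are
mine, and the element text is simply treated as the action's argument.

```python
import xml.etree.ElementTree as ET

# The "well known" actions listed above.
WELL_KNOWN = {"Keep", "Discard", "Bounce", "Redirect"}

def parse_responses(xml_text):
    """Return a list of (action id, argument or None) pairs."""
    actions = []
    for action in ET.fromstring(xml_text).iter("ACTION"):
        action_id = action.get("id")
        if action_id not in WELL_KNOWN:
            raise ValueError(f"unknown action {action_id!r}")
        # Collapse whitespace so indented multi-line text reads as one string.
        argument = " ".join((action.text or "").split()) or None
        actions.append((action_id, argument))
    return actions

parsed = parse_responses("""
<RESPONSES>
  <ACTION id="Keep" />
  <ACTION id="Redirect">targetaddress@example.com</ACTION>
</RESPONSES>
""")
```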

In the case of Bounce some more work is needed because one
might want to specify why the message is being bounced.

For example, even if the sender is whitelisted as being on my
list of friends (according to, say, some Choicelist or CRI test)
I might still want to reject executable attachments from them
and auto-send them a polite message to explain why I don't accept
those.

A crude example policy might be something like:

  <POLICY name="Politely bounce executables from friends">
    <CONDITIONS>
      <ALLOF>
        <ANYOF>
          <TEST id="SenderIsChoicelisted" />
          <TEST id="SenderIsCRI" />
        </ANYOF>

        <TEST id="HasExecutableAttachment" />
      </ALLOF>
    </CONDITIONS>

    <RESPONSES>
      <ACTION id="Bounce">
         Hi, thanks for your message. Unfortunately I don't accept
         program file attachments due to the risk of spreading 
         computer viruses. Please contact me if you need more
         information.
      </ACTION>
    </RESPONSES>
  </POLICY>

We've not discussed any tests for attachments yet but I think it
would be valuable to do so. For now I'll assume such a test exists.

For anyone who isn't one of my friends, I'd probably just silently
delete the message using a policy like this:

  <POLICY name="Delete executables from unknown people">
    <CONDITIONS>
      <ALLOF>
        <TEST id="SenderIsUnknown" />
        <TEST id="HasExecutableAttachment" />
      </ALLOF>
    </CONDITIONS>

    <RESPONSES>
      <ACTION id="Discard" />
    </RESPONSES>
  </POLICY>

Thus random spammers would have no definite proof of my account being
valid. This would be my personal preference, although I'm not 
suggesting it should be done this way for anyone else. I just want a
CPDL which is expressive enough for people to put in place whatever
policy suits them.
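
To make the interplay of the two policies concrete, here is a Python
sketch that applies them to a set of test results. Several things are
assumed here and are not part of the XML-CPDL: policies are tried in
order, the first matching policy wins, a message matching no policy is
kept, and the Policy class and responses_for function are invented for
the example. The bounce text is abbreviated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, Optional[str]]  # (action id, optional text)

@dataclass
class Policy:
    name: str
    condition: Callable[[Dict[str, bool]], bool]
    responses: List[Action]

BOUNCE_TEXT = "Hi, thanks for your message. Unfortunately I don't accept program file attachments."

policies = [
    Policy(
        "Politely bounce executables from friends",
        lambda t: (t["SenderIsChoicelisted"] or t["SenderIsCRI"])
        and t["HasExecutableAttachment"],
        [("Bounce", BOUNCE_TEXT)],
    ),
    Policy(
        "Delete executables from unknown people",
        lambda t: t["SenderIsUnknown"] and t["HasExecutableAttachment"],
        [("Discard", None)],
    ),
]

def responses_for(test_results):
    """Return the responses of the first matching policy (assumed ordering)."""
    for policy in policies:
        if policy.condition(test_results):
            return policy.responses
    return [("Keep", None)]  # assumed default when nothing matches

friend_with_exe = responses_for({
    "SenderIsChoicelisted": True, "SenderIsCRI": False,
    "SenderIsUnknown": False, "HasExecutableAttachment": True,
})
stranger_with_exe = responses_for({
    "SenderIsChoicelisted": False, "SenderIsCRI": False,
    "SenderIsUnknown": True, "HasExecutableAttachment": True,
})
```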

I will later on propose some additional tests for:
 - checking attachment details
 - checking message sizes
 - checking for the presence of specific types of HTML content
   (e.g. Javascript, OBJECT embedding, invalid tags etc.)

There is the possibility of actions to strip certain types of 
attachment out of a message and deliver the message in a modified
form without those attachments. I believe this is what some mail
gateway systems do anyway, but it would be nice to standardise that
somewhat.

Also there could be actions to issue specific 4xx or 5xx errors
during the SMTP transaction, which might be helpful for MTAs. There are
problems with fitting that approach into this model, but I'll get
to those issues later.

For now I'd like to get some comments before going on further.

Thanks for reading this far...

Andrew
