
RE: [Asrg] 4. Survey of Solutions - Consent Model

2003-07-14 21:58:52
CONSENT - an expression of wanting to receive specific email 
LACK OF CONSENT - an expression of not wanting to receive 
specific email or absence of prior CONSENT

I think that at various stages of incoming message triage there are various 
things that need to be done, and there is at least some natural ordering of 
those operations that ought to take place.

First and foremost, I think there need to be things that are strictly 
envelope/header related.  There might be some specific and familiar senders 
whose messages are NEVER to be delivered, or whose messages are ALWAYS going to 
be delivered (although perhaps with subsequent modifications!), and there might 
be some messages (many, in practice) which can't be decided or handled on the 
basis of header alone.  Sometimes individual message header lines will need 
examination and qualification.

Second, messages often consist of multiple parts.  Those parts can require 
individual, further attention... decoding, name and format testing, content 
scanning, or whatever.  In some cases, a multipart message needs to have one or 
more parts removed;  it's even possible that an originally multipart message 
might be reduced to a single-part message before releasing the message to the 
mail client software or ongoing MTA.  

Third, both within individual parts and within entire messages I think there 
needs to be provision to call external processing modules (which might be user 
or corporate-written) to provide additional decision-making options.  These 
might be individual batch-type processes, or DLLs, or Web services, or any of a 
variety of other technologies.  The important thing is that the processing of 
messages can be customized by user-written code in a multitude of ways.

I agree with you that CONSENT has not been defined properly; I am wondering
how we should redefine it properly. Maybe something like this:

CONSENT - an expression of wanting to receive email from a 
specific SENDER 
LACK OF CONSENT - an expression of not wanting to receive email from a 
specific SENDER or absence of prior CONSENT for that SENDER

I think that "LACK OF CONSENT" needs to be further qualified with DENIAL (an 
expression of NOT wanting to receive mail from a specific, known sender) as 
opposed to simple "not (yet) authorized".

However, we need to take into account filters which check not 

[only just]

for specific senders, but rather for specific types of email. Perhaps the two 
definitions above should be combined.

In our R&D of Message Sniffer we've developed a model for consent which
fits very well in this discussion. I recommend that the ASRG adopt this
generalization of our model for defining consent:

There seem to be really 4 cases, so perhaps CONSENT should be defined
within these 4 cases:

1. CONSENT - a direct expression of wanting to receive email from a
sender.

2. SOFT CONSENT - an indirect expression of wanting to receive email
from a sender.

I'm not sure I understand here (yet?) the point being made with "direct" versus 
"indirect" on these first two cases.  Is #2 a "default - unrecognized sender" 
situation and #1 a known sender from which mail is ALWAYS desired (i.e. 
"whitelist")?

3. SOFT DENIED CONSENT - an indirect expression of wanting not to
receive email from a sender.

4. DENIED CONSENT - a direct expression of wanting not to receive email
from a sender.

Likewise...?

NOTE: 2 and 3 are required to handle anonymous senders.

I think that 2 and 3 also might be required to handle cases where a qualitative 
decision is made (i.e. messages not specifically "whitelisted" or 
"blacklisted").

But there is a further issue that you're not considering here, and that is at 
least equally important.  It's not just whether a message is actually desired 
for delivery, but also how to handle cases where a message is NOT to be 
delivered to the original addressee.  Is the message to be simply blackholed?  
Is a forged "destination mailbox unknown" reply to be returned?  Is a polite 
reply to be returned requesting that the message be resent without attachments 
or HTML or encoding or whatever?  Is the message simply to be bounced back?

In the above also:

Direct Expression = an explicit white rule or black rule.

Indirect Expression = expression by the evaluation of some mechanism
chosen by the recipient including any kind of filtering engine.

Ah, okay.
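
To make the four cases concrete, here is a minimal sketch in Python. The names `Consent` and `classify`, the plain sets used for the white and black rules, and the choice to let a black rule win over a white rule are all assumptions made for illustration; none of them comes from the proposal itself.

```python
from enum import Enum


class Consent(Enum):
    CONSENT = 1              # direct: an explicit white rule matched
    SOFT_CONSENT = 2         # indirect: a filter judged the message wanted
    SOFT_DENIED_CONSENT = 3  # indirect: a filter judged the message unwanted
    DENIED_CONSENT = 4       # direct: an explicit black rule matched


def classify(sender, white_rules, black_rules, filter_says_wanted):
    """Direct expressions take priority; otherwise use the indirect verdict.

    Checking black rules before white rules is an assumed precedence.
    """
    if sender in black_rules:
        return Consent.DENIED_CONSENT
    if sender in white_rules:
        return Consent.CONSENT
    if filter_says_wanted:
        return Consent.SOFT_CONSENT
    return Consent.SOFT_DENIED_CONSENT
```

An unknown or anonymous sender can only ever land in case 2 or 3 here, which matches the note that the SOFT cases are needed for anonymous senders.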

Sender = defined by any combination of Sender IP or email route, 

Might it be useful to provide for coherency of message routing?  e.g. a message 
with a From: address of HOTMAIL.COM or AOL.COM but which has passed through a 
mail server or relay (say) in China or Korea or some other distant country?

Or, say, a message with wildly incoherent dates in Received: headers or Date: 
header?
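
A date-coherency check of this kind could be sketched as follows, using only the Python standard library. The function name `dates_coherent` and the one-hour clock-skew tolerance are assumptions for illustration, and a real filter would need to cope with far messier Received: lines than this handles.

```python
from datetime import timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def _parse(value):
    """Parse an RFC 2822 date string; return None if it cannot be parsed."""
    try:
        dt = parsedate_to_datetime(value.strip())
    except (TypeError, ValueError):
        return None
    # Treat timestamps without a usable zone as UTC so they can be compared.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def dates_coherent(raw_message, skew=timedelta(hours=1)):
    """Return True when the Received: and Date: timestamps run in order."""
    msg = message_from_string(raw_message)
    # Received: headers are prepended, so the first one is the latest hop.
    # The timestamp is the text after the last semicolon.
    hops = [_parse(h.rpartition(";")[2]) for h in msg.get_all("Received", [])]
    stamps = [t for t in hops if t is not None]
    sent = _parse(msg.get("Date", ""))
    if sent is not None:
        stamps.append(sent)  # Date: should be the oldest timestamp of all
    # Each timestamp must not be earlier than the one below it, within skew.
    return all(upper + skew >= lower for upper, lower in zip(stamps, stamps[1:]))
```

A message failing this check is not necessarily forged (clocks drift, and some hosts write odd dates), so the result is probably better treated as one input to a SOFT evaluation than as a verdict on its own.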

Sender address, or other Authentication mechanisms. The recipient may define
any of these that may be required to define a particular sender, and may
also define which authentication mechanisms (if any) are acceptable.

Note that this might not be only based on the headers.  In particular, one might
require that a message from a certain original sender MUST be signed with that 
sender's specific sig file or PGP key or something in order to be considered 
"authenticated".

Certain senders (say, specific Yahoogroups mailing lists) for instance might 
NEVER legitimately contain attachments.  If someone spoofs a popular Yahoogroups
mailing list as the "sender" but actually still sends an attachment (which that 
group should NEVER be sending) then that's evidence of a forged From: address 
and/or an unauthenticated sender.
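
As a rough sketch of such a sender-specific rule in Python: the table `NO_ATTACHMENT_SENDERS`, the address in it, and the function names are hypothetical, and matching on the From: address alone is a simplification of the broader "Sender" definition discussed above.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical rule table: senders that should never carry attachments.
NO_ATTACHMENT_SENDERS = {"somelist@yahoogroups.com"}


def has_attachment(msg):
    """True if any MIME part is marked as an attachment or carries a filename."""
    return any(
        part.get_content_disposition() == "attachment" or part.get_filename()
        for part in msg.walk()
    )


def violates_sender_profile(raw_message):
    """True when a no-attachment sender appears to have sent an attachment."""
    msg = message_from_string(raw_message)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    return sender in NO_ATTACHMENT_SENDERS and has_attachment(msg)
```

A True result here is evidence against the claimed sender, not proof, so it would more naturally downgrade the message to the unfamiliar-sender rules than reject it outright.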

If all of the above are acceptable then a policy of consent could be
established and utilized in a very clear way:

FIRST: Each inbound message is evaluated first against the sender policy
to define "Sender" for the sake of evaluation. This definition may
include the "Unknown Sender" which would limit policy evaluation to
"SOFT" policies (2 and 3 above). Note also that "authentication
mechanisms" may be defined by the recipient to support DNSbl or other
services such as Bonded Sender mechanisms, or any other mechanisms that
may arise.

SECOND: The message is then evaluated against the consent policy to
define the case that matches the message. This includes (in case 2 and
3) the application of any evaluation mechanisms that the user may define
to evaluate the content of the message or evaluate its other
characteristics.

THIRD: A specific action mapped to the identified case in the policy
should be executed. For example, reject the message, submit the message
to some process, redirect the message to some mailbox, some combination
of actions.
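
The three steps above can be sketched in Python as follows. Everything named here (`Case`, `Policy`, `identify_sender`, `handle`, and the action strings) is an assumption for illustration; in particular, defining the sender by From: address alone stands in for whatever combination of IP, route, address, and authentication the recipient's sender policy would really use.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional, Set


class Case(Enum):
    CONSENT = 1
    SOFT_CONSENT = 2
    SOFT_DENIED_CONSENT = 3
    DENIED_CONSENT = 4


@dataclass
class Policy:
    white: Set[str]                     # explicit white rules
    black: Set[str]                     # explicit black rules
    looks_wanted: Callable[[str], bool]  # stand-in for a SOFT mechanism
    actions: Dict[Case, str]            # case -> action name


def identify_sender(from_addr: str, policy: Policy) -> Optional[str]:
    """FIRST: resolve the sender; None means "Unknown Sender"."""
    addr = from_addr.strip().lower()
    return addr if addr in policy.white or addr in policy.black else None


def handle(from_addr: str, body: str, policy: Policy) -> str:
    sender = identify_sender(from_addr, policy)
    # SECOND: find the matching case; unknown senders get SOFT cases only.
    if sender is None:
        wanted = policy.looks_wanted(body)
        case = Case.SOFT_CONSENT if wanted else Case.SOFT_DENIED_CONSENT
    elif sender in policy.black:
        case = Case.DENIED_CONSENT
    else:
        case = Case.CONSENT
    # THIRD: return the action mapped to that case.
    return policy.actions[case]
```

The action is returned as a plain string so that the caller decides what "reject", "quarantine", or "deliver" actually means; blackholing, bouncing, or sending a polite request to resend would just be further entries in the same table.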

As SOFT evaluations can be difficult to quantify and must be open to new
mechanisms that become available in future I recommend that a "Consent
Definition Language" be developed that provides for specific actions
based on the evaluation of the message against the policy, and that in
SOFT cases (2 and 3) the "Consent Definition Language" be extensible to
take into account results that may be returned from the soft evaluation
mechanisms.

In the ultimate case of this, you really end up wanting to write a program...
do we need yet another programming language?  Personally, I'd favor the use of
SPITBOL for stuff like this... it's probably about the most powerful language
there is for text processing and pattern recognition and data structure
manipulation... and that's what this whole process is really all about.
SPITBOL has the additional nice property that one can bring in new program
segments dynamically according to stages already passed or decisions already
made... so that the program can be easily extended at runtime to add new rules
or whatever, based on (say) specific senders or specific types of message
content.

It's hard to imagine how one would devise a specific "Consent Definition 
Language" that doesn't end up being in essence a "programming language" (and 
that's NOT a bad thing, necessarily, but I'd hate to see the "standard" set to 
use a specific language, especially if that language ends up being less 
satisfactory for this use than something already existing such as SPITBOL or 
whatever).

For example, some "tests" may return weights, others may return
probabilities, others may return categories of content, others may
return specific heuristics that fail.

Sure.  And some processes might need the intermediate results resulting from 
prior processing.

Others might be independent, and in that case it would also be nice to allow 
multiple tests on the message to perhaps proceed in parallel 
(multitasking/multiprocessing/whatever) so as to reduce overall time spent in 
processing each message.
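
Running independent tests side by side, each returning its own kind of result, might look like the sketch below. The three toy tests and the function `run_independent_tests` are invented for illustration; real tests (a DNSbl lookup, a content filter) would mostly be waiting on I/O, which is the situation where a thread pool helps.

```python
from concurrent.futures import ThreadPoolExecutor


def weight_test(text):
    """Toy test returning a numeric weight."""
    return 2.5 if "free" in text.lower() else 0.0


def category_test(text):
    """Toy test returning a content category."""
    return "html" if "<html" in text.lower() else "plain"


def failed_heuristics_test(text):
    """Toy test returning the names of the heuristics that failed."""
    checks = (("all_caps", text.isupper()), ("has_url", "http://" in text))
    return [name for name, failed in checks if failed]


def run_independent_tests(text, tests):
    """Run tests that do not depend on each other; collect results by name."""
    with ThreadPoolExecutor() as pool:
        futures = {test.__name__: pool.submit(test, text) for test in tests}
        return {name: future.result() for name, future in futures.items()}
```

Tests that need intermediate results from earlier processing would have to run after this batch completes, taking the returned dictionary as input.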

However, in general the structure of a working policy model tends to
be hierarchical, so an XML-based framework can be very efficient.

I've heard XML called a LOT of things although I don't think that I can remember
too many times that it was accused of being "efficient".  :-)

Just because XML is presently trendy is NO reason IMHO to impose that degree of 
overhead onto anything as core as E-mail processing.  In another list I'm on 
we've been discussing the XML overhead/performance issues and found that a 
typical situation results in XML record descriptors taking twice as much time 
(or more) than a simple delimited data representation.  (And I think that it 
often can take a LOT more than even that...)

As XML is naturally extensible then so would a "Consent Definition Language
(CDL)" based on XML.

Honestly, there is nearly NOTHING that is especially more "extensible" about XML
than there is with MANY other data representations.

I have a number of objections to XML in principle, largely based on the fact 
that individual field names have to be parsed again and again for EVERY record 
handled... and not just field names, but also the higher-level issues of 
ensuring that every required field is present for every record...

In the interests of sharing, proofing, and evaluating CDL based policies
there should be a mechanism for defining standardized representations of
tests. 

Ultimately, again, this is going down a slippery slope to defining a new 
programming language... and I simply don't think that's necessary here.  There 
are a number of languages that could be used, from primitive languages like C to
braindead RegEx-based things like Perl or "real" pattern-matching languages like
SNOBOL or SPITBOL.

Ultimately, I suspect that individual implementations are likely to be done in 
whatever language the implementor is most comfortable with.  That's perhaps the 
way it should be.  Despite the fact that I *personally* think that SPITBOL would
be *wonderful* for writing stuff like this, I recognize that a lot of people 
aren't familiar with it and would probably pick Perl or something instead just 
because that's all they know.  

I'm not even sure, really, that we have to go all that far in terms of defining 
what the actual consent definition language or corresponding data 
representations are... I'm not all that convinced that we'll ever see (or even 
that we SHOULD) a single standardized worldwide agreement for stuff like this, 
and different mail filtering systems and tools are likely to develop their own 
approaches and techniques.  (And if someone does a distinctly "better" one, 
hopefully it will win out even over a "standard" one.)  

For example, specific DNSbl tests that are "well known" may have names
defined for them where those names would be adopted by the
community in the same way well known ports are adopted for services. 

I think it probably makes more sense to simply provide a mechanism (or better, 
several) for calling external processing units.  Then the script (or whatever) 
can add whatever steps a person wishes.

Again, though, a lot of these are implementation issues within a particular 
filter;  I'm not sure we have to produce anything to that level.

Any such naming conventions should be an enhancement rather than a
requirement in the CDL. For example, if a recipient wishes to leverage a
DNSbl (or other service) that does not have a "well known name" then the
definition of the test in the policy should be clear and consistent and
no more difficult to implement in the CDL than any other DNSbl. 

Similar guidelines should be in place for the implementation of other SOFT
mechanisms that might be used for: authentication (defining the
sender), or developing SOFT CONSENT (such as filtering systems such as
Message Sniffer, Spam Assassin, Bogo Filter, and others...)

Good examples of external procedures which might be used within the consent 
model.

Based on personal experience, the framework defined above _should_ be
able to encompass all of the current and proposed mechanisms used for
curbing abuse without significant difficulty or complexity.

Perhaps, although it sounds awfully complex to me (and specifically I really 
don't see why we need to jump onto the currently-trendy XML bandwagon here).

The real issue, I think, is how far we're going to go toward writing the actual 
filtering application as part of the consent model standard (and even, for that 
matter, whether we NEED a standard consent description).  

Even just simple "whitelists" or "blacklists" don't always tell the story... for
example, I might have a Yahoogroup I'm a subscriber to but that group (which I 
might whitelist) should **never** send me a message containing (say) an 
executable attachment.  If it does, I definitely want to (at a minimum) trash 
the untrusted attachment.  

Likewise, the mere presence of a blacklisted domain reference in a message may 
not be enough to justify t-canning the message... for instance, the messages I 
get from the suespammers.org domain might refer to a particularly heinous 
spammer or quote from one of their E-mail spams, and I wouldn't want to t-can 
the message just because of that.

I guess I personally feel that what we need to do more is to establish that 
there are certain broad areas that will typically be used to perform triage on 
incoming E-mails, whether at the user level or at the domain or ISP service 
level.  These areas include header-level coherency and tests (acceptable user 
identity, no routing through known open relays, etc) as well as content-based 
tests (no HTML-burdened content, no obscured URLs, no bogus HTML tags, no 
obscured content tricks, no embedded images, no known-disreputable URLs or 
domains or IP addresses, no attachments (or maybe no executable attachments)
etc etc) and that we need to provide for different sender-specific rulesets for
specific authenticated familiar senders, specific familiar disreputable senders,
and unfamiliar senders.

I still think that it is absolutely essential that HTML-burdened content (or at 
least large classes of frequently-abused HTML) and presence of attachments or 
encoded message text should be offered as an optional (and probably 
recommended!) cause for denial of delivery of messages from unfamiliar senders.

Gordon Peterson                  http://personal.terabites.com/
1977-2002  Twenty-fifth anniversary year of Local Area Networking!
Support the Anti-SPAM Amendment!  Join at http://www.cauce.org/
12/19/98: Partisan Republicans scornfully ignore the voters they "represent".
12/09/00: the date the Republican Party took down democracy in America.


