ietf-822

Re: a header authentication scheme

2004-11-04 09:05:52

On Mon November 1 2004 23:36, Laird Breyer wrote:

On Nov 01 2004, Bruce Lilly wrote:

You have missed the points; at any point a party can
a) s/result-tag="spam"/result-tag="ham"/
b) insert a Received field, then insert a "processed" field referring
    to that Received field, with any desired "processed" field content
N.B. neither tactic would be effective if MIME security multiparts
were used.
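
Tactic a) can be illustrated with a minimal sketch. Since the field has not been completely defined, everything here is an assumption: the field name "Processed", the result-tag parameter syntax, the host names, and the helper flip_result_tag are all hypothetical. The sketch shows only that an unsigned header field can be rewritten by any party handling the message, leaving nothing in the message to reveal the change.

```python
from email import message_from_string

# Hypothetical message. The "Processed" field and its result-tag
# parameter follow the wording of this thread; the exact syntax is invented.
RAW = (
    "Received: from a.example by relay.example; Mon, 1 Nov 2004 23:00:00 -0500\n"
    'Processed: filter.example; result-tag="spam"\n'
    "Subject: mortgage rates\n"
    "\n"
    "body text\n"
)

def flip_result_tag(raw, old, new):
    """Return the message with result-tag rewritten from old to new.

    This is the equivalent of s/result-tag="old"/result-tag="new"/
    applied to the (hypothetical) Processed field only.
    """
    msg = message_from_string(raw)
    value = msg["Processed"]
    # replace_header keeps the field in its original position.
    msg.replace_header(
        "Processed",
        value.replace(f'result-tag="{old}"', f'result-tag="{new}"'),
    )
    return msg.as_string()

tampered = flip_result_tag(RAW, "spam", "ham")
print(tampered)
```

Nothing in the rewritten message distinguishes it from one that was labeled "ham" in the first place.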

Perhaps you have a concrete example where you think a) or b) occurs,

"Occurs" implies that this happens on a regular basis, and as we're
discussing a hypothetical field which has not been completely
defined, much less deployed, that term is inappropriate.

the only examples I can think of are compromised systems, for which
the problem is moot.

No. I have not ascribed any motives, but it should be easy to
read between the lines and figure out why such things might
happen for reasons other than "compromised systems", i.e. a
field might be intentionally forged by a party involved in
message transfer.  Case a was an example;
s/result-tag="ham"/result-tag="spam"/ should be the obvious
alternative example of what could happen.  Now parties
responsible for message transfer ought not to alter content,
but it does happen (and in fact the premise of the present
discussion is that that is to be encouraged by inserting these
"processed" fields hither and yon).  Ethical conduct would
preclude such tampering, but ethical (moral, if you prefer)
conduct seems to be the exception rather than the rule these
days (despite claims to the contrary by the unethical parties,
which is hardly surprising).  It is not difficult to envision a
commercial interest inserting a field declaring competitors'
messages to be spam. Likewise for political, religious, and/or
governmental organizations with a vested interest in curtailing
the free exchange of ideas.  In many parts of the world,
Internet Service Providers who routinely transport messages
are either commercial entities (often with a diverse set of
business interests, and therefore many competitors) or are
governmental entities.

a) requires modifying some existing Processed field. No well behaved
filter needs to modify existing fields in a message.

Since we're discussing spam and intentional disinformation,
we're obviously not talking about "well behaved" anything.
If everybody and everything under human control were
"well behaved", there wouldn't be a spam problem in the first
place.  You seem to be assuming that there is only one "bad
guy", and that everybody else is always a "good guy". Sorry,
reality isn't that simple, nor is it that black-and-white.

If a modification 
occurs, then the process doing this modification is doing something
prohibited (tampering with the message),

Agreed. So what? Individuals, companies, political parties,
religious organizations, and governments do prohibited
(and/or allowed but unethical) things all the time.  Don't
be naive.

and the computer system which 
is running this process has therefore been hacked

Incorrect conclusion -- it may be intentional on the part of
the system owner/operator.

or belongs to an 
untrustworthy party.

Do you trust everybody involved in the transmission of every
message?  Why would you?

b) requires inserting a new Received field.

So what? De rigueur for SMTP handling and not prohibited
otherwise.

The sequence of Received 
fields added close to the message destination is predictable by the
recipient,

No. Certainly not so for non-SMTP transfer. And you have
provided examples of non-Internet systems where header field
order is indeterminate.

and can only be changed by an attacker if one of the 
computers in the recipient's organization is compromised.

No. First, the very last system to have inserted a Received field
might not be "in the recipient's organization"; it might belong
to a commercial or government-run service provider. Second,
as explained above, it is not necessary that such a system be
compromised; the change may be deliberate.
 
So we always have a model of message transport as follows:

A -> M1 -> M2 -> M3 -> ... -> T1 -> T2 -> ... -> B

Not necessarily.  There may well be multiple paths to B, N-1
of which do not pass through "T2".  "T1", "T2", etc. might
pass some messages w/o change, but may change some
fraction of messages (i.e. those messages not sympathetic
to the interests of the owner/operators of those systems).
 
A is the sender of the message, B is the destination, M represents
computers which are untrusted by B, and T represents computers which
are trusted by B and add a Received field.  None of the computers of
type T will ever perform either a) or b) above, unless they have been
compromised.

No, repeating "unless they have been compromised" doesn't
make it true. 

Computers of type M may well perform a) or b). It is up 
to B to only trust Processed fields associated with computers of type
T.

You are assuming an extraordinarily simplistic model which
simply doesn't conform to reality; a model where there are
only "M" and "T" and where each system is either one or the
other, invariant with message content and over time, with no
message transport paths through other mechanisms.
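
The model being criticized can be sketched as follows, as a rough illustration only. The host names, the TRUSTED_HOSTS set, the regular expression for the "by" clause, and the function trusted_received_prefix are all hypothetical; real Received fields vary far more than this pattern allows. The fixed set of trusted hosts is exactly the static, content-invariant assumption in dispute.

```python
import re
from email import message_from_string

# Hypothetical "type T" hosts that recipient B has decided to trust.
TRUSTED_HOSTS = {"t1.example", "t2.example"}

def trusted_received_prefix(raw, trusted=TRUSTED_HOSTS):
    """Return the "by" hosts of the topmost Received fields that are trusted.

    Received fields are prepended, so the topmost ones were added closest
    to the recipient. Scanning stops at the first untrusted or unparsable
    field; anything below that point could have been forged.
    """
    msg = message_from_string(raw)
    prefix = []
    for value in msg.get_all("Received", []):
        match = re.search(r"\bby\s+([^\s;]+)", value)
        if match is None or match.group(1).lower() not in trusted:
            break
        prefix.append(match.group(1).lower())
    return prefix

SAMPLE = (
    "Received: from t1.example by t2.example; Mon, 1 Nov 2004 23:03:00 -0500\n"
    "Received: from m3.example by t1.example; Mon, 1 Nov 2004 23:02:00 -0500\n"
    "Received: from m2.example by m3.example; Mon, 1 Nov 2004 23:01:00 -0500\n"
    "Received: from nowhere by t1.example; Mon, 1 Nov 2004 23:00:00 -0500\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

print(trusted_received_prefix(SAMPLE))
```

The last Received field in SAMPLE claims to come from t1.example but sits below an untrusted hop, so it is not counted. The sketch says nothing about a trusted host that alters some messages deliberately, or about a message arriving by a path that bypasses those hosts.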

If the recipient's MTA doesn't add a Received field of its own (i.e.
there are no T type computers at all), then the recipient B must already know
not to trust any Received fields at all in a message.

Again you are ignoring multiple transport paths and assuming
a black-and-white trust model which is invariant over time and
with message content.

But are you really saying that a given user B
cannot, by examining a sample of past mail messages he personally received,
deduce the form of the Received fields added by the computers I've labeled
T above?

It certainly depends somewhat on B's knowledge and experience.
I would estimate that for > 99.9% of email recipients, a *valid*
deduction would be extremely unlikely; probably 99.5% have no
clue what a Received field is in the first place.  Judging by illegal
syntax in many generated Received fields, it's clear that a large
fraction of those who think that they know what a Received
field is are simply wrong.  On top of that, there are hostname
aliases, domain literals, etc. which require an additional
understanding of the DNS.

I must admit I find that hard to believe. 

Conduct a survey of a random sample of people who receive
email and convince yourself.
 
For a concrete example, suppose that some recipient
is interested in a home mortgage.  A spam filter that is configured
to treat all messages containing the word "mortgage" as spam is
worse than useless to that recipient; it is counter-productive. And
on the basis of the (lack of) information in your proposed
"processed" field, no recipient is able to determine whether or not
configuration of any alleged process is appropriate *for him*.

Obviously such a user will learn to not accept the supplied spam label
as authoritative. He will either ignore the Processed field's
recommendation completely, or configure his other filters if he has
any to lower the weight of that particular field so as to let through
mortgage mails but not other spam.

What "*other* spam"?  The premise is that for that recipient, that
message isn't spam.  "Other" doesn't apply.  How is the user
supposed to be able to differentiate those "processed" fields
from any others -- does he magically acquire an understanding
of Received fields and "processed" fields by osmosis in his sleep,
is he able to compute hashes in his head, has he memorized the
entire worldwide DNS database and does he telepathically receive
updates?  Ignoring all that, suppose that the hypothetical
filter is 100% accurate for that recipient except for messages
containing the specific keyword in question.  In order to
determine the validity of any ostensible value judgment made by
that remote filter -- if the message is received in spite of the supposed
"filtering" -- the user will have to apply a local filter to first
determine whether that keyword is present in the message.
If a local filter is therefore going to have to be used, there is little
point in having a remote filter, a black-and-white time- and
content-invariant trust model, "processed" fields, etc.
  
For example, statistical content filters already perform this exact
adjustment automatically, i.e. giving more weight to tokens which correlate
well with the user's personal idea of spam, say, and reducing the weight of
tokens which don't.

But if the user has to run a local filter in order to determine whether
or not to trust a remote filter, what is the point of having the
remote filter in the first place?

This is done by asking users to mark incorrectly 
filtered messages, i.e. without the need to understand the internals of the 
particular spam filter which marks all messages containing "mortgage" as spam.

If messages are in fact being filtered, i.e. if messages judged to be
"spam" are not sent to the user, then there is no way for the user to
handle false positives.  If there is in fact no filtering, then the user's
problem is simply compounded; he not only has to deal with the
transmission bandwidth and bulk of the spam, but also of the added
"processed" fields.
 
An assertion made by some
intermediate system which is supposed to be providing transport
services simply fails to fit into the Internet Architecture on many
levels:
a) it is not an end-to-end process
b) eavesdropping on private communications is immoral and
    offensive (and may be illegal in some places)
c) it is not under control of either party in the end-to-end
    communication
etc.

Would you say that current mail practices such as providing spam filtering
fit into the Internet Architecture described by RFC 1958?

Filtering that takes place at the receiving endpoint is consistent
with the Internet Architecture.
 
For example, a) is broken by spam filters set at the level of SMTP
servers rather than filters set up at the individual MUA level.

Not if the server and its associated filter are under the recipient's
control.  And you are ignoring processing at the level of MDAs,
message stores, etc.

Also, 
b) is obviously broken by spam filters and virus checkers.

Only if performed without the knowledge and consent of the
recipient.

c) is 
broken by spam filtering services offered at the server level,

Sometimes (but not in cases where the server is under control of one
of the end parties).

and 
also by spam filtering offered by mailing list software.

In the specific case of mailing lists, the list mailbox *is* the
receiving end of the communication.
 
What can be done to make spam filtering Internet Architecture compliant?

Leave it up to recipients.

What "filter cooperation"?  In precisely what way do filters
at separate sites need to "cooperate", and how is that practical
without detailed configuration information?

As a means of reducing the cruft which is currently added by all these
filters.  A standardized syntax with known semantics (whatever it
turns out to be) stops people from reinventing their own X- headers again
and again for a standard service. 

You seem to be at least several months behind the times:
* years ago, things like X-RBL etc. were sometimes inserted by ISPs;
   some users complained about them (apparently believing that it
   involved eavesdropping, which was not the case), but they were
   largely useless because of DHCP and other types of shared IP
   addresses, shared IP address blocks, etc.
* About 6 months ago, my ISP tried to implement message mangling
   w/o user input or consent.  That quickly went away after I
   complained about a legitimate mailing list message which was
   modified by the ISP (among other things, "*Possible Junk Mail* "
   was prepended to the message Subject field).  That *was*
   eavesdropping; it was forgery as well.
* At about the same time, fields like
     X-Junkmail-Status: score=0/70, host=mr01.mrf.mail.rcn.net
   started being added; such fields have not been useful, and
   (fortunately) are no longer being added.
* A large number of MUAs have spam filtering capability
* There are several MUA-independent filtering packages that can
   be installed and used by end users, even those with low levels of
   computer literacy

I'm just not seeing "the cruft which is currently added by all
these filters".  If you are, I suggest that you open a dialog
with the parties responsible for adding the cruft, with the
objective of getting them to stop doing so.  The problem of
that cruft which *at one time* did exist here, appears to have
gone away, and I don't want to encourage its return.  And I
certainly don't want to encourage eavesdropping on private
correspondence, much less alteration of such correspondence
in transit.