ietf

Re: covert channel and noise -- was Re: proposal ...

2004-02-17 16:35:39
On Tue, 17 Feb 2004, Vernon Schryver wrote:

   (Silently discarding _is_ a bad idea, when done by the SMTP server
itself. IMHO, it's better to mark for later discard -- which actually
could be done in such a way as to mark only for those recipients who
requested the more restrictive filtering.)

Or, mark for later accept/reject decisioning AFTER the SMTP server per
se, in the filter pipeline between the server and the addressee's mail
spool.  SpamAssassin already does exactly this.
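The mark-don't-discard approach above can be sketched in a few lines.  This is purely illustrative (the function names, the 5.0 cutoff, and the "strict recipients" set are invented for the example; SpamAssassin's real rule battery and headers differ in detail), but it shows the key property: the filter annotates, and only for recipients who asked for restrictive filtering does it even set a flag -- it never discards.

```python
# Illustrative sketch only: a post-SMTP filter that MARKS mail rather
# than discarding it.  score_message() is a stand-in for a real content
# scorer; the threshold and header names are invented for this example.
from email.message import EmailMessage

def score_message(msg: EmailMessage) -> float:
    """Toy stand-in for a content scorer such as SpamAssassin's rules."""
    body = msg.get_content()
    return 5.0 if "buy now" in body.lower() else 0.0

def mark_for_recipient(msg: EmailMessage, recipient: str,
                       strict_recipients: set) -> EmailMessage:
    """Annotate the message; the accept/discard decision stays downstream."""
    score = score_message(msg)
    msg["X-Spam-Score"] = str(score)
    # Flag only for recipients who requested the more restrictive filtering.
    if recipient in strict_recipients and score >= 5.0:
        msg["X-Spam-Flag"] = "YES"
    return msg
```

Because the message itself is never dropped, a recipient (or their delivery agent) can act on the score however they like.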


   A better position is that everything should be logged, particularly
   including discarded mail, and in that case, enough of bodies to allow
   targets to identify senders and the nature of the discarded messages.
   Of course, one should assume users won't normally look at those logs.
   Spam you read is not filtered, but at most categorized and stigmatized.

Logging a message you reject is nearly a waste of time.  Recovering the
message (as you note, nobody ever looks at the logs, which are VERY
LARGE for a busy mailer and beyond human capacity to scan) really
requires an out-of-band message telling you there is a message to be
recovered.  As in calling somebody up and asking them why you haven't
paid your e-bills (rejected as spam).  In most cases it will require a
retransmission and out-of-band communications to complete the
transaction, because what you have done is made email an unreliable
messaging service for valuable, wanted, messages.

This is where, and why, I take issue with filtering and discarding at
the level of the SMTP server, unless the accept/reject decision can be
made with 100% accuracy (no false positives, no false negatives, and it
may not be good even then because MY idea of the correct basis for the
decision may not be the same as YOURS).

Imagine a networking transport protocol such as TCP, that discarded
packets not based on their header information (according to any
protocol-level criterion you like) but on their CONTENT, using a
moderately arcane ruleset designed to identify high-level human patterns
such as spam from the content information, applied to the content in a
complex and highly multivariate decision tree.  Hmmm, pretty ugly for a
TRANSPORT LAYER.

It's not that filtering based on non-header-linked aspects of content is
or isn't a good idea in some cases.  It is that it has no business being
in the specification of TCP.  TCP is about reliable delivery and
sequencing of packets.  It (as a protocol) has nothing to do with the
content of those packets outside of information contained in its
headers.  Indeed, rejecting packets based on any mechanical ruleset
applied to packet contents will almost certainly REDUCE the reliability
of the network, because (for example) an encrypted document might by
pure chance have a byte sequence like SEX that caused it to be rejected
even though its actual contents were quotes from the stock market.  One
doesn't write filters looking for buffer overwrite attacks into the TCP
stack in the kernel -- one fixes the application.

For nearly all filtering programs, it is too easy to create a message
that is filtered but shouldn't be.  Even rejecting viruses on the basis
of their signatures makes it difficult to SEND a virus you've just
received to an email drop where somebody can identify it.  As we've
seen, a naive application of rules makes it impossible for this
most-certainly-not-spam discussion to penetrate various "protected"
sites.  It is often similarly easy to craft a message that should be
filtered but isn't because it slips through your rules.  But we've had
this discussion; the point is about reliability.

SMTP was designed to permit reasonably RELIABLE (simple) transport of
addressed mail on top of a TCP connection.  In most cases, ethernet
forms one of the lowest level protocols that the mail message is wrapped
in (ethernet header), then IP (IP header), then TCP (with its header).
SMTP pays relatively little attention to the message headers at the
transport and network protocol levels as it doesn't matter much which
ethernet address was responsible for the last packet hop (likely a
router/gateway and the same for all packets and messages).  It doesn't
care MUCH about IP or TCP, except that the packets have to be correctly
wrapped according to the protocols and addressed to SMTP listening on
the appropriate port.

SMTP does care about packet content, but only to a certain level.  Part
of the packet content it cares about is the negotiation phase where it
manages a connection, executing "commands" based on content in a very
structured way designed to communicate key information recursively along
the mail message's delivery trajectory, as it builds its own mail header
containing the addressee, the nominal sender, and the delivery route
complete with timestamps.  It is designed to be quite trusting, hence
easy to spoof, at least for mail to someone that doesn't read the rest
of the header.  Part of the packet content it views as "data" -- a
message to be reliably delivered to a uniquely specified spool file,
encapsulated within its MAIL header information (the TCP, IP and
ethernet level header information having been discarded).  There a user,
or a program, may or may not read it and take actions based on its
content.
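The negotiation phase described above can be made concrete with a toy model of the envelope dialog.  Everything here is illustrative, not normative (the host names are made up, and the real trace-field grammar is defined in RFC 2821): the point is that the server executes MAIL/RCPT "commands" from content, trusts what it is told, and prepends its own trace line with a timestamp as the message moves along its delivery trajectory.

```python
# Toy sketch of the SMTP negotiation phase: process a minimal envelope
# command sequence and stamp a Received trace line on the message.
# Host names and the trace format are illustrative only.
from email.utils import formatdate

def receive(envelope_cmds, body, receiving_host="mail.example.edu"):
    """Accept a MAIL FROM / RCPT TO sequence, then prepend the trace."""
    sender, recipients = None, []
    for cmd, arg in envelope_cmds:
        if cmd == "MAIL FROM":
            sender = arg           # taken on trust -- hence easy to spoof
        elif cmd == "RCPT TO":
            recipients.append(arg)
    trace = (f"Received: from client.example.org by {receiving_host} "
             f"for {recipients[0]}; {formatdate()}")
    return trace + "\r\n" + body, sender, recipients
```

Note that nothing in the dialog is verified against the lower-level headers; the envelope sender is recorded exactly as claimed, which is why the header is so easy to spoof for anyone who doesn't read the full trace.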

It seems to me to be highly unacceptable to attempt to insert
content-based accept/reject decisioning in at this PROTOCOL level in the
delivery process.  It also seems to be highly risky and possibly legally
actionable.  I expect, in good faith, that email addressed to me will be
delivered if it is deliverable.  Otherwise I cannot rely on email as a
reliable transport mechanism for important messages.  Filtering it "for"
me according to ANY CONTENT-BASED RULESET risks discarding at least some
messages that are not correctly classified when they are rejected.
Important messages can be lost.  Bad things can result.  Who is
responsible when this occurs?  Who do I get to sue?

Imagine the post office (the real one) opening your mail and examining
content -- for most of us this alone would be a nightmare and invasion
of privacy -- so fine, with automated anonymous "safe" machines, to
eliminate advertisement mailings (only) but pass everything else.  One
day it rejects and shreds a sweepstakes check (a real one) thinking it
is one of the many fake checks sent out by loan companies.  Another day
it shreds a warning by your bank that a loan payment is due because it
also looks like an advertisement.  Even humans make this sort of
(sometimes very expensive) mistake, but at least if you make it it is
"your fault".  How would you feel, and react, if the mistake were
utterly beyond your control?  How long would it be before banks and
other businesses rejected the post office as a reliable transport
agency?

It is perfectly reasonable for you to add content filters that YOU
control ABOVE this transport layer.  If you want to hire a secretary to
open all of your mail for you and sort it and reject all the
advertisements, you can.  If the secretary makes a mistake and throws
away a megamillion dollar contract offer that you subsequently lose, you
also bear the responsibility, or at least can direct your anger at
something you control and take steps to ensure that similar things don't
happen again.

Now, all that it would take to control this end stage filtering and make
it much more reliable would be a federal law mandating that advertising
communications be sent in envelopes clearly labelled as such.  No more
sending out loan offers in envelopes that strangely resemble official
government communications unless they were clearly labelled
"advertisement".  No more writing the address in by hand to convince you
that you are opening personal mail.  If somebody violated the law, you
would have the envelope and offer in hand and could recover damages in
small claims court with very little effort.  Otherwise, you could safely
discard all advertising.

Note that this doesn't really help someone forced to do the final
accept/reject step themselves by hand.  They still have to examine each
envelope for the mandatory classification mark.  Advertisers
would of course try to exploit this by making the envelopes themselves a
major part of their messages -- "open ME".  It would make it fairly
simple to institute automated blocks, though, both for human secretaries
and real mail and for e.g. procmail and email.

With that all said, there are tools that ALREADY provide the kind of
content level filtering mentioned above.  The better ones do not
themselves discard or bounce any mail to any user.  They simply SCORE
THE CONTENT with regard to its likelihood of being spam (or a virus) on
the basis of a whole battery of tests.  Scores that exceed a given
threshold can easily and automatically be rejected or binned for a
second stage pass by humans later looking for lost checks and bills.
The USER (the one that ultimately knows the value TO THEM of a lost
message) can set the threshold to whatever level they are comfortable
with.  Some will prefer to play it loose and never risk losing a
message, even though it means that a lot of spam gets through.  Others
will reject on a fairly low threshold, not caring that they blindly miss
out on discussions of spam on ietf.org because those discussions
inevitably contain phrases like "Buy Viagra From Us Today" (oops, lost
one whole block of recipients), or "Have the Wildest Sex Ever" (now I
lost all the elementary and high schools in the country).  It is THEIR
DECISION, and they can pay the expectation cost "penalty" in lost
messages vs lots of spam at whatever level they select.
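The score-and-threshold scheme above amounts to a one-line routing decision.  A minimal sketch (folder names and threshold values are invented for illustration) makes the tradeoff explicit: nothing is discarded, and each user's chosen threshold sets their own expectation cost in lost messages versus spam seen.

```python
# Sketch of user-set threshold routing: the filter only bins messages,
# it never discards them.  Folder names and thresholds are illustrative.
def route(score: float, threshold: float) -> str:
    """Return the folder a scored message lands in."""
    if score >= threshold:
        # Second-stage pass: a human can later scan this bin for
        # lost checks and bills.
        return "spam-review"
    return "inbox"

# A cautious user sets a high threshold and never risks losing a message;
# an aggressive user sets a low one and accepts more false positives.
```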

This doesn't require protocol level modifications of SMTP.  In fact, it
is generally desirable not to alter the SMTP MTA except in VERY
carefully thought out ways, although it is fine to graft in content
rating/filtering systems after reliable delivery to the receiving host
is complete but before putting the message in a user's mailbox (with
the score in a special header where it can be used to drive user-level
decisions), just as SpamAssassin actually does now.

I repeat, I see little for the IETF to do about spam at the protocol
level, although it could be a powerful force at the political level,
communicating to our lawmakers the real costs of spam and urging them to
adopt stringent legal controls and penalties.  We could help draft those
legal controls and penalties.  For example, make it illegal to collect
and resell lists of email addresses for commercial purpose.  Require all
advertisements to be clearly labelled as advertisements in a form that
permits them to be automatically and reliably filtered.  One thing that
we COULD do is create a new mail header line marking mass produced
advertising mail as such, and match that up with laws requiring all
legitimate corporate clients to use it (with hefty penalties for those
that don't).
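Given such a mandatory header line, the filter it enables is trivial.  The header name "X-Advertisement" below is invented purely for illustration (any standardized marking would do); the point is that a legally required, machine-readable label reduces the whole classification problem to an exact string match with no false positives.

```python
# Sketch of filtering on a hypothetical mandatory advertising header.
# "X-Advertisement" is an invented name, used here for illustration only.
from email.message import EmailMessage

def is_labelled_ad(msg: EmailMessage) -> bool:
    """Exact-match test -- no content heuristics, no misclassification."""
    return msg.get("X-Advertisement", "").lower() == "yes"

def file_mail(msg: EmailMessage) -> str:
    """Route labelled advertising aside; everything else is delivered."""
    return "advertising" if is_labelled_ad(msg) else "inbox"
```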

This is not the case for bidirectional encryption of email content.
There it won't happen unless and until the IETF works out a practical
way to make it work at the protocol level, since clearly ALL MTA's have
to be able to manage the encryption.

At this point I see no practical way to require or enforce point to
point encryption of all mail at the user level without a nearly complete
reengineering of mail transport (basically replacing SMTP altogether).

I CAN imagine all hosts having their own RSA public and private keys,
and having all the public keys be part of their domain nameservice
registration information and hence automatically available to
"everybody".  I can imagine both tools and service providers that could
generate suitable key pairs and manage distribution of the pairs onto
registered systems either for a fee or as a part of routine systems
management (as we already do now, for the most part, for ssh).
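The per-host key idea can be illustrated with a deliberately tiny RSA example.  This is a toy in every respect: the primes are absurdly small, the "v=toykey1" TXT record shape is invented here, and real publication would need a properly specified record format and key sizes.  It only shows the shape of the scheme: any host can encrypt to the published public half, and only the holder of the private half can decrypt.

```python
# Toy illustration of per-host RSA keys published via DNS.  The tiny
# primes and the TXT record format are purely illustrative -- real keys
# are far larger and a real scheme would need a defined record format.
def toy_rsa_keypair():
    p, q = 61, 53                  # toy primes; never use sizes like this
    n, phi = p * q, (p - 1) * (q - 1)
    e = 17                         # public exponent, coprime to phi
    d = pow(e, -1, phi)            # private exponent (Python 3.8+)
    return (e, n), (d, n)

def txt_record(host, public_key):
    """Invented shape for serving the public half from nameservice."""
    e, n = public_key
    return f'{host}. IN TXT "v=toykey1; e={e}; n={n}"'

pub, priv = toy_rsa_keypair()
cipher = pow(42, pub[0], pub[1])             # anyone encrypts with the
assert pow(cipher, priv[0], priv[1]) == 42   # published key; the host decrypts
```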

I simply canNOT imagine that process extending down to the level of
private users at this time.  I don't believe that there is yet a
suitable vehicle for the required nameservice that would scale to
billions of registered entities and many trillions of bytes of served
data (as good keys aren't small) and that has at least the robustness of
the existing DNS.  

It is worth imagining such a service, but we are not terribly close, I
think, to being able to engineer it at this point.  It also has
interesting social costs and risks that I suspect would significantly
affect the engineering, as obviously protecting one's private key then
becomes an "interesting" problem all by itself.  A well managed computer
is already a bit of a fortress with facilities that permit protecting
data.  Users are not fortresses at all (quite the opposite) and if
anything are terribly lax with important data.

   rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     
email:rgb(_at_)phy(_dot_)duke(_dot_)edu