Re: [Asrg] Introduction and another idea

From: Benjamin Geer <ben(_at_)socialtools(_dot_)net>

Is it high because those people use bold, italic, and so forth or
because their MUAs send in HTML by default and they are not savvy
enough to fix that problem?


I'm not sure, but my impression is that there's a fair amount of both.


Why don't you test that impression?   This is a question that can be
measured, and I think it makes sense.  How much of that mail involves real
formatting?  If you choose 50 messages at random, how many use HTML
and of those, how many use any real text formatting?  (e.g. intentional
use of <em> or <b> as opposed to noise added by MUAs to all HTML
messages they create)

If you'll take bets, put my money on 80% or more using HTML and
less than 1% using any real text formatting.
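The sampling test proposed above can be sketched with Python's standard `email` and `html.parser` modules. This is a minimal sketch under stated assumptions: which tags count as "real" formatting (`<b>`, `<strong>`, `<i>`, `<em>`, `<u>`, `<font color=...>`) is a judgment call, and the names `classify`, `tally`, and `sample_and_tally` are hypothetical. MUA boilerplate such as `<html>`, `<body>`, `<p>`, or `<div>` is treated as noise.

```python
import random
from email import message_from_string
from email.message import Message
from html.parser import HTMLParser

# Assumption: these tags (plus <font color=...>) are intentional formatting;
# everything else an MUA emits by default is treated as noise.
FORMATTING_TAGS = {"b", "strong", "i", "em", "u"}


class FormattingDetector(HTMLParser):
    """Sets self.found if any intentional-formatting tag appears."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag in FORMATTING_TAGS:
            self.found = True
        elif tag == "font" and any(name == "color" for name, _ in attrs):
            self.found = True


def decode_part(part: Message) -> str:
    """Decode a non-multipart part's payload to text."""
    payload = part.get_payload(decode=True) or b""
    try:
        return payload.decode(part.get_content_charset() or "us-ascii",
                              errors="replace")
    except LookupError:  # unknown charset label
        return payload.decode("latin-1")


def classify(raw: str) -> tuple:
    """Return (uses_html, uses_real_formatting) for one raw message."""
    msg = message_from_string(raw)
    html_texts = [decode_part(p) for p in msg.walk()
                  if p.get_content_type() == "text/html"]
    detector = FormattingDetector()
    for text in html_texts:
        detector.feed(text)
    detector.close()
    return bool(html_texts), detector.found


def tally(raw_messages: list) -> dict:
    """Count sampled messages that use HTML and real formatting."""
    counts = {"total": 0, "html": 0, "formatted": 0}
    for raw in raw_messages:
        uses_html, formatted = classify(raw)
        counts["total"] += 1
        counts["html"] += int(uses_html)
        counts["formatted"] += int(formatted)
    return counts


def sample_and_tally(raw_messages: list, n: int = 50, seed=None) -> dict:
    """Pick up to n messages at random and tally them."""
    chosen = random.Random(seed).sample(raw_messages, min(n, len(raw_messages)))
    return tally(chosen)
```

Feeding it a mailbox's messages as raw strings gives the two ratios the bet is about: `html / total` and `formatted / html`.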

...
Fair enough.  But even if all MUAs turned off HTML by default, spammers
would continue to use HTML.


You say that as if it were a bad thing.  On the contrary, if spammers
label their stuff as spam, it becomes harmless to all practical purposes
(screams of rage from spammer fighters notwithstanding).

Getting back to Gordon Peterson's proposal, he suggests a mechanism which
would discourage the use of HTML and attachments, by forcing people to
explicitly give others permission to send them those types of messages.
This is meant to reduce the effectiveness of spam and address-harvesting
techniques, and increase the effectiveness of content filters.


Yes.
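A permission check of the kind Gordon's proposal describes might look roughly like the following. This is a hypothetical sketch, not Gordon Peterson's actual design: it assumes each recipient keeps a table mapping sender addresses to the extras they may send (`"html"`, `"attachments"`), that plain text is accepted from anyone, and that anything needing a missing permission is rejected (bounced). `Permissions`, `required_kinds`, and `decide` are illustrative names.

```python
from dataclasses import dataclass, field
from email import message_from_string
from email.message import Message
from email.utils import parseaddr


@dataclass
class Permissions:
    """Hypothetical per-recipient table: sender address -> extras allowed."""

    grants: dict = field(default_factory=dict)

    def allow(self, sender: str, *kinds: str) -> None:
        """Grant a sender permission for "html" and/or "attachments"."""
        self.grants.setdefault(sender.lower(), set()).update(kinds)

    def allowed(self, sender: str) -> set:
        return self.grants.get(sender.lower(), set())


def required_kinds(msg: Message) -> set:
    """Permissions a message needs beyond plain text (anyone may send that)."""
    needed = set()
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            needed.add("html")
        if part.get_content_disposition() == "attachment":
            needed.add("attachments")
    return needed


def decide(raw: str, perms: Permissions) -> str:
    """Accept the message, or name the missing permission that bounces it."""
    msg = message_from_string(raw)
    sender = parseaddr(msg.get("From", ""))[1]
    missing = required_kinds(msg) - perms.allowed(sender)
    if not missing:
        return "accept"
    return "reject: needs " + ", ".join(sorted(missing))
```

Under this sketch a stranger's plain-text note still gets through, which is what lets the scheme push spam toward plain text without shutting out first contact.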

Unfortunately, I think the vast majority of computer users have absolutely
no idea what HTML is.  As a sysadmin, I've tried to explain it to people,
and their eyes glaze over. ....


Yes, but that cuts both ways.  I bet most users have no clue
how to tell IE or Netscape how to make a word bold or red.

As a test, I just now tried to get Netscape 7.02 on XP to make a word
bold in a mail message.  I cannot find any sign of any way to do such
a thing.  I assume there is a way, but that I can't find it suggests
that your Sally End User might not use it very often.  (My first brush
with computerized text formatting predated my first encounter with the
Internet on the console of TIP-25 (DOCB) in 1972.  I also wrote one
of NBI's text processing systems, so I flatter myself in thinking I'm
not entirely averse to fancy text.)

                       ...     Once they understand the severe limitations
of this wonderful 'plain text' format which I'm encouraging them to use, the
first thing they want to know is how to make sure that they never use it.


So turn off HTML until the user makes some text bold or purple,
and then silently and automatically switch to HTML.
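The "plain text until formatted" default can be sketched with Python's `email.message.EmailMessage`. This assumes a hypothetical editor model in which a draft is a list of `Run` spans carrying style flags; real MUAs differ. One design choice here is an assumption, not something stated above: when formatting appears, the sketch adds an HTML alternative (`multipart/alternative`) rather than dropping the plain-text part, so recipients who prefer plain text still get it.

```python
import html
from dataclasses import dataclass
from email.message import EmailMessage
from typing import List, Optional


@dataclass
class Run:
    """A span of draft text plus (hypothetical) style flags from the editor."""

    text: str
    bold: bool = False
    color: Optional[str] = None  # e.g. "purple"


def is_formatted(runs: List[Run]) -> bool:
    """True once the user has made any text bold or coloured."""
    return any(r.bold or r.color for r in runs)


def to_html(runs: List[Run]) -> str:
    """Render the runs as a minimal HTML body, escaping the text."""
    pieces = []
    for r in runs:
        piece = html.escape(r.text)
        if r.bold:
            piece = f"<b>{piece}</b>"
        if r.color:
            piece = f'<span style="color: {html.escape(r.color)}">{piece}</span>'
        pieces.append(piece)
    return "<html><body>" + "".join(pieces) + "</body></html>"


def build_message(runs: List[Run], subject: str) -> EmailMessage:
    msg = EmailMessage()
    msg["Subject"] = subject
    # Plain text is always present, so the default wire format is text/plain.
    msg.set_content("".join(r.text for r in runs))
    if is_formatted(runs):
        # Switch silently: add an HTML alternative only when formatting exists.
        msg.add_alternative(to_html(runs), subtype="html")
    return msg
```

A draft with no bold or coloured runs goes out as `text/plain`; one bold word turns it into `multipart/alternative` with `text/plain` and `text/html` parts.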

They *like* the fact that when you reply to a message, Outlook puts the
quoted text in a different colour; they feel that it makes messages easier
to read.  To them, right angle-brackets look like something out of the Stone
Age.


You are confusing presentation with encoding.  There are MUAs that
recognize various > markings and colorize quoted text.  I think they're
garish and distracting, but there's no disputing tastes.
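To illustrate the presentation/encoding distinction: the message can stay plain text on the wire, and the reader's MUA can colour quoted lines at display time by counting leading `>` markers. A minimal sketch; the regex for quote prefixes (`>` optionally followed by a space, repeated) and the use of ANSI terminal colours as the display medium are assumptions for illustration.

```python
import re

# Leading quote markers: ">" optionally followed by one space, repeated,
# so both ">> text" and "> > text" count as depth 2.
QUOTE_PREFIX = re.compile(r"^((?:>\s?)+)")

# ANSI colours cycled by quote depth (display-time choice only).
PALETTE = ("\033[34m", "\033[32m", "\033[35m")
RESET = "\033[0m"


def quote_depth(line: str) -> int:
    """Number of '>' markers at the start of a plain-text line."""
    m = QUOTE_PREFIX.match(line)
    return m.group(1).count(">") if m else 0


def colorize(body: str) -> str:
    """Colour quoted lines for display; the stored text is unchanged."""
    out = []
    for line in body.splitlines():
        depth = quote_depth(line)
        if depth == 0:
            out.append(line)
        else:
            color = PALETTE[(depth - 1) % len(PALETTE)]
            out.append(f"{color}{line}{RESET}")
    return "\n".join(out)
```

The colour exists only in what the reader sees; the transmitted message is the same plain text with `>` markers.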

...
I suspect that most people, if they understood that restrictive permissions
would cause legitimate messages to bounce, would opt for giving full
permissions to all senders.


That contradicts the experience reported by people running retail ISPs.
Many ISPs report that almost all of their customers are happy with single
digit false positive rates for 80% reductions in spam loads.  I think
80% is a modest improvement, and that false positive rates of 0.1%, 100
times smaller, are at the high end of the tolerable range, but I'm
probably not typical of retail ISP customers.

Notice also that most proposed or existing spam defenses that people
advocate have single digit false positive rates.   

As I keep saying, there are those of us whose livelihoods involve
email and who won't tolerate 0.2% false positives, and there are
the hundreds of millions of other Internet users for whom not receiving
spam is more important than not hearing from a long-lost friend.

As for making content filtering more effective, it would be easy for content
filters to strip out all the HTML tags, comments, JavaScript, etc. from a
message before filtering.  No need to actually parse the HTML.  (I'd be
surprised if there weren't content filters that do this already.)


That's wrong in two ways.  What I consider reasonable filters (e.g.
the DCC) already strip HTML tags and so forth, but that turns out to
be insufficient.  For reasons you might guess if you knew that the
DCC is my hobby, I'd rather not say more to strangers.
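For concreteness, the normalization step Ben describes (drop tags, comments, and scripts before filtering) can be sketched with Python's `html.parser`. As noted above, this alone turns out to be insufficient, and this is not how the DCC does it. The set of tags treated as word breaks is an assumption; other tags are removed without inserting a space so that comments or spans planted inside a word do not split it.

```python
from html.parser import HTMLParser

SKIP = {"script", "style"}                       # content dropped entirely
BREAKS = {"br", "p", "div", "li", "tr", "td"}    # assumed to separate words


class TextExtractor(HTMLParser):
    """Keep visible text; drop tags, comments, <script> and <style>."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in BREAKS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BREAKS:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

    # handle_comment() is left as the inherited no-op, so comments vanish.


def strip_html(markup: str) -> str:
    """Return the visible text of markup with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

A comment inserted mid-word (`pi<!-- x -->lls`) comes out as `pills`, which is the point of stripping before filtering.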

Finally, even if this proposal worked exactly as planned, why couldn't
spammers do just fine with ASCII?  The Nigerian 419 spammers and their
imitators have been using ASCII all along, and if they were a bit more
clever about their wording, it would be quite difficult to filter.


That's half right and half wrong.  Spammers need to appear as legitimate
as possible to their targets and to law enforcement.  Without HTML
tricks to hide various things, all of their stuff is in the open and
plainly visible to your Sally End Users.  Even your Sally probably
discards a Nigerian 419 spam after reading the first 5 words and she
may delete unread mail with obvious "hash busters" in subjects.

Again, I think this is all really beside the point.  We're here to come up
with a mechanism to stop unsolicited bulk email.  Gordon's proposal is
concerned with some superficial properties that much spam currently has, but
doesn't deal with the core issues: 'unsolicited' and 'bulk'.


On the contrary, "solicited" is related to "consent," and Gordon's proposal
is all about "consent."

As for bulk, I know of exactly one idea with any hope of detecting that
aspect of incoming mail, and it is certainly not perfect, 100% accurate,
or invulnerable to spammer tricks.  That idea is that as a message
arrives or is about to be presented to you, your MTA or MUA should ask
a clearinghouse if substantially identical messages have been seen
elsewhere.  If the answer is "yes, 12345 of them," then your MTA or MUA
can know it is a "bulk" message and check your list of approved senders or
whitelist.  For scaling, the clearinghouse should be distributed.  There
are other aspects involving scaling, privacy, and security against various
attacks that need to be addressed.  When you've done that, you'll have
a direct competitor to the Distributed Checksum Clearinghouses or the DCC.
See http://www.dcc-servers.net/dcc/graphs/index.cgi

Again, I realize that the DCC is far from perfect.  That is one reason
why I've talked about it far less than other things here, and less
than other people have talked about their favorite mechanisms.
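The clearinghouse idea described above (ask how many substantially identical messages have been seen; if many, treat it as bulk and consult the whitelist) can be sketched as a toy. This is not the DCC's checksums or protocol. Assumptions: "substantially identical" is approximated by hashing the body after dropping case, digits, and whitespace; the clearinghouse is a single in-process `Counter` rather than a distributed service; the threshold of 100 is arbitrary; and `fuzzy_checksum`, `Clearinghouse`, and `check_bulk` are hypothetical names.

```python
import hashlib
import re
from collections import Counter


def fuzzy_checksum(body: str) -> str:
    """Hypothetical 'substantially identical' checksum: ignore case, digits,
    and whitespace so trivial per-copy variations hash the same."""
    text = re.sub(r"[\d\s]+", "", body.lower())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class Clearinghouse:
    """Toy single-process stand-in for a distributed checksum server."""

    def __init__(self) -> None:
        self.counts = Counter()

    def report(self, checksum: str) -> int:
        """Record one sighting and return how many have been seen so far."""
        self.counts[checksum] += 1
        return self.counts[checksum]


def check_bulk(body: str, sender: str, house: Clearinghouse,
               whitelist: set, bulk_threshold: int = 100) -> str:
    seen = house.report(fuzzy_checksum(body))
    if seen < bulk_threshold:
        return "deliver"  # not (yet) bulk
    # Bulk mail is wanted only from senders the recipient has approved.
    if sender.lower() in whitelist:
        return "deliver"
    return "reject: unsolicited bulk"
```

Spammers can defeat any fixed normalization like this one by varying more than digits and whitespace, which is part of why a real clearinghouse needs stronger fuzzy checksums and the scaling, privacy, and security work mentioned above.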


Vernon Schryver    vjs(_at_)rhyolite(_dot_)com

_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg