[Asrg] Discouraging HTML-burdened E-mail

What I find bizarre is the persistent equating of turning off HTML by
default with banning HTML.

Fair enough.  But even if all MUAs turned off HTML by default, spammers
would continue to use HTML.

In that case, it makes it an EVEN MORE reliable indicator of mail that most 
people probably don't want (especially unsolicited).  :-)

Getting back to Gordon Peterson's proposal, he suggests a mechanism which
would discourage the use of HTML and attachments, by forcing people to
explicitly give others permission to send them those types of messages.

Exactly.  (IF they wanted to allow those senders to send them that type of 
stuff, which they might not.)
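
For concreteness, here is a rough Python sketch of the per-recipient check 
described above.  It is only an illustration under stated assumptions: the 
names `needs_permission`, `decide`, and `permitted_senders` are made up for 
this example, the permission list is a plain set of addresses, and the From: 
header is taken at face value (a real deployment would need something 
sturdier than that).

```python
from email import message_from_string
from email.utils import parseaddr


def needs_permission(msg):
    """Return True if the message carries an HTML part or an attachment."""
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            return True
        if part.get_content_disposition() == "attachment":
            return True
    return False


def decide(raw, permitted_senders):
    """Return "accept" or "bounce" for one raw message (hypothetical policy).

    permitted_senders is an assumed per-recipient set of lowercase addresses
    that the recipient has explicitly allowed to send HTML or attachments.
    """
    msg = message_from_string(raw)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if not needs_permission(msg):
        return "accept"          # plain text is always let through
    if sender in permitted_senders:
        return "accept"          # recipient opted in for this sender
    return "bounce"              # restrictive default
```

Plain-text mail from anyone is accepted; HTML or attachments get through only 
from senders the recipient has listed.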

This is meant to reduce the effectiveness of spam and address-harvesting
techniques, and increase the effectiveness of content filters.

At least it would mean that address harvesters would not get NEARLY so many 
E-mails through, as long as spammers continued to employ all their familiar 
HTML-based tricks, deceptions, and obscuring techniques.

Unfortunately, I think the vast majority of computer users have absolutely
no idea what HTML is.

All the more reason they shouldn't triple to quintuple the carrying cost of 
their E-mail due to using it inadvertently.

As a sysadmin, I've tried to explain it to people,
and their eyes glaze over. They also have trouble grasping the notion of
'plain text', i.e. a format that doesn't support all the things they
normally think of as being available in 'text' (which for them means what's
available in Microsoft Word).  Once they understand the severe limitations
of this wonderful 'plain text' format which I'm encouraging them to use, the
first thing they want to know is how to make sure that they never use it.
They *like* the fact that when you reply to a message, Outlook puts the
quoted text in a different colour; they feel that it makes messages easier
to read.  To them, right angle-brackets look like something out of the Stone
Age.

Again, if they WANT to enable HTML-burdened mail, for specified correspondents, 
they CAN.  And easily, I'd hope.  

Frankly, I think that if people observed a DIRECT correlation between 
re-enabling HTML and a 10x increase in the amount of spam in their inboxes, I'd 
bet that most of them would figure out that it just wasn't worth it.

Imagine Sally User, who is a customer service representative, or a
freelancer.  Her ISP starts offering the proposed permissions service.  She
has no idea what HTML is, but she gathers that it's something about reducing
spam, so she accepts the default permissions, which are quite restrictive.
Suddenly people who write to Sally find that their messages bounce, with an
automated reply saying: 'Please send me a plain-text message asking for
permission to send me HTML email or attachments.' 

Actually, a better bounce message would say something like:

  "Your message is returned to you because you have either sent attachments or 
bulky and annoying HTML-burdened formatting.  As a responsible way to help 
reduce the volume of spams, viruses, worms and trojans, I will gladly accept 
unsolicited E-mails if they are not in HTML-burdened format and if they don't 
contain unexpected attachments.  If you NEED legitimately to send me 
HTML-burdened mail or attachments, you'll need to clear that with me in 
advance....

  "Thank you for your understanding, and for your help as we all work together 
to reduce the scourge of spam and viruses on the Internet.

  "[If you don't understand what HTML means in your E-mail or how to turn it 
off in your outgoing E-mail, contact your ISP for more information, or else 
see the web site at http://www.stopspam.org/noHTML.html ] "

These correspondents have no idea what this means (remember, they don't know
what HTML is, either), 

The bounceback message could refer them to a Web site which would explain the 
problem, and explain how to turn off HTML formatting for the most popular 
E-mail clients.

and are put off by the fact that their email was rejected.


I think you need to explain in the bounceback WHY it was rejected, and WHY 
rejecting it is in ALL of our MUTUAL interest.

...So they simply take their business somewhere else, to another company or
freelancer who doesn't reject their email.  

I'm sure that the spammers lurking here on this list are desperately hoping 
that everyone will cling tenaciously to the "need" to receive HTML-burdened 
E-mail and attachments from any-and-all, including untrusted/unknown, 
senders.  The *great* majority of spam being sent out COUNTS on that!

...When Sally finds out, she's horrified, and quickly sets her permissions to
accept anything from anyone.

IF she does that (and doing so ought to produce a CLEAR warning that it will 
hugely increase the amount of spam she receives) then AT LEAST the spam monkey 
is off the back of Sally's ISP, and she can't legitimately bitch about getting 
a lot of spam... since SHE made the CONSCIOUS CHOICE to defeat the anti-spam, 
anti-worm, anti-virus provisions offered.

I suspect that most people, if they understood that restrictive permissions
would cause legitimate messages to bounce, would opt for giving full
permissions to all senders.

A *vanishingly* small percentage of "legitimate" messages from 
unknown/untrusted senders NEED to have HTML, attachments, or encoded text 
bodies.

Again, you might have a point, IF the filter still let unchanged quantities of 
spam arrive in Sally Stupid's inbox.  But if she saw her spam decrease HUGELY 
when the ISP turned on the feature, and then saw it come back with full 
vengeance after she defeated the feature... I suspect that she'd pretty quickly 
learn to appreciate the value in it... and decide that it was worth her trouble 
to understand it.

As for making content filtering more effective, it would be easy for content
filters to strip out all the HTML tags, comments, JavaScript, etc. from a
message before filtering.  

Indeed, although some of what they WANT TO SEE is in the HTML and scripting... 
for example, a script that creates a "known spammer domain" URL dynamically.  
Another case is a large number of randomly generated, bogus HTML tags (often 
in the middle of words!) which are only there to obscure content... a 
near-certain indicator that the E-mail in question is spam, rather than 
legitimate.  You don't really WANT to eliminate them, if they're there... you 
can learn a lot by taking a closer look at 'em.
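
As an illustration of treating those tags as evidence rather than throwing 
them away, here is a rough Python sketch that counts two signals before any 
stripping is done: tags whose names are not real HTML, and tags or comments 
wedged between two letters of a word.  The function name 
`obfuscation_signals`, the short `KNOWN_TAGS` list, and the regular 
expressions are all assumptions made for this example, not a description of 
any existing filter.

```python
import re

# Deliberately short, illustrative list of genuine tag names (an assumption;
# a real filter would want the full HTML element list).
KNOWN_TAGS = {
    "a", "b", "i", "u", "p", "br", "hr", "em", "strong", "font", "span",
    "div", "img", "table", "tr", "td", "th", "ul", "ol", "li", "center",
    "html", "head", "title", "body", "meta", "style", "script",
    "h1", "h2", "h3", "h4", "h5", "h6",
}

# An opening or closing tag; group 1 captures the tag name.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^<>]*>")

# Any tag or comment with a letter immediately before AND after it.
MID_WORD_RE = re.compile(r"(?<=[A-Za-z])<[^<>]+>(?=[A-Za-z])")


def obfuscation_signals(html):
    """Count bogus tag names and tags/comments that split a word."""
    bogus = [
        m.group(1).lower()
        for m in TAG_RE.finditer(html)
        if m.group(1).lower() not in KNOWN_TAGS
    ]
    mid_word = MID_WORD_RE.findall(html)
    return {"bogus_tags": len(bogus), "mid_word_tags": len(mid_word)}
```

A legitimate message can trip the mid-word test once in a while (bolding part 
of a word, say), so these counts are better used as inputs to a score than as 
a verdict on their own.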

No need to actually parse the HTML.  (I'd be surprised if there weren't
content filters that do this already.)

Might be, but it's slower and longer (especially if you're going to try to 
emulate scripts etc to see what they 'might' produce... and that is a slippery 
slope indeed.)  My present filter takes a pretty good look at the HTML, and 
strips most (but not all!) of it from the incoming messages.  It generally 
strips the stuff that doesn't reveal much about the nature (spam or 
legitimate) of the E-mail.
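
To show the "strip most, but not all" idea in code, here is a rough Python 
sketch built on the standard library's `html.parser`.  It is not the filter 
described above; the class name `TextExtractor`, the function 
`strip_for_filter`, and the choice of which things to keep (comment count and 
link targets) are assumptions made for illustration.  It does no script 
emulation at all.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text; keep a few revealing details for the filter."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []          # visible text fragments
        self.comment_count = 0    # HTML comments seen (often pure clutter)
        self.link_targets = []    # href values, kept for domain checks
        self._skip_depth = 0      # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.link_targets.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

    def handle_comment(self, data):
        self.comment_count += 1


def strip_for_filter(html):
    """Return (visible_text, comment_count, link_targets) for one HTML body."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks), parser.comment_count, parser.link_targets
```

The visible text goes to the word-based filter with the clutter removed, 
while the comment count and link targets stay available as separate evidence.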

Finally, even if this proposal worked exactly as planned, why couldn't
spammers do just fine with ASCII?  

They could, but:

   1)  it's harder to obscure the content

   2)  most URL tricks become unusable

   3)  ActiveX and scripting become unusable

   4)  Obscured URLs become more obvious and suspicious when copied-and-pasted 
manually
  
   5)  font size, color, and related tricks become unusable

   6)  message-in-image-format becomes unusable

   7)  all Web bugs are suddenly unavailable

   8)  popups and the like are unavailable

   9)  they can't steal real-looking graphics from Ebay, PayPal, or other 
"legitimate" Web sites

  10)  people are more inclined to think twice before copying-and-pasting a URL 
compared to just clicking on it

  11)  clutter-obscuring the message with HTML comments, bogus nonsensical HTML 
tags, and the like becomes unusable

The Nigerian 419 spammers and their imitators have been using ASCII all 
along, and if they were a bit more clever about their wording, it would be 
quite difficult to filter.

Indeed.  But PRESENTLY a lot of the wording (if plain ASCII) is fairly easy to 
filter, and that would probably be even more true if spammers were practically 
speaking constrained to plain ASCII text.  Spammers defeat that kind of thing 
by sending (for example) an E-mail which contains an Image tag pointing to a 
block of text stored as a GIF or JPG file, which content scanners can't do 
much with.

Again, I think this is all really beside the point.  We're here to come up
with a mechanism to stop unsolicited bulk email.  Gordon's proposal is
concerned with some superficial properties that much spam currently has, 

I think that you're casually dismissing the fact that HTML and attachments are 
both KEY to a lot of the deceptions and misrepresentation that spammers (as 
well as virus authors) count upon.  Even if we still got the same number of 
spam messages as at present, but in practice took out the alternative 
HTML-burdened content, we'd already reduce overall spam volume (in terms of 
bytes) by probably 75-90%.  *I* certainly think that would be worthwhile!
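
For anyone who wants to measure the byte share on their own mail, here is a 
rough Python sketch that reports what fraction of a message's text body bytes 
sit in the text/html alternative.  The function name `html_share` is made up 
for this example, it ignores headers and attachments, and it does not confirm 
the 75-90% estimate above; it only shows how one might check a given message.

```python
from email.message import EmailMessage


def html_share(msg):
    """Fraction of text body bytes (plain + HTML) carried by text/html parts."""
    sizes = {"text/plain": 0, "text/html": 0}
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype in sizes:
            # decode=True undoes any base64 / quoted-printable encoding
            sizes[ctype] += len(part.get_payload(decode=True) or b"")
    total = sum(sizes.values())
    return sizes["text/html"] / total if total else 0.0


# Illustrative message with the same sentence in both alternatives.
example = EmailMessage()
example["From"] = "someone@example.com"
example.set_content("Meet at noon.\n")
example.add_alternative(
    '<html><body><font face="Arial" size="2">Meet at noon.</font>'
    "</body></html>\n",
    subtype="html",
)
print(f"HTML share of body bytes: {html_share(example):.0%}")
```

Running it over a sample of real inbound mail, rather than a toy message like 
this one, would be the way to see how close the actual savings come to that 
estimate.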

but doesn't deal with the core issues: 'unsolicited' and 'bulk'.


I disagree.  The permission list would allow for a fairly strict default 
behavior, which would block the majority of unsolicited spams.

As for "bulk", I don't think that's automatically a behavior of spammers... 
indeed, more and more spam I've seen is individually created for each user (for 
example, having the recipient's name or e-mail address or at least their domain 
name somewhere in the spam).

Gordon Peterson                  http://personal.terabites.com/
1977-2002  Twenty-fifth anniversary year of Local Area Networking!
Support the Anti-SPAM Amendment!  Join at http://www.cauce.org/
12/19/98: Partisan Republicans scornfully ignore the voters they "represent".
12/09/00: the date the Republican Party took down democracy in America.


