Vernon has regularly claimed that a significant proportion of
spam messages have valid MAIL FROM addresses. That means that bounces
will go to the spammer. This has significant ramifications for C/R
systems (especially auto-respond ones), since it means that spammers
could respond to challenges if they had to.

To test this theory, I took a day's worth of bounce logs from
somewhere.com (2003-05-15). There's been a bit of an upswing from a
recent virus attack, but otherwise these are pretty normal bounce
logs for somewhere.com.

These are for addresses that do not, and have never, existed.
They got onto spammers' lists primarily because someone entered the
address on a web site, so they get a mix of "true" spam and just
standard bulk mail. However, if the bulk mailers are doing their
job, those addresses should be removed fairly quickly. If they
aren't removing on bounces--then they look and smell a lot like
spammers.
Known oddities in the data:
862 messages to wormalert(_at_)somewhere(_dot_)com and variations. These tend
to run about 1/3 viruses, 1/3 real messages and 1/3 spam. That set
has 533 distinct MAIL FROM addresses.
12340 messages from olga(_at_)somewhere(_dot_)com to
mail(_at_)somewhere(_dot_)com. (Misconfigured Axis video cameras.)
Since all I'm counting here are unique MAIL FROM addresses, neither
of these should have a huge impact.
I ran a program which took each MAIL FROM address, parsed out the
domain portion, looked up the MX records, and then connected to the
SMTP port of the MX server with the lowest preference number. It
then sent:
    HELO somewhere.com
    MAIL FROM:<postmaster+AntiSpamAddressVerification(_at_)somewhere(_dot_)com>
    RCPT TO:<appropriate-address>
    QUIT
Note that a few sites bounced me at the HELO prompt (didn't like that
I was on DSL, or that my name was somewhere.com). A few bounced at
the MAIL FROM (didn't like somewhere.com--and one claimed that +
wasn't a legal email character). But the number of either of those
was pretty low (less than half a dozen). I'll do a better job of
recording those separately in the future.
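
A minimal sketch of that probe, for anyone who wants to repeat the test. This is not the program I ran; it is an illustration under stated assumptions. The function name probe, the resolve_mx callable (which must return a list of (preference, host) pairs, for example built on a DNS library such as dnspython) and the 10 second timeout are all hypothetical. The pseudo-codes 1001, 1003 and 1007 are the ones used in the tables in this message.

```python
import smtplib

# Pseudo-codes as used in the tables in this message.
NO_MX = 1001        # No MX Record
NO_SMTP = 1003      # No SMTP Connection
BAD_FORMAT = 1007   # Bad Address Format

HELO_NAME = "somewhere.com"
PROBE_SENDER = "postmaster+AntiSpamAddressVerification@somewhere.com"


def probe(address, resolve_mx, smtp_factory=smtplib.SMTP, timeout=10.0):
    """Return (code, message) for one MAIL FROM address.

    resolve_mx is a hypothetical callable: domain -> [(preference, host)].
    smtp_factory defaults to smtplib.SMTP and can be swapped for testing.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return BAD_FORMAT, "Bad Address Format"

    records = resolve_mx(domain)
    if not records:
        return NO_MX, "No MX Record"
    _, host = min(records)  # lowest preference number wins

    try:
        smtp = smtp_factory(host, 25, timeout=timeout)
    except OSError:  # refused, timed out, unreachable, bad greeting
        return NO_SMTP, "No SMTP Connection"

    try:
        # Stop at the first step the server rejects and report that reply.
        code, msg = smtp.helo(HELO_NAME)
        if code < 400:
            code, msg = smtp.mail(PROBE_SENDER)
        if code < 400:
            code, msg = smtp.rcpt(address)
    except OSError:  # connection dropped mid-session
        return NO_SMTP, "No SMTP Connection"
    finally:
        try:
            smtp.quit()  # never send DATA; this is only an address check
        except OSError:
            pass

    if isinstance(msg, bytes):
        msg = msg.decode("ascii", "replace")
    return code, msg
```

The returned code is whatever the server said at the first step it rejected, so a HELO or MAIL FROM rejection is reported the same way as an RCPT TO rejection. Recording the failing step separately would need one more return value.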
There were 39595 entries in the log, with 34404 distinct SMTP sessions.
There were 11559 unique MAIL FROM addresses.
+---------+-------+------------+
| errcode | total | percentage |
+---------+-------+------------+
|       0 |    99 |       0.86 | ???
|     250 |  5796 |      50.14 |
|     450 |     6 |       0.05 |
|     451 |    12 |       0.10 |
|     452 |     8 |       0.07 |
|     473 |     4 |       0.03 |
|     500 |     1 |       0.01 |
|     501 |     1 |       0.01 |
|     521 |     3 |       0.03 |
|     530 |     1 |       0.01 |
|     550 |  2341 |      20.25 |
|     551 |     3 |       0.03 |
|     552 |     2 |       0.02 |
|     553 |   288 |       2.49 |
|     554 |    48 |       0.42 |
|     555 |     1 |       0.01 |
|     556 |     1 |       0.01 |
|     571 |     1 |       0.01 |
|    1001 |  1880 |      16.26 | No MX Record
|    1003 |  1055 |       9.13 | No SMTP Server
|    1007 |     8 |       0.07 | Invalid Email Format
+---------+-------+------------+
In aggregate, 51% of the addresses were valid and 49% were not.
Of the ones that were not valid, 52% didn't have a reachable mail server.
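
The aggregate shares can be recomputed from the table counts. The sketch below is only an illustration: the function name shares is made up, and leaving the 99 unexplained errcode 0 rows out of the denominator is an assumption on my part (it is the choice that reproduces the rounded 51% and 52% figures).

```python
# Counts copied from the errcode table above.
COUNTS = {
    0: 99, 250: 5796, 450: 6, 451: 12, 452: 8, 473: 4, 500: 1, 501: 1,
    521: 3, 530: 1, 550: 2341, 551: 3, 552: 2, 553: 288, 554: 48,
    555: 1, 556: 1, 571: 1, 1001: 1880, 1003: 1055, 1007: 8,
}

UNREACHABLE = (1001, 1003)  # no MX record, no SMTP connection


def shares(counts, skip=(0,)):
    """Percent valid, and percent of the invalid ones that were unreachable.

    Codes listed in skip are left out entirely (assumption: errcode 0
    means the probe result was never classified).
    """
    total = sum(n for code, n in counts.items() if code not in skip)
    valid = counts.get(250, 0)
    invalid = total - valid
    unreachable = sum(counts.get(code, 0) for code in UNREACHABLE)
    return {
        "total": total,
        "valid_pct": 100.0 * valid / total,
        "unreachable_pct_of_invalid": 100.0 * unreachable / invalid,
    }


if __name__ == "__main__":
    for name, value in shares(COUNTS).items():
        print(f"{name}: {value:.2f}")
```

Passing skip=() keeps the errcode 0 rows in the denominator, which gives the 50.14% shown in the table's percentage column instead.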
Now let's see how it breaks down by domain.
Here are the top 5 domains among the MAIL FROM addresses.
+-------------------------+-------+
| host                    | count |
+-------------------------+-------+
| yahoo.com               |   819 |
| hotmail.com             |   714 |
| aol.com                 |   632 |
| earthlink.net           |   209 |
| msn.com                 |   161 |
+-------------------------+-------+
Let's do the same stats for each of these. Note that I have a 1-2%
"No SMTP Server" rate. This could mean that they were rate limiting
my queries. More likely it's due to the very short timeout I put on
doing the query. I'll have to adjust that in the future.
+-----------+---------+-------+------------+
| host      | errcode | total | percentage |
+-----------+---------+-------+------------+
| yahoo.com |    NULL |     1 |       0.12 |
| yahoo.com |     250 |   669 |      81.68 |
| yahoo.com |     553 |   129 |      15.75 |
| yahoo.com |    1003 |    20 |       2.44 |
+-----------+---------+-------+------------+

+-------------+---------+-------+------------+
| host        | errcode | total | percentage |
+-------------+---------+-------+------------+
| hotmail.com |    NULL |     1 |       0.14 |
| hotmail.com |     250 |   111 |      15.55 |
| hotmail.com |     550 |   602 |      84.31 |
+-------------+---------+-------+------------+

+---------+---------+-------+------------+
| host    | errcode | total | percentage |
+---------+---------+-------+------------+
| aol.com |       0 |    10 |       1.58 |
| aol.com |     250 |   581 |      91.93 |
| aol.com |     550 |    10 |       1.58 |
| aol.com |    1003 |    31 |       4.91 |
+---------+---------+-------+------------+

+---------------+---------+-------+------------+
| host          | errcode | total | percentage |
+---------------+---------+-------+------------+
| earthlink.net |     250 |    43 |      20.57 |
| earthlink.net |     550 |   149 |      71.29 |
| earthlink.net |     554 |    14 |       6.70 |
| earthlink.net |    1003 |     3 |       1.44 |
+---------------+---------+-------+------------+

+---------+---------+-------+------------+
| host    | errcode | total | percentage |
+---------+---------+-------+------------+
| msn.com |    NULL |     1 |       0.62 |
| msn.com |     250 |    62 |      38.51 |
| msn.com |     550 |    97 |      60.25 |
| msn.com |    1003 |     1 |       0.62 |
+---------+---------+-------+------------+
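
The per-domain tables above amount to grouping the probe results by the domain part of the MAIL FROM address. A rough sketch of that grouping follows. The function names and the tiny sample list are made up for illustration; they are not the real data or the queries I actually used.

```python
from collections import Counter, defaultdict


def by_domain(results):
    """Group (address, errcode) pairs into {domain: Counter of errcodes}."""
    table = defaultdict(Counter)
    for address, code in results:
        domain = address.rpartition("@")[2].lower()
        table[domain][code] += 1
    return table


def top_domains(table, n=5):
    """Return the n domains with the most addresses as (domain, count)."""
    totals = Counter({d: sum(codes.values()) for d, codes in table.items()})
    return totals.most_common(n)


def percentages(codes):
    """Turn one domain's errcode counts into percentages of its total."""
    total = sum(codes.values())
    return {code: round(100.0 * n / total, 2)
            for code, n in sorted(codes.items())}


if __name__ == "__main__":
    # Made-up sample, only to show the shape of the input.
    sample = [
        ("a@yahoo.com", 250), ("b@yahoo.com", 250), ("c@Yahoo.com", 553),
        ("d@aol.com", 250),
    ]
    table = by_domain(sample)
    for domain, count in top_domains(table):
        print(domain, count, percentages(table[domain]))
```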
Interesting that the results vary so much by ISP. Yahoo addresses are
mostly valid (82%). Hotmail addresses are mostly bad (84% rejected).
AOL is quite good (92% valid). Earthlink has a problem (only 21%
valid). MSN is somewhat better (39% valid), but most of its addresses
are still invalid.

In general, though, it appears that Vernon is correct. If my sample
is representative, a large percentage of spam is coming from real
email addresses.
I'll be making this data (and, with luck, live updates to it)
available on the web, hopefully in the next few days.
As an additional anecdotal piece of information: in the past month
I've seen five separate email accounts (including two of mine) get
Joe-jobbed in a new way. Instead of a major bounceback, they just get
one or two bounces. It smells like new spam software that uses the
same database of addresses for From that it uses for To. The goal
might be to get through verification filters like the above. But
it's also interesting to consider what havoc that might wreak on C/R
systems. How is someone going to react when they get a challenge for
a message they didn't send? I predict that if people get used to C/R
systems they'll just click send--and the spammer's message will get
through.
Finally, as an addendum of sorts, here are the unique messages
associated with the above error codes. I haven't broken out the 250
and 550 ones--I'm just tracking the less common ones. The messages
have been normalized to remove email addresses and domain names.
+---------+-------+----------------------------------------------------+
| errcode | count | substring(message,1,50) |
+---------+-------+----------------------------------------------------+
| 0 | 99 | |
| 250 | 5796 | recipient ok |
| 450 | 3 | <EMAILADDRESS>: User unknown in local recipient ta |
| 450 | 2 | <localhost.localdomain>: Helo command rejected: Ho |
| 450 | 1 | Mailbox unavailable. |
| 451 | 1 | 4.0.0 Can't create transcript file ./xfh4GNNYv0581 |
| 451 | 1 | 4.3.0 error creating message, status = StatusSpool |
| 451 | 1 | 4.3.5 Error getting LDAP results in map sbcldap: |
| 451 | 4 | <EMAILADDRESS>: Temporary lookup failure |
| 451 | 1 | <LOCALPART> ... Recipient mailbox is full |
| 451 | 1 | Can't connect to bisman.com - psmtp |
| 451 | 2 | Requested action aborted: local error in processin |
| 451 | 1 | Server Error |
| 452 | 2 | 4.2.1 Mailbox temporarily disabled: EMAILADDRESS |
| 452 | 2 | 4.2.2 Mailbox full |
| 452 | 2 | 4.4.5 Insufficient disk space; try again later |
| 452 | 2 | Message for <EMAILADDRESS> would exceed mailbox qu |
| 473 | 4 | EMAILADDRESS relaying prohibited. You should authe |
| 500 | 1 | <EMAILADDRESS>: Recipient address rejected: Recipi |
| 501 | 1 | Syntax error in sender: <postmaster+AntiSpamAddres |
| 521 | 1 | This User has too many concurrents, please try aga |
| 521 | 2 | this mailbox is disabled or invalid (#5.2.1) |
| 530 | 1 | Delivery not allowed to non-local recipient, try a |
| 550 | 2341 | unknown user |
| 551 | 1 | 5.0.0 Mailbox disabled,storage space exceeded |
| 551 | 1 | EMAILADDRESS illegal name for an account |
| 551 | 1 | not our customer |
| 552 | 1 | <EMAILADDRESS>: Recipient address rejected: Sorry, |
| 552 | 1 | Requested action aborted: exceeded storage allocat |
| 553 | 1 | 5.0.0 <EMAILADDRESS>... No such user |
| 553 | 1 | 5.1.3 <EMAILADDRESS>... Invalid route address |
| 553 | 17 | 5.3.0 <EMAILADDRESS>... Addressee unknown, relay=[ |
| 553 | 1 | 5.3.0 <EMAILADDRESS>... Delivery ERROR!!!User does |
| 553 | 4 | 5.3.0 <EMAILADDRESS>... No such user |
| 553 | 6 | 5.3.0 <EMAILADDRESS>... No such user here |
| 553 | 2 | 5.3.0 <EMAILADDRESS>... That address is not curren |
| 553 | 1 | 5.3.0 <EMAILADDRESS>... Try LOCALPART(_at_)symantec(_dot_)com |
| 553 | 1 | 5.3.0 <EMAILADDRESS>... User LOCALPART mailbox ful |
| 553 | 3 | 5.3.0 <EMAILADDRESS>... User unknown |
| 553 | 8 | 5.5.3 <EMAILADDRESS>... Invalid |
| 553 | 1 | <EMAILADDRESS>... User unknown |
| 553 | 2 | No mailbox here by that name, sorry (#5.7.1) |
| 553 | 1 | RCPT TO:<EMAILADDRESS> refused |
| 553 | 7 | Requested action not taken: mailbox name not allow |
| 553 | 143 | VS10-RT Possible forgery or deactivated due to abu |
| 553 | 88 | sorry, that domain isn't in my list of allowed rcp |
| 553 | 1 | sorry, your envelope sender is in my badmailfrom l |
| 554 | 1 | 5.0.0 ADMIN.COM ISN'T THE DOMAIN YOU'RE LOOKING FO |
| 554 | 3 | <EMAILADDRESS>: Recipient address rejected: Access |
| 554 | 1 | <EMAILADDRESS>: Recipient address rejected: Domain |
| 554 | 9 | <EMAILADDRESS>: Recipient address rejected: Not ac |
| 554 | 1 | <EMAILADDRESS>: Recipient address rejected: Relay |
| 554 | 5 | <EMAILADDRESS>: Relay access denied |
| 554 | 1 | <localhost.localdomain>: Helo command rejected: Ho |
| 554 | 1 | EMAILADDRESS Mail quota exceeded |
| 554 | 1 | Mail for EMAILADDRESS rejected for policy reasons. |
| 554 | 21 | Quota violation for EMAILADDRESS |
| 554 | 1 | Relay rejected for policy reasons. |
| 554 | 2 | SPAM-Relay detected |
| 554 | 1 | recipient <EMAILADDRESS>, Transaction failed |
| 555 | 1 | sorry, your envelope recipient is in my badrcptto |
| 556 | 1 | invalid email address EMAILADDRESS (5.5.6) |
| 571 | 1 | <www.somewhere.com[66.92.72.194]>: Client host rej |
| 1001 | 1880 | No MX Record |
| 1003 | 1055 | No SMTP Connection |
| 1007 | 8 | Bad Address Format |
+---------+-------+----------------------------------------------------+
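
For anyone wanting to build a similar table, here is a rough sketch of the kind of normalization involved. It is an approximation, not the exact rules I used: the regular expression, the function names and the decision to replace only full addresses (not bare local parts or domain names) are assumptions for illustration. The 50-character cut matches the substring(message,1,50) column above.

```python
import re
from collections import Counter

# Deliberately loose pattern for "something@domain.tld" (assumption:
# good enough for grouping log messages, not for validating addresses).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def normalize(message, width=50):
    """Replace email addresses with a placeholder, then truncate."""
    return EMAIL_RE.sub("EMAILADDRESS", message)[:width]


def tally(replies):
    """Count (errcode, normalized message) pairs, sorted by errcode."""
    counts = Counter((code, normalize(msg)) for code, msg in replies)
    return sorted(counts.items())


if __name__ == "__main__":
    # Made-up replies, only to show the input and output shape.
    replies = [
        (554, "Quota violation for a@example.com"),
        (554, "Quota violation for b@example.org"),
        (452, "4.2.2 Mailbox full"),
    ]
    for (code, message), n in tally(replies):
        print(code, n, message)
```

Replacing before truncating matters: two replies that differ only in the address collapse into one row even when the address sits near the 50-character boundary.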
--
Kee Hinckley
http://www.messagefire.com/ Junk-Free Email Filtering
http://commons.somewhere.com/buzz/ Writings on Technology and Society
I'm not sure which upsets me more: that people are so unwilling to accept
responsibility for their own actions, or that they are so eager to regulate
everyone else's.