spf-discuss

Re: Re: HELO versus MAILFROM results

2005-05-07 08:35:39
Mark, I think we'd have a more fluent conversation if you would please
read the whole message before replying. I break ideas up in paragraphs
for readability, and it's not proper to reply to the individual
paragraphs without reading the others, as they may be somewhat out of
the context of the idea I am expressing.

Mark wrote:
If a spammer starts the mail conversation with "HELO sci.fi" he has
essentially thwarted your attempt to get a "FAIL" from the HELO.


But I also said: In your case, if software starts with a HELO check
against sci.fi, nothing is 'bypassed', as domain of sender sci.fi does not
designate mailers, so a regular SPF check against the MAIL FROM identity
is done after all.

Perhaps we're seeing the effects of differences in semantics. I consider
that if the step of checking the HELO string can be made to return
"none", then that step would have been useless in stopping communication
early.

This is analogous to a download page that has a "front", ie, a form
where you have to fill in your contact info before you are shown the URL
to the file you want to download. If you know the direct URL ahead of
time, you can jump there directly, without going through the form. I
consider this to be bypassing the (weak) protection that the site claims
to have. You might say that that is not "protection", but if the site
provides no direct links to the file, but provides links only to the
contact form, what is it called then? (Certainly it wants the contact
form to not be optional)

The remote spammer has full control over what SPF result you will
evaluate at HELO.

And the spammer also has full control over what SPF evaluates at MAIL
FROM. Your point being?


Do you not see why the two following conversations have different
usefulness to a spammer due to the subsequent, and unavoidable spam
filter? By usefulness I mean probability that the message will be
delivered to a user. I make the assumption that i.am.a.spammer.com has
an SPF record but no established reputation in the reputation
databases. This helo could be any other helo without a reputation.

Conversation 1 (1%)
==============

HELO I.am.a.spammer.com
MAIL FROM: blah@random.random.random.com
...
DATA: Improve your .... etc, etc.


Conversation 2 (10%)
==============

HELO I.am.a.spammer.com
MAIL FROM: blah@yahoo.com
...
DATA: Improve your .... etc, etc.


Without SPF, the chances of delivery of these two emails are vastly
different.

With SPF, the chances of delivery of these two are still unchanged,
because neither random.com nor yahoo.com publishes an SPF record.

Similarly, the following two conversations also have different
usefulness to the spammer:

Conversation 3 (5%)
==============

HELO I.am.a.spammer.com
MAIL FROM: blah@novice.spammer.com
...
DATA: Improve your .... etc, etc.


Conversation 4 (0.1%)
==============

HELO I.am.a.spammer.com
MAIL FROM: blah@known.spammer.com
...
DATA: Improve your .... etc, etc.


(Say that both the novice spammer and the known spammer have SPF
records, but one has no reputation yet, while the other has a negative
reputation)

Without SPF, the chances of delivery of these two emails are vastly
different. I have shown in brackets the assumed chance of delivery for
each of the conversations.

With SPF, the chances of delivery of these two are still unchanged,
because both novice.spammer.com and known.spammer.com have SPF records,
and the respective SPF checks return "PASS".

You will undoubtedly point out that SpamAssassin will probably reject
the mail from known.spammer.com based on its bad reputation, though it
may consider delivering the mail from novice.spammer.com, since there is
no reputation data for it. Thus, the probability of these two
conversations resulting in a delivered message is different (probably by
a wide margin).

My point is that in all 4 cases, the HELO check would not contribute any
value to the rejection decision.

The rejection decision was made by SpamAssassin based on a calculation
involving the reputation of the MAIL-FROM domain and the contents of the
message. Even if the reputation of the HELO domain (ie, no reputation
found, perhaps 0.0) was an input to that calculation, it would not have
changed the outcome, since in all cases it would contribute 0.0 to the
calculation.
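
A minimal sketch of that calculation, assuming a simple additive score.
The function names, the numbers and the threshold are hypothetical and
are not SpamAssassin's actual rules:

```python
# Hypothetical additive spam score; higher means more likely spam.
# All numbers are illustrative assumptions, not SpamAssassin's real rules.
REJECT_THRESHOLD = 5.0


def spam_score(mail_from_reputation, content_score, helo_reputation=0.0):
    """Sum the per-input scores; a HELO name with no reputation adds 0.0."""
    return mail_from_reputation + content_score + helo_reputation


def is_rejected(mail_from_reputation, content_score, helo_reputation=0.0):
    """Reject when the combined score reaches the (assumed) threshold."""
    score = spam_score(mail_from_reputation, content_score, helo_reputation)
    return score >= REJECT_THRESHOLD


# Same content score, different MAIL-FROM reputations (assumed values).
# The HELO term is 0.0 in every case, so it never changes the decision.
for mail_from_rep in (0.0, 4.0, -2.0):
    with_helo = is_rejected(mail_from_rep, 4.0, helo_reputation=0.0)
    without_helo = (mail_from_rep + 4.0) >= REJECT_THRESHOLD
    assert with_helo == without_helo
```

Only the MAIL-FROM reputation and the content score move the result in
this sketch, which is the situation described above.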

It is the probabilities of delivery that make one domain less
attractive as a MAIL-FROM candidate than another domain.

This is why a random HELO that yields pass or none can be used, with no
effect on the probability of delivery, while a random MAIL-FROM makes
less business sense than a non-random MAIL-FROM.

Reputation at HELO
===================

Currently, the reputations of MTAs are tracked with RBL databases based
on their IP address. Is rejecting based on SPF not equivalent to
rejecting based on RBL? If not, what's the difference?

One difference that I see is that while an RBL can easily flag a
connection from a dial-up as 'bad', SPF may use the HELO and arrive
at a 'none' or 'pass' verdict on the same connection.
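
For reference, an IP-based RBL lookup is conventionally a DNS query on
the reversed octets of the connecting address under the list's zone, so
it never depends on what the client says in HELO. A small sketch of
building that query name, assuming IPv4 only and using the placeholder
zone rbl.example.org (not a real list):

```python
import ipaddress


def dnsbl_query_name(connecting_ip, zone):
    """Build the DNS name to query for an IPv4 address in an RBL zone."""
    # IPv4Address raises ValueError on anything that is not a valid IPv4.
    addr = ipaddress.IPv4Address(connecting_ip)
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# The zone below is a placeholder for illustration only.
print(dnsbl_query_name("12.34.56.78", "rbl.example.org"))
```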

What improvement does the HELO check offer over a simple RBL check?


That's what makes it useless to check.


No. You have to look at HELO checks in the larger scheme of things. The
introduction of a more pronounced HELO check at the beginning of the
process was primarily done to 'spice up' SPF performance with early-out
mechanisms -- as a man going on about excessive DNS queries, surely you
can appreciate that. :) So, you could do an SPF check against HELO,
without looking up the A record even, and treat an immediate 'fail' as a
quick early out.

Is the above idea equivalent to the "larger scheme of things" you had in
mind?
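
A minimal sketch of the early-out flow as described in the quoted
paragraph, assuming the SPF results are already available as strings.
The function name and the returned labels are hypothetical:

```python
def smtp_spf_decision(helo_result, mail_from_result):
    """Decide what to do given SPF results for HELO and MAIL FROM.

    Results are assumed to be strings such as "pass", "fail" or "none".
    """
    # Early out: only a HELO "fail" ends the session before MAIL FROM.
    if helo_result == "fail":
        return "reject at HELO"
    # "none" or "pass" at HELO falls through to the regular check.
    if mail_from_result == "fail":
        return "reject at MAIL FROM"
    return "continue to content filtering"


# A sender who picks a HELO name yielding "none" or "pass" always
# reaches the MAIL FROM check.
for helo_result in ("none", "pass"):
    assert smtp_spf_decision(helo_result, "pass") == "continue to content filtering"
```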


If you care to do the A record lookup, HELO checks can become very useful
in "karma" checks (against reputation services). Instead of whitelisting
ever-changing IP addresses, a properly resolving HELO name means you could
suffice with whitelisting just a handful of trusted 'key-words' (FQDNs, at
HELO). Until such time HELO checks are widely used for this purpose (it is
always hard to predict what the market will do), you can already use HELO
checks to get the early-out 'fail' result -- without even having to do the
A record lookup.

An RBL based on HELO names, instead of IP addresses?

It's trivial to have a properly resolving HELO name, compared to
connecting with dial-up and surviving an IP-based RBL check.

If the spammer wants you to see "NONE", he says "HELO sci.fi"

Perhaps you do not fully realize the fact, but SPF was actually designed
so spammers would use your 'bypass'. :) Seriously. The whole purpose of
SPF is so spammers will avoid using SPF-protected domains! That's not a
flaw; it's the whole point! We want spammers to say: "I am not going to
bother phishing domain X any more, because of those pesky SPF checks; let's
use domain Y instead." In fact, if you can bring solid evidence that
spammers are already doing this, then we have cause to celebrate. :)

Actually the only thing that SPF can do is deny spammers use of a domain
name that has "good reputation".

There's no incentive for a spammer to use a domain name with "bad
reputation", when he could just publish an SPF record, and start from
"no reputation"

But unless the reputation of the HELO name is used by spam filters, it
doesn't matter if the HELO name has a good reputation, a bad reputation
or even if it has no SPF record.

And making spam filters make use of the HELO reputation is equivalent to
using an RBL on the incoming IP.

Oh, and you can't establish a reputation database on domains like sci.fi
until they publish an SPF record of some kind. The alternative is that
spammers use that name, it earns a bad reputation, and then when sci.fi
eventually publishes SPF, they will be starting out with the bad
reputation that the spammers earned them.

So the only way to check the reputation of sci.fi currently is to use
the connecting IP address, and check it against the reputation database.
That is RBL, not SPF.


I.am.a.spammer.com   TXT "v=spf1 +all"

1. So what are you going to do? Block HELO's that resolve
with "PASS" ?

I could. If I did an A record lookup on the HELO, and it appears to be a
known spammer, I may well avail myself of this early-out and toss him out
of the nearest airlock. "pass" just means SPF says the relay is
authorized; I, on the other hand, decide who I want to receive mail from.

I think this is where your defense is starting to fall apart.

How can you tell spam arriving directly from a spammer apart from spam
arriving via a forwarded email account (forwarding that one of your
users signed up for) ?

It boils down to knowing if a helo name belongs to a forwarder or to a
spammer. How can you reliably assert this by using SPF, and by not using
an RBL?

2. Put the HELO strings in a reputation database? Recall that for each
DNS zone file, there are an infinity of possible HELO strings, each
unique. That makes for an infinitely large and infinitely useless HELO
reputation DB.

No reputable service would ever create an infinity of possible HELO
strings. Besides, it is entirely up to me how many TLD (sub)levels I deem
relevant, and in what order I will check them. Say, I check
"mx01-dom.earthlink.net" as HELO name, and it resolves properly, then I
would probably use 2 DNS queries to a reputation service: one for
"mx01-dom.earthlink.net" and one for "earthlink.net". And if it really
became the far-fetched case that spammers would create an infinity of
possible HELO strings, I would simply reverse the order, and evaluate
"earthlink.net" first. Problem solved (if there even was one).

So eventually we'll be back to checking *only* the MAIL FROM. Thank you
for agreeing with me.

Unfortunately, if things headed that way, towards a useless HELO check,
a lot of DNS bandwidth would have been wasted globally to learn that
lesson.

Seeing how spam usually follows the path of least effort and least
resistance, I think the infinity of HELO strings is that path, and thus
it will be followed. But it is somewhat subjective to decide which is
the path of least resistance.
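
The two-level lookup order quoted earlier (full HELO name, then the
parent domain, or the reverse) can be sketched as follows. This assumes
the parent is simply the last two labels, which ignores suffixes such as
co.uk; the function name is hypothetical:

```python
def reputation_lookup_names(helo_name, parent_first=False):
    """Return the names to look up in a reputation service, in order."""
    labels = helo_name.rstrip(".").lower().split(".")
    full = ".".join(labels)
    # Assumption: the parent domain is the last two labels of the name.
    parent = ".".join(labels[-2:])
    names = [full] if parent == full else [full, parent]
    if parent_first:
        names.reverse()
    return names


# Any number of distinct HELO strings under one zone share one parent key.
hosts = [f"host{i}.spammer.example" for i in range(1000)]
parents = {reputation_lookup_names(h, parent_first=True)[0] for h in hosts}
assert parents == {"spammer.example"}
```

With parent_first set, an unbounded set of HELO strings collapses onto a
single entry, but each message still costs a lookup.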

Perhaps the better question is: How will the spammers adapt if the
entire world used SPF? (Hint: I don't think they'd stop trying ;)

Unfortunately there is no required relationship between the
domain name in MAIL FROM and the name in HELO.


Make that fortunately. :) I host many virtual domains, and they are all
rigged to use my SMTP server. And my mail server only uses one single HELO
name ("mail.asarian-host.net"), identical to the PTR. And I like to keep
it that way. :)

That's how it's supposed to be :) I am in the same situation, though my
PTR does not resolve to the HELO name, because all my services run on
the same IP address, and I can only put one name in that PTR record, and
I chose another service to have the honour of a PTR record pointing to it.

Obviously, the requirement of having a mail server dedicated to each
domain name registered does not scale, as it would be equivalent to
asking that there be a post office dedicated to each household.


At my site, the name "yahoo.com" gets a spam rating of -2, because I
have some friends that write me from there. I get a lot of spam that
forges yahoo.com, but -2 is the average that my tools automatically
found to pass most ham through and reject most spam.


I 'fail' to see what this has got to do with anything. You are talking
about post-processing, long after the SMTP session has closed. Whatever SA
score you assign to the mail, later on, has no bearing on the
determination, inside the SMTP dialogue, on whether the name "yahoo.com"
was used in an unauthorized fashion (SPF-wise).

It has got to do with the usefulness of a random domain name in a
spammer's MAIL-FROM, and the fact that it's not the same as a random
name in the HELO, due to how the reputations of the two entities make a
difference.

The HELO is a completely different matter, as spam filters do
not care, or assign any type of reputation information to HELO names,
which means a reputable name is as good as the sleaziest of names (like
I.am.a.spammer.com).


You can assign as much value to it as you wish. I mean, if a spammer
announces himself, in HELO, as "i.am.a.spammer.com", and it resolves
properly, then you are free to ignore what you want, and wait for SA to
examine the content of the mail. I, on the other hand, would go for the
early-out.


Spam filters do assign some value to the IP of the connection, using
RBLs, which is much better than assigning value to the HELO.

For domains that protect their HELO with SPF, the value of the HELO
check is equal to the value of the RBL check.

For domains that do not protect their HELO with SPF, the value of the
RBL check is significantly higher than the value of the SPF check on the
HELO.

The only check that might be remotely valid is to check the A
record to ensure it matches the IP address.

Which would not be 'remotely valid', but 100% safe (barring
DNS hacks, of course).

Not exactly 100% safe as long as the IP address notation is
legal, since it can only be compared against the connecting IP address,
but cannot be looked up.


You missed the point: the bracketed IP literal is not an SPF-protectable
domain name that needs looking up even.

I don't think I missed the point.

I claimed that in case IP notation is used, it is not 100% safe to rely
on an A query, because there's no A name to query.

How do you prove that "no name" == [IP notation] ? What do you query?
How is this 100% safe ?

It's like this:

To have a 100% safe A query on the HELO, you need an answer from a DNS
server. To get an answer, you need to ask a question. I don't think in
the case of "HELO [IP literal]" there is a question. If I missed
anything, it's that... how do you make an A query from an "HELO [IP
literal]".
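
A minimal sketch of the distinction, assuming the HELO argument and the
connecting address are available as strings. The function name and the
returned labels are hypothetical:

```python
import ipaddress


def classify_helo(helo_arg, connecting_ip):
    """Classify a HELO argument as an address literal or a domain name.

    Returns ("literal", matches_connection), ("invalid", None) for a
    malformed bracketed literal, or ("name", None) for anything else.
    """
    if helo_arg.startswith("[") and helo_arg.endswith("]"):
        inner = helo_arg[1:-1]
        # IPv6 address literals in SMTP carry an "IPv6:" tag.
        if inner.lower().startswith("ipv6:"):
            inner = inner[5:]
        try:
            literal = ipaddress.ip_address(inner)
        except ValueError:
            return ("invalid", None)
        # Nothing to query in DNS: the literal can only be compared
        # against the address of the connection itself.
        return ("literal", literal == ipaddress.ip_address(connecting_ip))
    # A domain name is the only case where an A query can be asked.
    return ("name", None)
```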

Would the HELO SPF check not be bypassed if the spammer says "HELO
[12.34.56.67]" ?


If by 'bypassed' you mean the spammer no longer uses an SPF-protected
domain name, then yes. :)


And would the MAIL-FROM SPF check not be also bypassed if the spammer
said "MAIL FROM <>" ?


No, because a check would be done against postmaster@HELO.

An SPF check on postmaster@[12.34.56.78]? What would that mean?


But I diverge slightly. Even though SPF cannot be used in this case,
using <> in this way will not help the spammers much, because the
postmaster itself will earn a bad reputation score, and we're back to
what I said above about the reputation difference between a familiar
domain name and an unfamiliar domain.


Since SPF is targeted at protecting domain names (the RHS of an email
address), I do not see how the reputation of the local part (LHS) plays
into this. SPF examines the legitimacy of the relay in question; a check
against postmaster@HELO will serve to make that determination; no more,
no less.

I said nothing about local part.

I said that forging the postmaster account by sending MAIL FROM <> is
marginally useful, because many spam filters will know that that route
is prone to spamming, so they will scrutinize it.

It is much more useful for a spammer to forge MAIL FROM
<blah@domain.com>, especially if the spammer controls domain.com and
publishes SPF for it, because at least the spam filters will not know
initially that that domain is used for spamming.

Not knowing that a domain belongs to a spammer gives a better
probability of delivery than knowing that the <> trick is most often
used for spam.

Regards,
Radu.