Based on the discussion on this list over the last couple of weeks, I want to
propose a modified algorithm for selecting and validating email identities. I
want to explain the change and the rationale, and get your feedback. This is a
lengthy post; please bear with me.
There are a couple of key issues that various members of this list have brought
up that I think need to be addressed. Forgive the lack of attributions here -
you know who you are! :-)
- even though I've babbled on ad nauseam about user experience and validating
something the end user can see, there are cases in Caller Id where the From
line never gets checked
- it's been brought to our attention that there are a small but significant
number of mail list servers that do not insert a Sender header. It's not clear
how many sites run such servers, but it's another hurdle, as mail from these
list servers would appear to be spoofed. All of them, however, do put the list
owner address in the RFC2821 MAIL FROM.
- Caller Id proposes the use of Resent-* headers by forwarders. However, there
are some folks who have already implemented SRS or VERP, and others who believe
that forwarders should indeed handle bounce messages. While I have very deep
concerns about SRS and VERP, if some organizations want to adopt either one,
that should be their choice.
To address these issues, I'd like to propose a revised algorithm for selecting
and validating identities. It's a two-step test.
1. Always first perform the spoof check on the RFC2822.From domain (i.e. look
up the TXT records for this domain and verify the connecting IP address is on
the list found there). If this passes, we're done. (If the RFC2822 From isn't
a valid email address to begin with then we should probably reject the mail.)
This is the normal case for most legitimate mail which travels one hop from
source domain to destination domain. Of course it still could be a spammer
with a throwaway domain, but that'll get block listed pretty quickly by other
means.
At the same time we are looking up the list of authorized outbound MTA IP
addresses for the domain, we can also look up other policy statements the
domain may choose to make, for example
- directOnly: this domain does not knowingly send mail via mailing list or
forwarding services; further stringent checks of the sender may be required
- alwaysSigned: this domain always digitally signs outbound email
- accreditedBy: this domain's email behavior & policies are accredited by one
or more named services
2. If #1 fails then it tells us there's a high likelihood the mail has
traveled >1 hop. In that case, select as the purported responsible domain
(PRD) the first non-empty identity from this list: Resent-Sender, Resent-From,
Sender, RFC2821.MAIL FROM. Perform the spoof check. If it passes, we're good;
otherwise, either it's spoofed or the domain hasn't published. If the directOnly
policy designation was made for the RFC2822 From domain in step #1, ensure the
PRD is on the recipients' trusted senders list.
Here too, we can also take note of policy declarations the PRD domain may have
made. One additional possibility relevant here is
- noVisitors: this domain does not relay mail on behalf of any other domain
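
To make the two steps concrete, here is a rough Python sketch of the test as
described above. It is only an illustration under some assumptions: the Policy
class, the "published" dict (standing in for the TXT record lookup) and the
function names are all made up for this example, and "first non-empty
identity" is read here as "first identity that yields a domain".

```python
from dataclasses import dataclass, field
from email.message import EmailMessage
from email.utils import parseaddr


@dataclass
class Policy:
    """Hypothetical parsed view of what a domain publishes in its TXT records."""
    outbound_ips: set = field(default_factory=set)
    direct_only: bool = False


def domain_of(address):
    """Return the lowercased domain of an address, or None if there isn't one."""
    _, addr = parseaddr(str(address or ""))
    local, sep, domain = addr.rpartition("@")
    return domain.lower() if sep and local and domain else None


def spoof_check(domain, connecting_ip, published):
    """True if the domain has published and lists the connecting IP."""
    policy = published.get(domain)
    return policy is not None and connecting_ip in policy.outbound_ips


def validate(msg, mail_from, connecting_ip, published,
             trusted_senders=frozenset()):
    # Step 1: always check the RFC2822.From domain first.
    from_domain = domain_of(msg.get("From"))
    if from_domain is None:
        return "reject"  # From isn't a valid address to begin with
    if spoof_check(from_domain, connecting_ip, published):
        return "pass"
    from_policy = published.get(from_domain)  # may carry directOnly

    # Step 2: pick the PRD from the first identity that yields a domain.
    candidates = [msg.get(h) for h in ("Resent-Sender", "Resent-From", "Sender")]
    candidates.append(mail_from)
    prd = next((d for d in map(domain_of, candidates) if d), None)
    if prd is None or not spoof_check(prd, connecting_ip, published):
        return "fail"  # spoofed, or the domain hasn't published
    # directOnly on the From domain: the PRD must be a trusted sender.
    if from_policy is not None and from_policy.direct_only \
            and prd not in trusted_senders:
        return "fail"
    return "pass"


# Made-up data: a list server that sets only MAIL FROM, no Sender header.
published = {
    "example.com": Policy({"192.0.2.10"}),
    "lists.example.org": Policy({"198.51.100.5"}),
    "strict.example": Policy({"192.0.2.20"}, direct_only=True),
}
msg = EmailMessage()
msg["From"] = "alice@example.com"
print(validate(msg, "owner@lists.example.org", "198.51.100.5", published))
```

The noVisitors declaration and the other policy flags are left out to keep the
sketch short; they would hang off the same Policy lookup.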
BTW, it's come to our attention that some MTAs today insert additional headers
in the message whenever they forward mail. Postfix and qmail apparently insert
a header called Delivered-To and exim inserts an Envelope-To header. Although
these headers are not defined in RFC2822, they seem to be in fairly widespread
use. We could consider adding them to the list of headers in step 2, after
Resent-From and before Sender. If we did this, then a number of well-known
forwarders, including pobox.com I believe, would be compliant today, without
the need to add Resent-* headers or implement SRS.
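
As a small illustration of that idea, the sketch below slots the two
non-standard headers into the step 2 order, after Resent-From and before
Sender. The relative order of Delivered-To and Envelope-To is my assumption
(nothing above fixes it), and the function name and addresses are made up.

```python
from email.message import Message
from email.utils import parseaddr

BASE_ORDER = ["Resent-Sender", "Resent-From", "Sender"]

# Insert the forwarder-added headers just before Sender.
_i = BASE_ORDER.index("Sender")
EXTENDED_ORDER = BASE_ORDER[:_i] + ["Delivered-To", "Envelope-To"] + BASE_ORDER[_i:]


def first_identity(msg, mail_from, order=EXTENDED_ORDER):
    """Return (source, address) for the first non-empty identity in `order`,
    falling back to the RFC2821 MAIL FROM."""
    for name in order:
        value = msg.get(name)
        if value and value.strip():
            return name, parseaddr(value)[1]
    return "MAIL FROM", mail_from


# A forwarder that adds Delivered-To but neither Resent-* nor SRS.
msg = Message()
msg["From"] = "alice@example.com"
msg["Delivered-To"] = "bob@forwarder.example"

print(first_identity(msg, "alice@example.com"))
print(first_identity(msg, "alice@example.com", order=BASE_ORDER))
```

With the extended order the forwarder's own domain becomes the PRD; with the
base order we fall through to MAIL FROM, which still names the original sender.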
That completes the description of the modified algorithm.
Advantages
- we always start by basing validation on a header the end user sees. All
other situations are special cases. In fact, you can think of all the special
cases in the current Caller ID spec as cases where the From domain is not the
PRD. This new algorithm cleanly separates these out into a second test.
- list servers that don't insert Sender today but still have the list owner's
address in the RFC2821 MAIL FROM are compliant without changes
- Other checks of the RFC2821 MAIL FROM can still be performed (e.g. allow/deny
lists)
- Organizations that wish to implement SRS or VERP may do so
Disadvantages
- this may force us to do a 2nd spoof check in more cases than in the previous
algorithm, but I think we can optimize this a little. In Caller ID we're
already having to do a 2nd lookup of the From domain's TXT records to check for
the directOnly setting anyway
- a little more complexity in the algorithm.
Answers to Some Anticipated Questions
Q1: Why not check RFC2821 MAIL FROM in #1 and leave the RFC2822 headers to step
2?
A: Because MAIL FROM by itself doesn't give us actionable information in most
cases. If the spoof check passes, it could still be a spammer who's registered
their own throwaway domain but forged the RFC2822 From line. We can't just
accept the message without further spoof checks or we risk misleading the end
user into thinking the From line has been validated. If the spoof check of
MAIL FROM fails, it could just be because a legitimate message has come through
a forwarding service that hasn't implemented SRS. In other words, no matter
what result we get from checking the MAIL FROM, further testing of the RFC2822
headers will *always* be required.
Q2: Why not put RFC2821 MAIL FROM as the first identity in step 2?
A: Because that would *force* adoption of SRS. MAIL FROM is only empty in
cases of bounce messages, so we would only ever fall through and check the other
headers on bounces. As I said above, if organizations choose to implement
SRS, that's their business, but it shouldn't be a requirement. Or in RFC
lingo, SRS can be a MAY, but not a MUST.
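
A toy example of the ordering argument, using plain dicts in place of real
messages (the names and addresses are invented for illustration):

```python
def select_identity(order, identities):
    """Return (name, value) of the first non-empty identity in `order`."""
    for name in order:
        value = identities.get(name, "")
        if value:
            return name, value
    return None


PROPOSED = ["Resent-Sender", "Resent-From", "Sender", "MAIL FROM"]
MAIL_FROM_FIRST = ["MAIL FROM", "Resent-Sender", "Resent-From", "Sender"]

# Forwarded without SRS: the envelope sender is still the original author,
# and the forwarder has added a Resent-From header.
forwarded = {"MAIL FROM": "alice@example.com",
             "Resent-From": "bob@forwarder.example"}

# A bounce: the only case where MAIL FROM is empty.
bounce = {"MAIL FROM": "", "Sender": "postmaster@mx.example.net"}

print(select_identity(PROPOSED, forwarded))
print(select_identity(MAIL_FROM_FIRST, forwarded))
print(select_identity(MAIL_FROM_FIRST, bounce))
```

With MAIL FROM first, the forwarded message is judged on alice@example.com
against the forwarder's IP, so it fails unless the forwarder rewrites the
envelope sender. The other headers are only ever reached for the bounce.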
Q3: Spammers can still insert headers pointing to their own throwaway domains
that will enable them to pass the spoof check, right?
A: True, but senders can place the directOnly flag on their EPD to indicate
that a further validation against the recipient's safe list ought to be
performed.
Q4: Why not include the RFC2821 HELO/EHLO domain in this algorithm?
A: Because verifying HELO/EHLO, if it tells us anything at all, might tell us
that the MTA is authorized to transmit messages. However, it tells us nothing
about whether that MTA is authorized to transmit on behalf of the *specific*
domain responsible for a given message.
OK, that's it.
Fire at will! :-)