spf-discuss

Re: getting 2822 protection as well as 2821 protection

2004-04-07 10:47:33
Dustin D. Trammell wrote:

So how would you differentiate the above example from this one:

Envelope: i(_dot_)am-spammer(_at_)jimramsay(_dot_)com(_dot_)spammer(_dot_)net
From: i(_dot_)am(_at_)jimramsay(_dot_)com
Reply-to: i(_dot_)am(_dot_)freshened(_dot_)on(_dot_)the(_dot_)spamlist(_at_)jimramsay(_dot_)com(_dot_)spammer(_dot_)net
Sender: i(_dot_)am(_at_)jimramsay(_dot_)com

I would consider that "not close enough" because 'jimramsay.com.spammer.net' is obviously not in the same domain hierarchy as 'jimramsay.com'. I suppose the "right way" would be to match from right-to-left a few levels (more than just 1!) instead of left-to-right:

Comparing to 'jimramsay.com':
'holmes.jimramsay.com' would match
'01.02.03.04.jimramsay.com' would match
'mail.yahoo.com' would not (need more than just '.com' to match)
'jimramsay.com.spoof.org' would not
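
As a rough sketch of the right-to-left comparison described above, something like the following Python could work. The function name `in_same_hierarchy` and the `min_labels` parameter are made up for illustration; `min_labels=2` is just one reading of "more than just 1" level, not a settled rule.

```python
def in_same_hierarchy(candidate: str, base: str, min_labels: int = 2) -> bool:
    """Return True if `candidate` is `base` itself or a subdomain of it.

    Labels are compared right-to-left, so a name that merely *contains*
    the base domain somewhere on the left does not count.
    """
    cand = candidate.lower().rstrip(".").split(".")
    ref = base.lower().rstrip(".").split(".")
    # Assumption: a base shorter than `min_labels` (e.g. just 'com')
    # is too generic to match against.
    if len(ref) < min_labels or len(cand) < len(ref):
        return False
    # The rightmost len(ref) labels of the candidate must equal the base.
    return cand[-len(ref):] == ref


for name in ("holmes.jimramsay.com",
             "01.02.03.04.jimramsay.com",
             "mail.yahoo.com",
             "jimramsay.com.spoof.org"):
    print(name, in_same_hierarchy(name, "jimramsay.com"))
```

The first two names should come out as matches and the last two as non-matches, in line with the list above.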

Or something similar.  Not enough similarities?  Enough to consider it
'first-class'?  I think that if you're doing interesting things with your
envelope, reply-to, etc., then we shouldn't try to detect this and still
classify it as 'first-class', it would simply fall into the
'second-class' bucket and still be seen by the user as probably
legitimate mail.  If we're getting into the business of classifying
mail, 'first-class' should be absolutely verifiable as legitimate and
anything else would be a lesser class.  In the example of using C/R
systems, it's an unfortunate side-effect that the addresses don't match.

True, it is an unfortunate side-effect that the addresses do not match exactly, but they do match within a certain well-defined pattern:

user [ -optionalextensions ] @ [ optionalhostname. ] rest.of.domain.com

I think a pretty good algorithm for deciding whether all the various addresses match would be as follows:

1 - Find the shortest user-part of all the addresses to be compared. Call this 'A'
2 - Find the shortest domain-part of all the addresses to be compared. Call this 'B'
3 - Score starts at 0
4 - If all the user-parts are exactly the same, score +1. Otherwise, if all the user-parts start with 'A', score +0.5
5 - If all the domain-parts are exactly the same, score +1. Otherwise, if all the domain-parts end with 'B', score +0.5
6 - There are three types of match, depending on personal preference:
    - Lenient match - consider 'first-class' if Score >= 1
    - Conservative match - consider 'first-class' if Score > 1
    - Strict match - consider 'first-class' if Score == 2

A user could choose Lenient match, Conservative match, or Strict match depending on what they think is 'good enough'.
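
To make the scoring concrete, here is a minimal Python sketch of the steps above. It is only one possible reading: the function names (`split_address`, `match_score`, `is_first_class`) are hypothetical, the +0.5 is treated as the fallback when the parts are not identical, domains are compared case-insensitively, and the example.com addresses are placeholders.

```python
def split_address(addr: str) -> tuple[str, str]:
    """Split 'user@domain' at the last '@'; lowercase the domain only."""
    user, _, domain = addr.rpartition("@")
    return user, domain.lower()


def match_score(addresses: list[str]) -> float:
    """Score how closely a set of addresses match (0, 0.5, 1, 1.5 or 2)."""
    if not addresses:
        raise ValueError("need at least one address")
    users, domains = zip(*(split_address(a) for a in addresses))
    a = min(users, key=len)     # step 1: shortest user-part, 'A'
    b = min(domains, key=len)   # step 2: shortest domain-part, 'B'
    score = 0.0                 # step 3
    if len(set(users)) == 1:    # step 4
        score += 1.0
    elif all(u.startswith(a) for u in users):
        score += 0.5
    if len(set(domains)) == 1:  # step 5
        score += 1.0
    elif all(d.endswith(b) for d in domains):
        score += 0.5
    return score


def is_first_class(score: float, mode: str = "conservative") -> bool:
    """Step 6: apply the Lenient / Conservative / Strict threshold."""
    if mode == "lenient":
        return score >= 1
    if mode == "conservative":
        return score > 1
    if mode == "strict":
        return score == 2
    raise ValueError(f"unknown mode: {mode}")


score = match_score(["jim-list@holmes.example.com", "jim@example.com"])
print(score, is_first_class(score, "lenient"), is_first_class(score, "strict"))
```

One caveat with taking "end with 'B'" literally: a plain suffix test would also accept 'notexample.com' against 'example.com', so a real implementation would probably want to compare whole domain labels instead.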

I suppose my other question is: What about SRS? Won't all SRS-forwarded mail also end up as not-first-class?

--
Jim Ramsay
"Me fail English?  That's unpossible!"