Dustin D. Trammell wrote:
So how would you differentiate the above example from this one:
Envelope: i.am-spammer@jimramsay.com.spammer.net
From: i.am@jimramsay.com
Reply-to: i.am.freshened.on.the.spamlist@jimramsay.com.spammer.net
Sender: i.am@jimramsay.com
I would consider that "not close enough" because
'jimramsay.com.spammer.net' is obviously not in the same domain
hierarchy as 'jimramsay.com'. I suppose the "right way" would be to
match from right to left a few levels (more than just 1!) instead of
left to right:
Comparing to 'jimramsay.com':
'holmes.jimramsay.com' would match
'01.02.03.04.jimramsay.com' would match
'mail.yahoo.com' would not (need more than just '.com' to match)
'jimramsay.com.spoof.org' would not
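A minimal Python sketch of that right-to-left comparison, assuming the rule is "every label of the reference domain must match, counting from the right". The function name domain_matches and the min_labels guard (which stops a bare TLD like 'com' from being used as the reference) are illustrative assumptions, not part of any existing spec:

```python
def domain_matches(candidate: str, reference: str, min_labels: int = 2) -> bool:
    """True if `candidate` equals `reference` or is a subdomain of it,
    comparing whole DNS labels from right to left."""
    # DNS names are case-insensitive; also ignore a trailing root dot.
    cand = candidate.lower().rstrip(".").split(".")
    ref = reference.lower().rstrip(".").split(".")
    # Hypothetical guard: refuse references shorter than `min_labels` labels,
    # so a bare TLD such as 'com' can never be the whole match.
    if len(ref) < min_labels or len(cand) < len(ref):
        return False
    # The rightmost len(ref) labels of the candidate must equal the reference.
    return cand[-len(ref):] == ref
```

With 'jimramsay.com' as the reference, this accepts 'holmes.jimramsay.com' and '01.02.03.04.jimramsay.com' and rejects 'mail.yahoo.com' and 'jimramsay.com.spoof.org', as in the list above.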
Or something similar.
Not enough similarities? Enough to consider it 'first-class'? I think
that if you're doing interesting things with your envelope, reply-to,
etc., then we shouldn't try to detect this and still classify it as
'first-class'; it would simply fall into the 'second-class' bucket and
still be seen by the user as probably legitimate mail. If we're getting
into the business of classifying mail, 'first-class' should be
absolutely verifiable as legitimate, and anything else would be a
lesser class. In the example of using challenge/response (C/R) systems,
it's an unfortunate side-effect that the addresses don't match.
True, it is an unfortunate side-effect that the addresses do not match
exactly, but they do match within a certain well-defined pattern:
user [ -optionalextensions ] @ [ optionalhostname. ] rest.of.domain.com
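One way to express that pattern is a regular expression built from a base address. This is only a sketch: pattern_for is a hypothetical helper, and it assumes '-' is the only extension separator, that any number of leading host labels is allowed, and that comparison is case-sensitive (for simplicity):

```python
import re


def pattern_for(base_address: str) -> re.Pattern:
    """Regex for: user [ -optionalextensions ] @ [ optionalhostname. ] domain.
    Use with .fullmatch() so the whole address must fit the pattern."""
    user, domain = base_address.rsplit("@", 1)
    return re.compile(
        re.escape(user)
        + r"(?:-[^@]*)?"   # optional '-extension' after the user-part
        + "@"
        + r"(?:[^@]+\.)?"  # optional host label(s), ending in a dot
        + re.escape(domain)
    )
```

For the hypothetical base 'jim@jimramsay.com', this accepts 'jim-cr@mail.jimramsay.com' but rejects 'jim@jimramsay.com.spoof.org' and 'jim@eviljimramsay.com', because any host prefix must end in a dot immediately before the base domain.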
I think a pretty good algorithm for deciding whether all the various
addresses match would be as follows:
1 - Find the shortest user-part of all the addresses to be compared.
Call this 'A'
2 - Find the shortest domain-part of all the addresses to be compared.
Call this 'B'
3 - Score starts at 0
4 - If all the user-parts are exactly the same, score +1. Otherwise, if
all the user-parts start with 'A', score +0.5
5 - If all the domain-parts are exactly the same, score +1. Otherwise, if
all the domain-parts end with 'B', score +0.5
6 - There are three types of match, depending on personal preference:
- Lenient match - consider 'first-class' if Score >= 1
- Conservative match - consider 'first-class' if Score > 1
- Strict match - consider 'first-class' if Score == 2
A user could choose Lenient match, Conservative match, or Strict match
depending on what they think is 'good enough'.
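The six steps translate fairly directly into code. A minimal Python sketch follows; it assumes the "+1" and "+0.5" cases in steps 4 and 5 are mutually exclusive (which is what makes Score == 2 the maximum for Strict match), that every address contains an '@', and it lowercases domain-parts because DNS names are case-insensitive. The function names and mode strings are illustrative:

```python
from typing import Callable, Dict, List, Tuple


def split_address(address: str) -> Tuple[str, str]:
    """Split 'user@domain' at the last '@'; lowercase the domain-part."""
    user, domain = address.rsplit("@", 1)
    return user, domain.lower()


def match_score(addresses: List[str]) -> float:
    parts = [split_address(a) for a in addresses]
    users = [u for u, _ in parts]
    domains = [d for _, d in parts]

    a = min(users, key=len)    # step 1: shortest user-part ('A')
    b = min(domains, key=len)  # step 2: shortest domain-part ('B')
    score = 0.0                # step 3

    # Step 4: exact user-part match, else common prefix 'A'.
    if all(u == a for u in users):
        score += 1
    elif all(u.startswith(a) for u in users):
        score += 0.5

    # Step 5: exact domain-part match, else common suffix 'B'.
    if all(d == b for d in domains):
        score += 1
    elif all(d.endswith(b) for d in domains):
        score += 0.5

    return score


# Step 6: the three user-selectable thresholds.
MODES: Dict[str, Callable[[float], bool]] = {
    "lenient": lambda s: s >= 1,
    "conservative": lambda s: s > 1,
    "strict": lambda s: s == 2,
}


def is_first_class(addresses: List[str], mode: str = "conservative") -> bool:
    return MODES[mode](match_score(addresses))
```

Run on Dustin's example headers, the user-parts all start with 'i.am' (+0.5) but 'jimramsay.com.spammer.net' does not end with 'jimramsay.com' (+0), so the score is 0.5 and no mode treats the message as first-class. A hypothetical C/R pair such as 'jim@jimramsay.com' and 'jim-cr@mail.jimramsay.com' scores 1.0, which is first-class under Lenient match only. Note that a plain endswith() test would also accept 'notjimramsay.com' against 'jimramsay.com'; comparing whole labels from the right would close that gap.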
I suppose my other question is: What about SRS (the Sender Rewriting
Scheme)? Won't all SRS-forwarded mail also end up as not-first-class?
--
Jim Ramsay
"Me fail English? That's unpossible!"