[Asrg] Blacklisting spam-promoted domains

Better terminology would help. I'm reading the above to mean that they
reduce every name to a second-level domain. In that case, if there's a
spamvertized domain in, say, "nh.us", then all of nh.us is impacted.
That hardly seems like a good thing to me.

Actually, the FAQ goes on to say (or at least I infer; it's hard to
read) that reports are counted at each level, so a report on foo.nh.us
would add a count to both "foo.nh.us" and "nh.us". However, it also
recommends that the client check the second-level domain first, and
says that it's likely that *only* second-level domains will be listed
in the future.
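A minimal Python sketch of the per-level counting described above,
assuming that reading of the FAQ is right. The names `suffixes` and
`count_report` are hypothetical, not anything the FAQ defines. It shows
why two unrelated hosts under nh.us both add to the "nh.us" count:

```python
from collections import Counter


def suffixes(host: str, min_labels: int = 2) -> list[str]:
    """Return the host and each parent domain with at least min_labels labels.

    For example, "foo.nh.us" gives ["foo.nh.us", "nh.us"].
    """
    labels = host.lower().rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - min_labels + 1)]


def count_report(counts: Counter, host: str) -> None:
    """Credit one spam report to every level of the reported name."""
    for name in suffixes(host):
        counts[name] += 1


counts = Counter()
count_report(counts, "foo.nh.us")  # a spamvertized host
count_report(counts, "bar.nh.us")  # an unrelated host under the same suffix
print(counts.most_common())
```

Each host is reported once, yet "nh.us" ends up with twice the count of
either real offender, which is the collateral-damage problem raised above.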

I understand the goal of trying to de-randomize (and normalize)
spamvertized URLs, but I don't really think subdomain stripping alone
is workable. Normalizing a la URIDNSBL seems slightly better, although
it's still not perfect.
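For contrast, here is a sketch of normalizing to the registered domain
instead of blindly keeping the last two labels. It is only in the spirit
of URIDNSBL-style normalization, not its actual rule set, and
`MULTI_LABEL_SUFFIXES` is a hypothetical three-entry sample; a usable
filter would need a far more complete list:

```python
# Hypothetical sample of two-label public suffixes; names are registered
# one level below these. A real list would be much longer.
MULTI_LABEL_SUFFIXES = {"co.uk", "nh.us", "com.au"}


def naive_sld(host: str) -> str:
    """Blind subdomain stripping: keep only the last two labels."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def registered_domain(host: str) -> str:
    """Keep one label more than the public suffix the host sits under."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


for host in ["x7q2.promo.junkdomain.com", "www.example.co.uk", "foo.nh.us"]:
    print(host, naive_sld(host), registered_domain(host))
```

Both agree on the .com case, but only the second keeps example.co.uk and
foo.nh.us separate from everything else under co.uk and nh.us. It is
still only as good as its suffix list, which is why it's not perfect.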

The "solution" that I use in my filter is that I list (in my HOSTS file) both 
fully qualified host names and IP addresses (forced to 127.0.0.1) AND a set of 
names that look like:

    *.domain.tld

or even

    *.dom1.dom2.tld

or the like.  In the case of non-US domains, I'd store this as something like

    *.domain.co.uk

while a regular .com domain I'd choose to blacklist as 

    *.junkdomain.com
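A sketch of reading such entries out of a HOSTS-style file, assuming the
usual "address name [alias ...]" line layout with "#" comments. The
function name `load_blacklist` is hypothetical. Every name after the
address, wildcard or not, goes into one Python set, which is hash-based:

```python
def load_blacklist(hosts_text: str) -> set[str]:
    """Collect blacklisted names from HOSTS-style text.

    The address column (e.g. 127.0.0.1) is ignored; only the names matter
    to the filter, including wildcard forms such as "*.junkdomain.com".
    """
    table = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        fields = line.split()
        if len(fields) >= 2:
            table.update(name.lower() for name in fields[1:])
    return table


sample = """# spam hosts
127.0.0.1  *.junkdomain.com
127.0.0.1  *.domain.co.uk  spam.example.net   # two names on one line
"""
print(sorted(load_blacklist(sample)))
```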

The mail filter loads these blacklist entries from HOSTS into a hash
table when it starts. When it finds a URL in an e-mail, it first looks
to see whether the URL's machine name matches an entry in the blacklist
table exactly as it is. If not, it puts "*." in front of the machine
name and tries again. If there's still no match, it removes the levels
to the right of the "*." one at a time, searching again after each
removal, until it finds a matching entry in the blacklist table or
there's nothing left.
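The lookup order just described can be sketched in Python as follows.
This illustrates the described steps; it is not the actual SPITBOL
filter code, and `is_blacklisted` and the sample table are hypothetical.
A Python set stands in for the hash table:

```python
def is_blacklisted(host: str, table: set[str]) -> bool:
    """Exact-then-wildcard lookup of one URL machine name."""
    host = host.lower().rstrip(".")
    # Step 1: exact match on the full machine name.
    if host in table:
        return True
    # Step 2: prefix "*." and then drop the label just right of "*."
    # one at a time: *.a.b.junkdomain.com, *.b.junkdomain.com,
    # *.junkdomain.com, *.com. Each probe is one hash lookup.
    labels = host.split(".")
    for i in range(len(labels)):
        if "*." + ".".join(labels[i:]) in table:
            return True
    return False


# Hypothetical table contents, in the same forms as the HOSTS entries above.
table = {"*.junkdomain.com", "*.domain.co.uk", "spam.example.net"}
for host in ["x9q2.promo.junkdomain.com", "junkdomain.com",
             "www.domain.co.uk", "notjunkdomain.com", "www.example.net"]:
    print(host, is_blacklisted(host, table))
```

Because the probes only ever drop whole labels, "notjunkdomain.com" is
not caught by "*.junkdomain.com", while the bare "junkdomain.com" is,
since the first wildcard probe is "*." plus the full name.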

This makes the searches VERY fast (each probe is a single lookup in the
hash table, and a name with N levels needs at most N+1 probes), and it's
insensitive to all the bogus, randomly generated subdomain names that
spammers employ to keep people from successfully blocking their machine
names.

Long-term, this isn't a perfect solution, because ultimately spammers
just use disposable, single-use domain names (at $35 a shot or less,
they can afford to, after all), and the table will grow without limit.
So this will probably have to evolve eventually into some kind of
cached, aged list (perhaps moving entries that haven't been seen in a
while off to an indexed disk file).
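One way such a cached, aged list might look, as a hypothetical sketch
(the class `AgedBlacklist` and its `max_age` parameter are assumptions,
not part of the existing filter). Each sighting refreshes an entry's
last-seen time, and entries idle longer than `max_age` seconds are
expired and handed back, so the caller could archive them to a disk file:

```python
import time
from collections import OrderedDict


class AgedBlacklist:
    """Blacklist whose entries expire when not seen for max_age seconds.

    Entries are kept in least-recently-seen order, so expiry only has to
    examine the front of the OrderedDict.
    """

    def __init__(self, max_age: float, clock=time.monotonic):
        self.max_age = max_age
        self.clock = clock
        self.last_seen = OrderedDict()  # name -> time last seen

    def touch(self, name: str) -> None:
        """Add a name, or refresh it when it shows up in spam again."""
        self.last_seen[name] = self.clock()
        self.last_seen.move_to_end(name)

    def __contains__(self, name: str) -> bool:
        seen = self.last_seen.get(name)
        return seen is not None and self.clock() - seen <= self.max_age

    def expire(self) -> list[str]:
        """Drop entries idle longer than max_age and return their names."""
        cutoff = self.clock() - self.max_age
        dropped = []
        while self.last_seen:
            name, seen = next(iter(self.last_seen.items()))
            if seen >= cutoff:
                break  # everything after this was seen more recently
            self.last_seen.popitem(last=False)
            dropped.append(name)
        return dropped
```

Passing a fake `clock` makes the aging easy to test without waiting.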

Of course, remember the old joke about the two old friends being chased
by a bear through the woods. One says to the other, "Do you really think
you can outrun that bear?" "No," says the other, "...but I don't need to.
I only need to outrun YOU." Ultimately, what matters is that my mail
filter gets my spam down to an acceptably low level, and I use the mail
that slips through to improve the variety of filtering techniques in my
filter.

Spammers keep trying new tricks and devious strategies that they are
confident will screw up Bayesian filters or other classical filtering
methods. But since my filter doesn't use all the same techniques that
most other folks use, and since the pattern-matching capabilities
available to me in SPITBOL are far richer than the braindead regex-based
pattern matching that most other antispam filters use, my filter can
usually deal fairly easily with the great majority of spammers' tricks.

Gordon Peterson                  http://personal.terabites.com/
1977-2002  Twenty-fifth anniversary year of Local Area Networking!
Support free and fair US elections!  http://stickers.defend-democracy.org
12/19/98: Partisan Republicans scornfully ignore the voters they "represent".
12/09/00: the date the Republican Party took down democracy in America.



_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg