Better terminology would help. I'm reading the above to mean that
they reduce every name to its second-level domain. In that case,
if there's a spamvertized domain under, say, "nh.us", then all of nh.us
is impacted. That hardly seems like a good thing to me.
Actually, the FAQ goes on to say (or at least so I infer; it's hard to
read) that reports are counted at each level, so a report on
foo.nh.us would add a count to both "foo.nh.us" and "nh.us". However,
it also recommends that the client check the second-level domain
first, and says that it's likely that *only* second-level domains
will be listed in the future.
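To make the per-level counting concrete, here is a minimal Python sketch. It assumes, from the foo.nh.us example, that every suffix with at least two labels is counted and the bare TLD is not; count_report is a hypothetical name, not code from the FAQ.

```python
from collections import Counter


def count_report(host, counts):
    """Add one report for host at every level that has at least two labels.

    Assumption: the bare TLD ("us") is not counted, matching the
    foo.nh.us -> {foo.nh.us, nh.us} example.
    """
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels) - 1):
        counts[".".join(labels[i:])] += 1


counts = Counter()
count_report("foo.nh.us", counts)
count_report("bar.nh.us", counts)
print(counts["nh.us"])  # prints 2
```

Two reports on unrelated hosts under nh.us leave "nh.us" with twice the count of either host, which is how a whole shared namespace can end up impacted.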
I understand the goal of trying to de-randomize (and normalize)
spamvertized URLs, but I don't think subdomain stripping alone
is workable. Normalizing à la URIDNSBL seems slightly better,
although still not perfect.
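A minimal sketch of that kind of normalization follows: a host is trimmed to its registered domain, keeping three labels when the last two form a known two-level registry suffix. The TWO_LEVEL_SUFFIXES set is a deliberately tiny, hypothetical list, and this is not URIDNSBL's actual code; real implementations rely on much larger suffix lists.

```python
# Hypothetical, deliberately tiny list of registry suffixes that take two labels.
TWO_LEVEL_SUFFIXES = {"co.uk", "org.uk", "com.au", "nh.us"}


def registered_domain(host):
    """Trim host to the domain a registrant actually controls."""
    labels = host.lower().rstrip(".").split(".")
    # Keep three labels when the last two are a known two-level suffix.
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES:
        return ".".join(labels[-3:])
    # Otherwise keep the ordinary second-level domain.
    return ".".join(labels[-2:])


print(registered_domain("x7f3.junkdomain.com"))  # prints junkdomain.com
print(registered_domain("bar.foo.nh.us"))        # prints foo.nh.us
```

Unlike plain subdomain stripping, this keeps foo.nh.us distinct from nh.us, but it is only as good as the suffix list behind it.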
The "solution" I use in my filter is to list in my HOSTS file both
fully qualified host names and IP addresses (forced to 127.0.0.1) AND a set of
wildcard names that look like:
    *.domain.tld
or even
    *.dom1.dom2.tld
or the like. For non-US domains, I'd store this as something like
    *.domain.co.uk
while a regular .com domain I'd choose to blacklist as
    *.junkdomain.com
The mail filter loads these blacklist entries from HOSTS into a hash table
when it starts. When it finds a URL in an E-mail, it first checks whether the
URL's host name matches an entry in the table exactly as it is. If not, it
prepends "*." to the host name and tries again. If there is still no match, it
removes the levels just to the right of the "*." one at a time
(*.a.b.junkdomain.com, then *.b.junkdomain.com, then *.junkdomain.com, and so
on), searching again after each step until nothing is left or a matching entry
is found.
This makes the searches VERY fast (thanks to the hashed table), and it's
insensitive to all the bogus, randomly generated subdomain names that spammers
employ to keep people from successfully blocking their machine names.
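The lookup described above can be sketched in Python roughly as follows. This is not the author's SPITBOL code: load_blacklist and find_blacklist_entry are hypothetical names, the HOSTS parsing assumes the usual "address name [name ...]" line format with # comments, and a Python set stands in for the hash table.

```python
def load_blacklist(hosts_lines):
    """Collect host names (including *.wildcard entries) from HOSTS-style lines."""
    table = set()  # a Python set is a hash table: average O(1) membership tests
    for line in hosts_lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        fields = line.split()
        # Assumed format: an address followed by one or more names.
        for name in fields[1:]:
            table.add(name.lower())
    return table


def find_blacklist_entry(host, table):
    """Return the blacklist entry that matches host, or None."""
    host = host.lower().rstrip(".")
    # 1. Exact match on the full host name.
    if host in table:
        return host
    # 2. Prefix "*." and retry, then drop the label just right of the "*."
    #    one at a time: *.a.b.junk.com, *.b.junk.com, *.junk.com, *.com
    labels = host.split(".")
    for i in range(len(labels)):
        candidate = "*." + ".".join(labels[i:])
        if candidate in table:
            return candidate
    return None


if __name__ == "__main__":
    table = load_blacklist(["127.0.0.1  *.junkdomain.com  # random subdomains"])
    print(find_blacklist_entry("x9q2z.Junkdomain.com", table))  # prints *.junkdomain.com
```

The number of probes is bounded by the number of labels in the host name, so each URL costs only a handful of hash lookups no matter how large the table grows.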
Long-term, this isn't a perfect solution, because ultimately spammers just use
disposable, single-use domain names (at $35 a shot or less, they can afford to
do that, after all), and the table will grow without limit. So this will
probably have to evolve eventually into some kind of cached, aged list, perhaps
with entries that haven't been seen in a while moved off to an indexed disk
file.
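One way the "cached, aged list" idea might look is sketched below, purely as an illustration. The class name AgedBlacklist, the max_age_seconds parameter, and the choice to hand stale entries back to the caller (who could write them to an indexed disk file) are all assumptions; the post does not specify any of them.

```python
import time


class AgedBlacklist:
    """Hypothetical in-memory blacklist whose entries age out when unused."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self.last_seen = {}  # entry -> time of insertion or most recent hit

    def add(self, entry, now=None):
        self.last_seen[entry] = time.time() if now is None else now

    def hit(self, entry, now=None):
        """Return True if entry is listed, refreshing its last-seen time."""
        if entry in self.last_seen:
            self.last_seen[entry] = time.time() if now is None else now
            return True
        return False

    def expire(self, now=None):
        """Remove entries unused for longer than max_age and return them,
        so the caller can archive them (e.g. to an indexed disk file)."""
        now = time.time() if now is None else now
        stale = [e for e, t in self.last_seen.items() if now - t > self.max_age]
        for e in stale:
            del self.last_seen[e]
        return stale
```

Refreshing the timestamp on every hit keeps domains that spammers are still actively using in memory, while throw-away domains drop out after a quiet period.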
Of course, remember the old joke about two friends being chased through the
woods by a bear. One says to the other, "Do you really think you can outrun
that bear?" "No," says the other, "...but I don't need to. I only need to
outrun YOU." Ultimately, what matters is that my mail filter gets my spam down
to an acceptably low level, and I use the mail that slips through to improve
the variety of filtering techniques in my filter.
Spammers keep trying new tricks and devious strategies that they are confident
will screw up Bayesian filters or other classical filtering methods. But since
my filter doesn't use all the same techniques most other folks use, and since
the pattern-matching capabilities available to me in SPITBOL are far richer
than the braindead regex-based pattern matching most other antispam filters
use, most of the tricks spammers try are fairly easily dealt with by my filter.
Gordon Peterson http://personal.terabites.com/
1977-2002 Twenty-fifth anniversary year of Local Area Networking!
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg