ietf-asrg

[Asrg] Brad Templeton's C/R Guidelines

2003-05-27 09:52:05
Here is a list of C/R guidelines compiled by Brad Templeton, who wrote one of the early C/R systems (from http://www.templetons.com/brad/spam/challengeresponse.html). To summarize:

o Never challenge any mail that's a reply to a private message you sent.
o Avoid challenging replies to public messages
o Use multiple addresses
o Never challenge mailing list mail
o Never challenge a challenge!
o Make the "From" on your challenge match the address mailed to
o Put an In-reply-to header on your challenge
o Include the subject of the original message in the challenge
o Present a regular summary of all blocked mail
o Make the challenge as easy as you can make workable.
o Don't force users to re-send mail
o Detect all attempts to subscribe to mailing lists
o Detect mailing lists subscribed to in the user's mail archives
o Detect patterns of possible incoming mailing lists
o Think about anonymous E-mail


-----------snip-------------
Proper principles for Challenge/Response anti-spam systems

Back in 1997 I wrote what is probably the first of the challenge/response (C/R) spam-blocking systems. These are systems that, when they see an E-mail from somebody you've never corresponded with before, hold the mail and e-mail back a "challenge" to confirm that the person is a real sender and not a mailing robot, in particular a spammer. The other person gets the challenge, and responds to it in some way. If they do this properly, your system releases the mail that was held, and from then on they can mail without challenge.

There are a number of these systems springing up -- it's a very effective system and a fairly obvious idea -- but not everybody is doing it right, so I thought I would lay out some "best practices" based on my 6 years of experience. I don't even do all of these things, because I wrote my system before they became necessary, but if I were writing a new version, I would.
Never challenge any mail that's a reply to a private message you sent.

If you send somebody private mail (from any address you have), and they reply to you with any mailer, you should accept their mail and not send them a challenge. This is true even if they reply from a different address than you sent the mail to. Many people have mail aliases, and receive mail on one address and send on another. Some people use other anti-spam systems that generate new addresses every time they mail.

What this means is that simply whitelisting all addresses you mail to is not enough, though it is of course an important thing to do.

One of the easiest ways to do this, by the way, is to have multiple addresses yourself. Send out private mail with an address that does not do challenges: an old-fashioned unfiltered E-mail box (though you may want to note the addresses on incoming mail so you can whitelist them). However, you must be sure this address won't get out to spammers, or you will have to switch it to another. (You must be prepared to do this.)

In general, you should probably put an un-challenged address on business cards too. Save filtered addresses for public use: postings to mailing lists, listings on web pages, listings in conference directories, etc.
Avoid challenging replies to public messages

If you can do it, avoid challenging replies to your public messages to mailing lists and newsgroups. With private mailing lists (not archived in public) you can of course accept any replies with reasonable safety based on subject line and in-reply-to. With public postings, consider accepting replies unchallenged for a few days to weeks after postings, then add a challenge for late replies which are more likely to be spammers.
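The time-limited acceptance of replies to public postings could be sketched as follows. This is a minimal Python illustration, not part of Templeton's system; the 14-day grace period and the function name are assumptions, and it presumes the reply has already been matched to one of the user's public postings by In-Reply-To or subject.

```python
from datetime import datetime, timedelta

# Hypothetical grace period; the text suggests "a few days to weeks".
REPLY_GRACE = timedelta(days=14)


def accept_public_reply(posted_at: datetime, received_at: datetime,
                        grace: timedelta = REPLY_GRACE) -> bool:
    """True if a reply to a public posting arrived soon enough to skip the challenge.

    Late replies (older than the grace period) fall back to a normal challenge;
    a "reply" dated before the posting itself is also treated as suspicious.
    """
    age = received_at - posted_at
    return timedelta(0) <= age <= grace
```

Everything outside the window simply goes through the ordinary C/R path, so the only tuning knob is how long the window stays open.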
Use multiple addresses

Any good spam filtering system will support giving the user multiple aliases under which to receive mail. This has two functions. One, you can filter some aliases more than others. For example, you might have "public" addresses used in newsgroup postings and on a web site, and private addresses used only in mails to private parties, replies etc. You would use less filtering, and perhaps no challenge/response, on private addresses.

It's also handy to provide a gamut of addresses so that you can use a different address every time you give one out. For example, if entering data on a web page that asks for your E-mail address, use a different one each time. That way if any address gets on spammers' lists, you can delete it or give it very high spam filtering with minimal risk to mail from others.

The best plan is to have your own subdomain for mail, allowing an infinite space of addresses. However, if that is not available to you, sendmail treats mail to "userid+anything" as mail to the given userid. For example, if a sendmail user has the address fbaggins@shire.org, then fbaggins+ring@shire.org and fbaggins+bagend@shire.org and all other such addresses will be delivered to the main address. Qmail uses a similar scheme, with a dash instead of a plus. That's better, since unfortunately there are huge numbers of badly coded web forms that map "+" to a space and so don't accept fully legal e-mail addresses with a plus in them.

The personal domain is also best because spammers can easily guess the root address behind a plus-sign address. If you use plus addresses, you must filter the base address and use the plus forms as your unfiltered addresses.
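As a minimal Python sketch of the plus/dash convention (the helper name is hypothetical), splitting a tagged address into its base address and tag lets a filter apply a strict policy to the base address and a looser one to the tagged forms:

```python
from typing import Optional, Tuple


def split_tagged(address: str, separator: str = "+") -> Tuple[str, Optional[str]]:
    """Split 'user+tag@domain' into ('user@domain', 'tag').

    Use separator="-" for qmail-style 'user-tag@domain' addresses.
    Returns tag None for an untagged (base) address.
    """
    local, at, domain = address.strip().rpartition("@")
    if not at or not local:
        raise ValueError(f"not an e-mail address: {address!r}")
    user, sep, tag = local.partition(separator)
    return f"{user}@{domain}", (tag if sep else None)
```

With qmail-style dashes, a base user name that itself contains a dash would be split in the wrong place, so a real system would check the result against its known user names.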

Some systems generate a new address for every mail sent, using a special random string in the address itself. Some use a cryptographically secure hash to generate the string, so they can immediately recognize any address they generated without having to remember it.
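One way to build such self-verifying addresses is an HMAC over the user name and a per-use label. The sketch below assumes a secret key known only to the mail system, a plus separator and a 10-hex-digit tag; all of these, and the function names, are illustrative choices rather than any particular system's format.

```python
import hashlib
import hmac

# Hypothetical secret; if it leaks, spammers can mint addresses that verify.
SECRET = b"change-me-to-a-long-random-key"


def _mac(user: str, label: str, secret: bytes) -> str:
    # Truncated HMAC-SHA256 over "user+label"; 10 hex digits keeps addresses short.
    return hmac.new(secret, f"{user}+{label}".encode(), hashlib.sha256).hexdigest()[:10]


def make_address(user: str, domain: str, label: str, secret: bytes = SECRET) -> str:
    """Generate e.g. 'fbaggins+ebay-2003.<mac>@shire.org' for one-off use."""
    return f"{user}+{label}.{_mac(user, label, secret)}@{domain}"


def is_generated(address: str, secret: bytes = SECRET) -> bool:
    """Recognize our own generated addresses without storing them."""
    local, at, _domain = address.rpartition("@")
    user, plus, tag = local.partition("+")
    label, dot, mac = tag.rpartition(".")
    if not (at and plus and dot and label):
        return False
    # Compare as bytes so arbitrary (even non-ASCII) input cannot raise.
    return hmac.compare_digest(mac.encode(), _mac(user, label, secret).encode())
```

Because some mail systems lowercase local parts, labels are best kept lowercase; the hex tag itself already is.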

Be aware, however, that in generating many addresses, you may mesh badly with other whitelist systems expecting your mail to come from the same address. One option is to use the base address in the "From:" and put any generated address, especially an unfiltered one, in the Reply-to. Beware that some mailers out there botch Reply-to.
Never challenge mailing list mail

For decades, all good mail responders have known not to respond to mailing list mail. An unofficial standard has indicated that bulk mail of various forms would have a header like "Precedence: bulk" or sometimes "Precedence: list" to mark it as bulk. "Precedence: junk" is rarely used for it would declare things to be spam!

You can also test to see that none of the addresses in the "To" and "CC" lines is an address for the person getting the mail, though that does present a maintenance problem since there is no automatic way to know all those addresses. However, you definitely should not challenge any mail with the above precedence headers.
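A minimal Python sketch of the two tests above, using only the standard `email` package; the function names are hypothetical, and the To/Cc test depends on a hand-maintained list of the user's addresses, as noted.

```python
from email.message import Message
from email.utils import getaddresses
from typing import Iterable

# Precedence values that mark bulk mail; none of these should ever be challenged.
BULK_PRECEDENCE = {"bulk", "list", "junk"}


def has_bulk_precedence(msg: Message) -> bool:
    """True if the message carries a Precedence: bulk/list/junk header."""
    return (msg.get("Precedence") or "").strip().lower() in BULK_PRECEDENCE


def addressed_to_user(msg: Message, user_addresses: Iterable[str]) -> bool:
    """True if any To/Cc address is one of the user's own addresses."""
    mine = {a.lower() for a in user_addresses}
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    return any(addr.lower() in mine for _name, addr in recipients)
```

The Precedence test is the firm rule; failing the To/Cc test only makes list mail likely, since the user's address list may be incomplete.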
Who to challenge?

There are three possible addresses you can challenge. They are the "Envelope From", the regular "From:" and the address in a "Reply-to:" header.

Most of the merits point to challenging the Envelope From, which is the address you would send bounce errors to. The "From:" is the person who wrote the message (and thus in most cases, though not all, the person you are trying to confirm is a human being). The "Reply-to" is the address to which the sender expects actual replies to go.

Unfortunately, you definitely should not challenge more than one of these.

A challenge is similar to a bounce error, but unfortunately in many cases it is not handled by a human; in fact it was designed not to be handled by a human. Most such cases are list mail, which you should not be challenging at all. In the case of list mail, the Envelope-From always identifies the list manager itself, not the particular poster to the mailing list. Sometimes it is a unique address, so that programs can automate detecting bounces without having to parse them to try to figure out what mail bounced.

The From is often the actual person who posted to the mailing list, or the real sender of a person to person mail. Some lists have all mail come "from" the list manager, however. Some lists have the list address be in the Reply-to.

You must not challenge individual posters to a list, so only challenge the From or Reply-to when you are sure it is not list mail. If you challenge individual posters you'll get bounced off the list very quickly.

The answer here thus depends on how good your detection of list mail is. If it's reliable, you may decide to challenge the From or Reply-to, since that is more likely to be a human. On the other hand, challenging the Envelope-From has many merits. The worst case is that it's not a human (or it's a list that is not tagging list mail as such), and this mail will appear in the digest, hopefully near the top.
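The choice can be expressed as a small Python function that always returns at most one address. This is a sketch under stated assumptions: the function name is hypothetical, it prefers From over Reply-to when list detection is trusted (a judgment call the text leaves open), and it never challenges an empty ("<>") envelope sender, which SMTP uses for bounces.

```python
from email.message import Message
from email.utils import parseaddr
from typing import Optional


def challenge_target(msg: Message, envelope_from: str,
                     trust_list_detection: bool) -> Optional[str]:
    """Pick the single address to challenge, or None if nothing should be challenged."""
    envelope = envelope_from.strip().strip("<>").strip()
    if not envelope:
        return None  # null sender: a bounce, and we never answer those
    if trust_list_detection:
        # Only when list mail is reliably filtered out: the author is likelier a human.
        for header in ("From", "Reply-To"):
            _name, addr = parseaddr(msg.get(header, ""))
            if addr:
                return addr
    # Default: the Envelope From, where bounce-like mail belongs.
    return envelope
```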
Never challenge a challenge!

The other person might have a C/R program or a whitelist.
Make the "From" on your challenge match the address mailed to

When they send out their mail they will have whitelisted the address they sent to, so a challenge whose From is that address should get through.
Put an In-reply-to header on your challenge

The challenge should refer to the message-id of the mail being challenged. A good whitelist program should remember the message-id of every mail the user sends out, and every challenge sent out. If a challenge comes back with an in-reply-to, you can identify it as a valid challenge. In the end, this may become the main technique, once spammers try to guess the names of your friends and send spam disguised as challenges. They can't fake this message-id.

The other reason to record the outgoing message-id is to be sure you never challenge anybody replying to mail you sent out. If mail has an in-reply-to that matches an outgoing message-id of a private mail of yours, you let it in.
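A minimal Python sketch of the Message-ID bookkeeping described above. The class and method names are hypothetical, and the store is in-memory only; a real system would persist it. It checks both In-Reply-To and References, since some mailers only fill in the latter.

```python
import re
from email.message import Message
from typing import Set

MSGID_RE = re.compile(r"<[^<>\s]+>")


class MessageIdLog:
    """Remembers Message-IDs of outgoing private mail and outgoing challenges."""

    def __init__(self) -> None:
        self.private: Set[str] = set()     # private mail the user sent
        self.challenges: Set[str] = set()  # challenges this system sent

    def record_private(self, message_id: str) -> None:
        self.private.add(message_id.strip())

    def record_challenge(self, message_id: str) -> None:
        self.challenges.add(message_id.strip())

    @staticmethod
    def referenced_ids(msg: Message) -> Set[str]:
        refs = " ".join(msg.get_all("In-Reply-To", []) + msg.get_all("References", []))
        return set(MSGID_RE.findall(refs))

    def replies_to_private_mail(self, msg: Message) -> bool:
        # Covers replies from unfamiliar addresses and challenges from other C/R systems.
        return bool(self.referenced_ids(msg) & self.private)

    def answers_our_challenge(self, msg: Message) -> bool:
        return bool(self.referenced_ids(msg) & self.challenges)
```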
Include the subject of the original message in the challenge

C/R programs should also log outgoing subjects, so that they can detect replies (and challenges) to the user's messages.
Present a regular summary of all blocked mail

No system is perfect, so the system must present, at some reasonable interval, a summary of the mail it blocked. This would include mailing list mail that was unchallenged, and mail whose challenge was never answered.

This should be presented as a summary digest, which allows a quick scan of all these messages. The summary should show a minimal set of relevant headers (From, To, Subject, CC etc.) and a few lines from the body. It should also show a "spam score" calculated for the message, and the digest should be sorted by spam-score, so the lowest scores appear at the top.
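A digest sorted by spam score, lowest first, might be rendered like this. This is a plain-text Python sketch; the `HeldMessage` fields, the three-line preview and the layout are assumptions, and the scores would come from whatever scoring system is already in use.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class HeldMessage:
    sender: str
    to: str
    subject: str
    body: str
    spam_score: float  # from whatever spam scorer is in use


def render_digest(held: Iterable[HeldMessage], preview_lines: int = 3) -> str:
    """Numbered summary, least spammy first, with a few body lines per message."""
    out = []
    for n, m in enumerate(sorted(held, key=lambda m: m.spam_score), start=1):
        out.append(f"{n}. [score {m.spam_score:.1f}] From: {m.sender}  "
                   f"To: {m.to}  Subject: {m.subject}")
        out.extend("      " + line for line in m.body.splitlines()[:preview_lines])
    return "\n".join(out)
```

The item numbers give the user something to refer to when choosing an action (deliver, whitelist sender, whitelist list, and so on) for each entry.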

With each message in the digest, the user should be able to select the message to define what to do with it, including delivering it, whitelisting the sender, whitelisting the mailing list it came from, and combinations. It can also offer options like blacklisting the sender, tuning the spam-score, and reporting the spam to collaborative filters.

Any existing spam scoring system can be used. The fact that the challenged address did not exist or the mail to it bounced may give a high spam-score, but one should be wary of the effect of this on anonymous mail.

The summary can be e-mailed every so often (once a day typically, or less frequently for people who read mail less frequently) or a web option should be available to see the latest summary. Normally messages would not appear in the summary until they have had some period of time to get a response to the challenge -- typically a daily digest will have the prior day's messages in it.

This step is vital. If this is not done, users will miss mail for mailing lists they joined, mail from people who decide not to answer challenges, and mail from people whose mail software is incompatible with the challenge.
Understand mail/postings to public vs. private addresses

As noted, the best practice is to use an address that does not have C/R on mail to private parties. It is important however to use a C/R filtered address if the mail/posting will go out in public. This includes all newsgroup postings, and any mail to mailing lists which have public archives. An ideal system would modify outgoing mail, using a non-filtered address on private mail, and a public address on mail that may be exposed in public.
Make the challenge as easy as you can make workable.

Spammers are not currently trying to fake responses to spam challenges, but they will. Until they do, asking for any reply at all works well as a challenge. Once they do, challenges must require some special action from the responder, something to prove they are human. Even so, try to make it as easy as possible, and provide several means of responding to the challenge.

For example, send your challenge as a multipart/alternative with plain text and HTML. In both, include a link the user can click on to make their response via a browser. However, since many people read mail offline or without a browser handy, always allow the response to come in E-mail.

Don't require the user to be online to see the challenge, i.e., don't use inlined image files unless absolutely necessary.

While the challenge must come "From:" the address that was mailed, it can have a Reply-to that sends the response to a specific handler with a unique address that lets you know what challenge is being answered. Since some users will not deal properly with the Reply-to, it is advised you also detect responses at the address which was in the From: of the challenge. In your challenge, put a magic token in the Subject line, Message-id and body, and if that token appears in any part of the response -- Subject, In-Reply-To or body, you will be able to identify the response, no matter what address it comes from.
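Putting the header advice together, a challenge could be assembled with Python's standard `email.message.EmailMessage`. This is a sketch under stated assumptions: the `cr-<token>@` Reply-to address and the `https://<handler_domain>/respond/<token>` URL are hypothetical handler conventions, and the 16-hex-digit token is an arbitrary choice.

```python
import html
import secrets
from email.message import EmailMessage, Message
from email.utils import make_msgid
from typing import Tuple


def build_challenge(original: Message, mailed_to: str, challenge_to: str,
                    handler_domain: str) -> Tuple[EmailMessage, str]:
    """Build a text+HTML challenge; returns (message, token)."""
    token = secrets.token_hex(8)  # 16 hex chars, placed in Subject, Message-ID and body
    subject = str(original.get("Subject", ""))
    ch = EmailMessage()
    ch["From"] = mailed_to  # the address they mailed, likely whitelisted on their side
    ch["To"] = challenge_to
    ch["Reply-To"] = f"cr-{token}@{handler_domain}"  # hypothetical per-challenge handler
    ch["Subject"] = f"[{token}] Re: {subject}"       # keeps the original subject visible
    ch["Message-ID"] = make_msgid(idstring=token, domain=handler_domain)
    if original.get("Message-ID"):
        ch["In-Reply-To"] = str(original["Message-ID"])
    url = f"https://{handler_domain}/respond/{token}"  # hypothetical web handler
    ch.set_content(
        f'Your message "{subject}" is being held until you confirm you are a person.\n'
        f"Just reply to this mail, or visit {url}\n(reference {token})\n")
    ch.add_alternative(  # turns the message into multipart/alternative
        f"<p>Your message &quot;{html.escape(subject)}&quot; is being held. "
        f'Reply to this mail, or <a href="{url}">click here</a>.</p>'
        f"<p>(reference {token})</p>", subtype="html")
    return ch, token
```

No images are used, so the challenge reads the same offline, and the plain-text part tells the user an ordinary E-mail reply is enough.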

If you ask the user to answer a question, be as forgiving as possible in finding the answer in the body or subject of the response. If the user makes a bad response, send them an error so they know their mail is not yet delivered.
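Matching a response by its token, wherever it appears, and checking the answer forgivingly might look like the Python sketch below. It assumes 16-hex-digit tokens and a hypothetical `pending` map from token to expected answer (empty when any reply will do); answers are compared case-insensitively with punctuation ignored.

```python
import re
from email import message_from_string
from typing import Dict, Optional

TOKEN_RE = re.compile(r"\b[0-9a-f]{16}\b")  # matches secrets.token_hex(8)-style tokens


def _normalize(text: str) -> str:
    # Lowercase, turn punctuation into spaces, collapse whitespace.
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def _searchable_text(raw: str) -> str:
    msg = message_from_string(raw)
    chunks = [msg.get(h, "") for h in ("Subject", "In-Reply-To", "References")]
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            try:
                chunks.append(payload.decode(part.get_content_charset() or "utf-8",
                                             errors="replace"))
            except LookupError:  # unknown charset name
                chunks.append(payload.decode("utf-8", errors="replace"))
    return " ".join(chunks).lower()


def match_response(raw: str, pending: Dict[str, str]) -> Optional[str]:
    """Return the answered token, or None (caller then sends an error reply)."""
    text = _searchable_text(raw)
    for token in TOKEN_RE.findall(text):
        if token not in pending:
            continue
        answer = _normalize(pending[token])
        if not answer or f" {answer} " in f" {_normalize(text)} ":
            return token
    return None
```

Because the token is searched in Subject, In-Reply-To, References and body alike, the response is recognized no matter which address it was sent to.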
Don't force users to re-send mail

Some challenges indicate the original mail was not delivered, and ask the user to send it again. Users will balk at this, and if they felt they were doing the recipient a favour (such as answering a question they asked in a public forum) they often will not bother to jump through any hoops to respond to challenges or re-send mail. You must make it as easy as possible.
Detect all attempts to subscribe to mailing lists

Watch outgoing mail and look for any attempts by the user to subscribe to a mailing list. This includes mail to "-subscribe" or "-request" addresses especially with "subscribe" in the subject or at the start of a line in the body. Try to understand the subscribe requests of most major mailing list systems, such as majordomo, listserv, topica, yahoo egroups, etc.
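A heuristic for spotting subscribe requests in outgoing mail could be sketched in Python as below. The rules follow the text ("-subscribe" addresses, or "-request" and command-server addresses with "subscribe" in the subject or at the start of a body line); guessing the list's posting address from the request is an assumption that will not hold for every list server.

```python
import re
from typing import Optional

REQUEST_ADDR = re.compile(r"(?P<list>.+)-(?P<kind>subscribe|request)")
SUBJECT_WORD = re.compile(r"\bsubscribe\b", re.IGNORECASE)
BODY_COMMAND = re.compile(r"^[ \t]*subscribe\b[ \t]*(?P<list>\S*)",
                          re.IGNORECASE | re.MULTILINE)
COMMAND_SERVERS = {"majordomo", "listserv"}  # hypothetical, extend as needed


def subscribed_list(to_addr: str, subject: str, body: str) -> Optional[str]:
    """Guess which list an outgoing message subscribes to, or None."""
    local, at, domain = to_addr.strip().lower().rpartition("@")
    if not at:
        return None
    command = BODY_COMMAND.search(body)
    asks = bool(SUBJECT_WORD.search(subject) or command)
    m = REQUEST_ADDR.fullmatch(local)
    if m and (m.group("kind") == "subscribe" or asks):
        return f"{m.group('list')}@{domain}"  # e.g. hobbits-request@ -> hobbits@
    if local in COMMAND_SERVERS and command and command.group("list"):
        return f"{command.group('list').lower()}@{domain}"
    return None
```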

When the user subscribes to a list, you need to identify the list and whitelist it.

You can also subscribe to lists via the web, though many lists then ask for a second confirmation of the subscription (usually also by web), which you may be able to look for. You must also avoid challenging these confirmations, even though they will not carry a Precedence: bulk header. In some cases users may have to avoid signing up for lists via the web without telling the C/R system.
Detect mailing lists subscribed to in the user's mail archives

Most C/R systems do a pre-scan of the user's archived mail folders, outgoing and incoming, as well as address books, to whitelist all proper correspondents in advance. Detect the presence of mailing lists in these archives to whitelist them in advance. You can't challenge mailing list mail so this is important. You will need to extract the Envelope From, as opposed to the "From:" header, in many cases, to properly spot mailing lists. Of course, you must avoid scanning spam to avoid whitelisting it.
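The pre-scan could be sketched in Python over any iterable of `email.message.Message` objects; iterating `mailbox.mbox(path)` yields such objects. Assumptions: the function name and the caller-supplied `is_spam` test are hypothetical, the envelope sender is read from a Return-Path header (in mbox files it may instead live in the "From " line, available via `mboxMessage.get_from()`), and a list is keyed by its List-Id when present.

```python
from email.message import Message
from email.utils import getaddresses, parseaddr
from typing import Callable, Iterable, Set, Tuple


def prescan(messages: Iterable[Message], user_addresses: Set[str],
            is_spam: Callable[[Message], bool] = lambda m: False
            ) -> Tuple[Set[str], Set[str]]:
    """Return (correspondents, list identifiers) to whitelist in advance."""
    mine = {a.lower() for a in user_addresses}
    people: Set[str] = set()
    lists: Set[str] = set()
    for msg in messages:
        if is_spam(msg):
            continue  # never whitelist archived spam
        precedence = (msg.get("Precedence") or "").strip().lower()
        list_id = msg.get("List-Id")
        if list_id or precedence in ("bulk", "list"):
            _name, envelope = parseaddr(msg.get("Return-Path", ""))
            key = list_id.strip().lower() if list_id else envelope.lower()
            if key:
                lists.add(key)
            continue
        _name, sender = parseaddr(msg.get("From", ""))
        sender = sender.lower()
        if sender in mine:  # outgoing mail: whitelist everyone the user wrote to
            for _name, addr in getaddresses(msg.get_all("To", []) + msg.get_all("Cc", [])):
                if addr:
                    people.add(addr.lower())
        elif sender:        # incoming mail the user kept
            people.add(sender)
    return people, lists
```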
Detect patterns of possible incoming mailing lists

Fortunately most spammers don't actually maintain real mailing lists that send multiple mailings to a user with the same Envelope From, and they don't use Precedence headers. You should, however, look for patterns in these headers on incoming list mail. (List mail to be identified by Precedence header and lack of the user's address in To/Cc headers.)

For example, if you get a sudden surge of messages, all with the same Envelope-From for the target user, this may be a mailing list the user has subscribed to. This is especially true if the messages have low spam scores.

In this event, consider placing a special note at the top of the digest summary, or in a special message, saying something like, "You have recently received 6 mailing list messages from a list identifying itself as XYZ" and provide a means to say they wish to whitelist the list or perhaps blacklist it. If they whitelist it, deliver the mail. Give them a way to examine the potential list mail.
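Detecting such a surge amounts to counting recent low-scoring messages per Envelope-From. The Python sketch below uses hypothetical defaults (5 messages within 2 days, spam score at most 3.0) and assumes messages are observed in arrival order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict


class SurgeDetector:
    """Flags envelope senders that look like an unrecognized mailing list."""

    def __init__(self, threshold: int = 5, window: timedelta = timedelta(days=2),
                 max_spam_score: float = 3.0) -> None:
        self.threshold = threshold
        self.window = window
        self.max_spam_score = max_spam_score
        self.seen: Dict[str, Deque[datetime]] = defaultdict(deque)

    def observe(self, envelope_from: str, received_at: datetime, spam_score: float) -> int:
        """Record one held message; return the recent count if it looks like a list, else 0."""
        if spam_score > self.max_spam_score:
            return 0  # spammy mail does not count toward a "list"
        times = self.seen[envelope_from.lower()]
        times.append(received_at)
        while received_at - times[0] > self.window:
            times.popleft()  # forget arrivals that fell out of the window
        return len(times) if len(times) >= self.threshold else 0
```

A non-zero result is the cue for the digest note, e.g. "You have recently received 6 mailing list messages from a list identifying itself as XYZ".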

This is needed because you won't catch every mailing list subscription the user makes, especially since in many cases you can subscribe to lists via the web.

Be warned, however, that some mailing list managers put magic tokens in the Envelope-From to make bounces easier to track. Fortunately, many popular list managers also put in special "list" headers that help you identify the list, such as List-ID and a "Sender" header.
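Since tokenized envelope senders change per message, a stable key for a list can be derived by preferring those headers. A minimal Python sketch, with a hypothetical function name and key prefixes; the List-Id value is taken from between angle brackets when present.

```python
import re
from email.message import Message
from email.utils import parseaddr

ANGLE = re.compile(r"<([^<>]+)>")


def list_key(msg: Message, envelope_from: str = "") -> str:
    """Stable identifier for the list a message came from: List-Id, then Sender, then envelope."""
    list_id = msg.get("List-Id")
    if list_id:
        m = ANGLE.search(list_id)  # "Description <list.id.example>"
        return "list-id:" + (m.group(1) if m else list_id).strip().lower()
    _name, sender = parseaddr(msg.get("Sender", ""))
    if sender:
        return "sender:" + sender.lower()
    # Last resort; may vary per message if the list manager embeds bounce tokens.
    return "envelope:" + envelope_from.strip().strip("<>").lower()
```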
Think about anonymous E-mail

Anonymous E-mail is still a useful thing. In part, you allow it by providing the daily digest of mail whose challenges went unanswered, with low spam scores coming first. Of course, two-way remailers let you send a challenge and get a response by E-mail. If you insist on response by web you make it a little harder. Offering both lets the anonymous mailer select the best way to protect her identity.

Other systems (e-stamps etc.), which may not work on their own, could be applied to let anonymous mailers get through C/R systems.
Spammers may try to fake the things you detect

Spammers will eventually try to fake out all things you look for in order to avoid challenging or filtering e-mail. However, they will not do this right away. Since all things you do that make it harder for mail to get in will increase your risk of blocking desired mail, don't apply any stricter test until it actually becomes necessary.

Among the tests I have listed here, risks exist in the following areas.
Spammers will eventually try to guess what mailing lists you are on, or what correspondents you have whitelisted, and they will forge mail to appear like that. This is especially true with any publicly archived mailing list you post to. Lists will eventually need digital signatures if this attack becomes common. If you allow replies to your messages to come in based on subject, then spammers will forge replies to your public messages. To avoid this, you may wish to allow unchallenged replies only for a limited time on public messages. Try to be liberal at first, and only close down when spammers abuse the liberty. Don't try to prevent something that's not yet happening if it has a risk of blocking legitimate mail.

C/R may, over time, lose its utility if most spammers try to target it directly. However, it still has several years of life. It can also be combined with other techniques. For example, if you have a good spam filter, you might decide to challenge only messages with high spam scores or other reasons to suspect they are spam, and let through other mail.
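That combined policy could be sketched as a single Python decision function. The threshold of 5.0 and the three outcome names are assumptions; the ordering encodes the rules above (whitelisted mail is delivered, list mail is never challenged, and only suspicious mail from strangers gets a challenge).

```python
def disposition(spam_score: float, whitelisted: bool, list_mail: bool,
                challenge_threshold: float = 5.0) -> str:
    """Return 'deliver', 'digest' or 'challenge' for one incoming message."""
    if whitelisted:
        return "deliver"
    if list_mail:
        return "digest"  # never challenge list mail; let the user decide from the digest
    if spam_score >= challenge_threshold:
        return "challenge"  # only suspicious mail from strangers is challenged
    return "deliver"  # low-scoring mail from strangers goes straight through
```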
-----------snip-------------

_______________________________________________
Asrg mailing list
Asrg@ietf.org
https://www1.ietf.org/mailman/listinfo/asrg


