Re: recipe to dump chinese spam

On Wed, 5 Jan 2000 06:08:55 -0600 (CST), David Efflandt
<efflandt(_at_)xnet(_dot_)com> wrote:

> Does anyone have a procmail recipe to dump chinese spam into /dev/null?


This has been discussed here fairly recently. You may want to look for
"China" or "Chinese" and "spam" in the list's search engine, which is
at <http://www.rosat.mpe-garching.mpg.de/mailing-lists/procmail/>
(it's not really "the list's", it's "Achim Bohnet's", but it's a good
archive which goes several years back).

> In many cases the type is listed as US-ASCII or "us-ascii" even
> though it is not.


This can be a good clue. Based on the earlier discussions about this,
I have implemented the following recipes:

    # Friendly spam from China (argh argh argh)
    :0
    * ^(Subject|From):.*=\?gb2312\?
    { REJECT="$REJECT${REJECT:+$NL}${REJ}Subject or from in GB2312 encoding" }

    :0
    * ^\/Author: Liu Yong Jun
    { REJECT="$REJECT${REJECT:+$NL}${REJ}Spam signature header $MATCH" }

  ### WARNING: I have substituted \200 and \377 for literal ASCII 128
  ###  and 255, respectively, in the copy I post to the list
  ###  (not in the next comment, though; it's supposed to be human-readable)
    :0 # four or more of \200-\377 in a row in Subject or From
    * ^\/(From|Subject):.*[\200-\377][\200-\377][\200-\377][\200-\377]
    {
        HEADER=$MATCH
        :0
        * HEADER ?? ^^\/[^:]+
        { }
        REJECT="$REJECT${REJECT:+$NL}${REJ}Long 8bit sequence in $MATCH"
    }

    :0 # same for quoted-printable
    * ^\/(From|Subject):.*\
        =[89A-F][0-9A-F]=[89A-F][0-9A-F]=[89A-F][0-9A-F]=[89A-F][0-9A-F]
    {
        HEADER=$MATCH
        :0
        * HEADER ?? ^^\/[^:]+
        { }
        REJECT="$REJECT${REJECT:+$NL}${REJ}Long QP sequence in $MATCH"
    }

    :0B # Finally, same for body
    * [\200-\377][\200-\377][\200-\377][\200-\377]|\
        =[89A-F][0-9A-F]=[89A-F][0-9A-F]=[89A-F][0-9A-F]=[89A-F][0-9A-F]
    { REJECT="$REJECT${REJECT:+$NL}${REJ}Long 8bit sequence in body" }

This is based on the observation that while some of the languages I
receive mail in do use accented characters, it is virtually impossible
to find a valid example of four accented characters in a row, whereas
such runs are legion in Chinese text (not "accented characters", of
course, but bytes in the 8-bit range). You could probably bump the
threshold up to something like 16 bytes if you're scared of mismatches,
but at that point you should probably switch to scoring instead, as in
the earlier threads on this topic.
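
For anyone who wants to try the heuristic on saved messages outside
Procmail, here is a minimal Python sketch of the same two tests (raw
8-bit bytes and quoted-printable escapes). The function name and the
threshold of four are just my illustration of the recipes above, not
part of them, and the sketch only accepts upper-case hex digits in the
escapes.

```python
import re

# Four consecutive bytes in the range 0x80-0xFF (raw 8-bit text).
RAW_8BIT_RUN = re.compile(rb"[\x80-\xff]{4}")
# Four consecutive quoted-printable escapes for such bytes, e.g. =C4=E3=BA=C3.
QP_8BIT_RUN = re.compile(rb"(?:=[89A-F][0-9A-F]){4}")


def looks_like_8bit_run(data: bytes) -> bool:
    """Return True if data contains a long raw or QP-encoded 8-bit sequence."""
    return bool(RAW_8BIT_RUN.search(data) or QP_8BIT_RUN.search(data))
```

Note that this counts bytes, not characters, so in a multi-byte
encoding such as UTF-8 two adjacent accented letters already add up to
four bytes in the 8-bit range.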

> In most cases the source IP of the spam does not resolve to a name
> or the name changes frequently (could be .com, .net or .cn). But
> one of the websites I am on sends mail (form data) from an smtp
> server that does not do reverse DNS lookup (name resolves, but IP
> does not). So mail I do want to receive sometimes says "May be
> forged".


What I've been doing is to look at the first non-local IP address and
dump anything from hosts in the range 202.90.xxx.yyy to
202.199.xxx.yyy (actually I think 202.96-110 are the ones to look out
for, but the overly wide coverage hasn't been a problem yet). Here is
the recipe for that:

    # My own little blacklist
    :0
    * RBLIP ?? [0-9]
    * ? echo "$RBLIP" | grep -f $HOME/procmail/ip-block.txt
    { REJECT="$REJECT${REJECT:+$NL}${REJ}IP blocked in ip-block.txt: $RBLIP" }


where the file $HOME/procmail/ip-block.txt contains the following:

202\.9[0-9]\.[0-9]*\.
202\.1[0-9][0-9]\.[0-9]*\.
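
To see which addresses those two patterns actually catch, here is a
small Python sketch that applies them the way grep does (a match
anywhere in the string counts). The helper name is hypothetical; only
the two patterns come from the file above.

```python
import re

# The two patterns from ip-block.txt, copied as they appear above.
PATTERNS = [
    re.compile(r"202\.9[0-9]\.[0-9]*\."),
    re.compile(r"202\.1[0-9][0-9]\.[0-9]*\."),
]


def is_blocked(ip: str) -> bool:
    """Mimic grep -f: True if any pattern matches anywhere in the string."""
    return any(pattern.search(ip) for pattern in PATTERNS)
```

With these patterns the second octet has to be in the range 90 to 199,
so 202.96.1.2 is blocked while 202.89.1.2 and 202.200.1.2 are not.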

As for how to set RBLIP to the first non-local IP address, I again
refer you back to the earlier threads about this. Look for "Dnes"; I
think Walter has either posted to or been referred to in all the
relevant threads. (They may not all be about Chinese spam.)
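
This is not the recipe from those threads, but the general idea can be
sketched in Python under a couple of assumptions of mine: that the
relay addresses appear in square brackets in the Received: headers, and
that "local" means a private or loopback address. The function name
and both assumptions are illustrative only.

```python
import ipaddress
import re
from typing import Iterable, Optional

# An IPv4 address in square brackets, as relays commonly log it.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def first_non_local_ip(received_headers: Iterable[str]) -> Optional[str]:
    """Return the first bracketed IPv4 address that is not private or loopback."""
    for header in received_headers:
        for candidate in BRACKETED_IPV4.findall(header):
            try:
                addr = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # not a valid address, e.g. an octet above 255
            if not (addr.is_private or addr.is_loopback):
                return candidate
    return None  # no usable address found
```

The headers should be passed in the order they appear in the message,
newest first, so that the address recorded by your own server is seen
before anything the sender could have forged further down.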

This is a bit harsh, but the only false positives (if you can call
them that) are messages from Chinese admins in response to my spam
complaints. To my knowledge, there are at least three people in China
who understand English and sometimes respond to spam complaints, but
not very frequently. Whether you want to risk losing those messages
can be a tough call :-/

Anyway, I don't send anything to /dev/null, and I monitor the spam
tank closely, so for me, false positives are not a big issue. YMMV.

> I was also wondering if there is a way to bounce mail with certain
> features or is procmail too late in the mail process to do that. I
> imagine that would be impossible when mail headers are forged or
> someone uses their own bogus mail server on a dynamic dialup
> connection.


Hmm. You should look at various RBL-type lists and use them directly
from Sendmail. Any spam-spewing server should be in the MAPS RBL. Any
dynamic dialup should be in the MAPS DUL. You should not accept SMTP
connections from either, in any case. Sendmail 8.9.3 comes with hooks
to consult these lists, and others. For more information about these
lists, see e.g. <http://www.iki.fi/era/rbl/rbl.html>
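
For background, DNS-based lists of this kind are queried by reversing
the octets of the connecting IP address and looking the result up as a
host name under the list's zone. A minimal Python sketch of building
that query name follows; the zone used in the example is a
placeholder, not the real zone of any list, and the function name is
hypothetical.

```python
def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a blocklist zone."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted-quad IPv4 address: %r" % ip)
    # Octets are reversed, as in in-addr.arpa reverse lookups.
    return ".".join(reversed(octets)) + "." + zone


# Placeholder zone for illustration only.
print(dnsbl_query_name("202.96.1.2", "dnsbl.example.org"))
```

If the resulting name resolves to an address, the host is listed; if
the lookup fails with "no such domain", it is not.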

The observation that it's a little bit late to reject spam from within
Procmail if you can do it earlier in the process is basically correct.

Also, I'd recommend that you don't send stuff to /dev/null -- I have
always come to regret that sooner or later. If nothing else, you can
analyse the spam in order to better understand how to protect
yourself. And nothing is more frustrating than false positives on
important mail.

FWIW, my spam levels have gone up some 300% since mid-November, and
practically all of this is because of a smallish number of hyperactive
Chinese spammers who simply will not lose their connection to the
Internet, because the clue deficit among admins down there is large
enough to ... ugh, it's so bad I can't come up with a metaphor that is
as mind-boggling as the real thing. In case you're interested,
there's a chart at <http://www.iki.fi/era/spam/dates.gif>

(The file index.html in the same directory contains an index of all
the spam messages, but you probably don't want to load that unless
you're extremely bored and have a very quick connection.)

Hope this helps,

/* era */

-- 
 Too much to say to fit into this .signature anyway: <http://www.iki.fi/era/>
  Fight spam in Europe: <http://www.euro.cauce.org/> * Sign the EU petition