procmail

Re: ASCII 128-255

2005-05-13 15:44:13
Actually, the discussion last week did not directly solve the
unprintable character issue, but it suggested a solution.  I am
including the rules I use below.  As you can see, I am filtering on
character sets, including those named in the Subject line.  If the
incoming mail does not declare a charset, I supply one: the charset
defined on my machine here.  That way it no longer matters whether the
message itself declares a charset.

Please feel free to correct anything I have wrong in these rules.  To coin a
phrase, I know just enough procmail to be dangerous.  ;^)

-------------------------
### Send email with non-western charsets to /dev/null.  They are spam.
#
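# APNIC: first octets of IPv4 /8 blocks allocated to APNIC (Asia Pacific)
# ALL256: any one- to three-digit octet

APNIC="(58|59|60|61|202|203|210|211|218|219|220|221|222)"
ALL256="[0-9][0-9]?[0-9]?"
# windows-1252 (Western European) is deliberately left out of CHARSET
CHARSET="(ks_c_5601-1987|euc-kr|big5|gb2312|iso-2022-jp|shift[-_]jis|windows-(874|9[0-9][0-9]|125[013-8]))"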

:0
* $ ^Received:.*\\<${APNIC}\\.${ALL256}\\.${ALL256}\\.${ALL256}\\>
/dev/null

:0
* $ charset=\"?${CHARSET}\"?
/dev/null

:0
* $ ^(Subject|From): =\\?${CHARSET}\\?
/dev/null
------------------------------

+-----------------------------------------------------------------------+
| Christopher L. Barnard         O     When I was a boy I was told that |
| cbarnard(_at_)tsg(_dot_)cbot(_dot_)com         / \    anybody could become president.  |
| (312) 347-4901               O---O   Now I'm beginning to believe it. |
| http://www.cs.uchicago.edu/~cbarnard                --Clarence Darrow |
+----------PGP public key available via finger or PGP keyserver---------+


On Thu, 12 May 2005, Dallman Ross wrote:

On Thu, May 12, 2005 at 10:47:30PM +0200, Tomi Crnicki wrote:
Hello!

I suppose somebody has asked this before, but I couldn't find a helpful
link.

Can I somehow use procmail to filter out messages, coming mostly from
Russia, other former Soviet countries, and the Far East, that either
have no charset tag or have a charset such as Win1251 (sometimes others
as well), and whose subject and/or body is filled with characters with
codes 128-255?  All such messages are spam for our users.

However, I can't filter them all out by charset alone, because some
messages with charset Win1251 are not spam; these legitimate messages
contain only ASCII characters 32-127.

I believe it was only last week that we last addressed this issue.
A good place to look is the list archives (viewable from a link
a fair way down the page at www.procmail.org).  E.g., search for
"hi-bit", "hibit", "non-printing", etc.  Here is part of what I
posted last week:

From: Dallman Ross
Sent: Tuesday, May 03, 2005 11:23 PM
To: procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
Subject: Re: exclude rules for percentage of high-bit characters


On Tue, May 03, 2005 at 03:58:42PM -0500, Christopher L. Barnard
wrote:

I would like to exclude email that is mostly unprintable
characters. [. . . ]

[. . . .]  I'm sure that someone has done this, I'm just not using
the right keyword in my search of the list archives.  Can someone
point me to how I would go about doing this?

This is one of various messages in the list archives about the
subject.  I did my search on "non-printing characters."  It helped
that I already knew which message I was looking for (the one linked
below).  But there are other archived messages as well, including a
few from me about excluding German chars, for instance.  You could
search further with relative ease; for example, try "chars" instead
of "characters."

http://www.xray.mpe.mpg.de/cgi-bin/w3glimpse2html/procmail/2001-09/msg00281.html?53#mfs

Basically, this should do it (untested, however):

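 SPACE = ' '
 # TAB must contain one literal tab character
 TAB = '	'

 # The "$" flag makes procmail expand $TAB and $SPACE in the regex
 :0
 * $ BH ?? [^$TAB$SPACE-~]
 { HIBIT = TRUE }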
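Once the recipe above has set HIBIT, a later recipe can act on the
flag.  A minimal sketch, assuming (as Tomi describes) that every
message containing a character outside tab and printable ASCII is
unwanted.  The folder name hibit-spam is hypothetical; filing there
instead of /dev/null lets you check for false positives before
discarding anything:

```procmail
# Hypothetical follow-up: assumes HIBIT was set to TRUE by the
# recipe above and is otherwise unset (empty)
:0:
* HIBIT ?? TRUE
hibit-spam
```

Once the folder has stayed free of legitimate mail for a while, the
destination can be changed to /dev/null.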


--
dman

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
