Actually, the discussion last week did not directly solve the
unprintable-character issue, but it suggested a solution. I am
including the rules I use below. As you can see, I am filtering on
the character set named in the headers, including the Subject line.
If the incoming mail does not define a charset, I define one, so the
question of whether the message itself declares a charset is
eliminated: in that case the charset is the one defined by my machine
here.
Please feel free to correct anything I have wrong in this rule. To coin a
phrase, I know just enough procmail to be dangerous. ;^)
-------------------------
### Send email with non-western charsets to /dev/null. They are spam.
#
# APNIC holds the first octets of IP address blocks assigned to APNIC
# (the Asia Pacific registry); mail relayed through them is discarded.
APNIC="(58|59|60|61|202|203|210|211|218|219|220|221|222)"
ALL256="[0-9][0-9]?[0-9]?"
CHARSET="(ks_c_5601-1987|euc-kr|big5|gb2312|iso-2022-jp|shift-jis|windows-[0-9]+)"
:0
* $ ^Received:.*\\<${APNIC}\\.${ALL256}\\.${ALL256}\\.${ALL256}\\>
/dev/null
:0
* $ charset=\"?${CHARSET}\"?
/dev/null
:0
* $ ^(Subject|From): =\\?${CHARSET}\\?
/dev/null
------------------------------
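One possible correction, offered as a sketch rather than a tested rule:
windows-[0-9]+ also matches windows-1252, the Western European code
page, so ordinary Western mail sent from Windows clients would be
discarded as well. The registered name for Shift JIS is also spelled
with an underscore (Shift_JIS), which shift-jis does not match.
Assuming the intent is to block only non-Western code pages, CHARSET
could be narrowed as below. The list of code pages kept is an
assumption, not part of the rules above, and names such as windows-31j
would no longer match.

```procmail
# Sketch only (untested). Assumption: windows-1252 (Western European)
# should pass, while the other Windows code pages stay blocked.
# shift[-_]jis also covers the registered spelling Shift_JIS.
CHARSET="(ks_c_5601-1987|euc-kr|big5|gb2312|iso-2022-jp|shift[-_]jis|windows-(874|125[013-8]))"
```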
+-----------------------------------------------------------------------+
| Christopher L. Barnard O When I was a boy I was told that |
| cbarnard(_at_)tsg(_dot_)cbot(_dot_)com / \ anybody could become president. |
| (312) 347-4901 O---O Now I'm beginning to believe it. |
| http://www.cs.uchicago.edu/~cbarnard --Clarence Darrow |
+----------PGP public key available via finger or PGP keyserver---------+
On Thu, 12 May 2005, Dallman Ross wrote:
On Thu, May 12, 2005 at 10:47:30PM +0200, Tomi Crnicki wrote:
Hello!
I suppose somebody has asked this before, but I couldn't find a helpful
link.
Can I somehow use procmail to filter out messages, coming mostly from
Russia and ex-Russian countries and the Far East, that either have no
charset tags or have, for instance, charset Win1251 (sometimes others
as well) and are filled with characters (in the subject and/or the
message) with codes 128-255? All such messages are spam for our users.
I can't simply filter on the charset, though, as some messages have
charset Win1251 but are not spam; these non-spam messages contain only
the ASCII characters 32-127.
I believe it was only last week that we last addressed this issue.
A good place to look is the list archives (viewable from a link
a fair way down the page at www.procmail.org). E.g., search for
"hi-bit"; "hibit"; "non-printing"; etc. Here is part of what I
posted last week:
From: Dallman Ross
Sent: Tuesday, May 03, 2005 11:23 PM
To: procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
Subject: Re: exclude rules for percentage of high-bit characters
On Tue, May 03, 2005 at 03:58:42PM -0500, Christopher L. Barnard
wrote:
I would like to exclude email that is mostly unprintable
characters. [. . . ]
[. . . .] I'm sure that someone has done this, I'm just not using
the right keyword in my search of the list archives. Can someone
point me to how I would go about doing this?
This is one of various messages in the list archives about the
subject. I did my search on "non-printing characters." I
was aided in knowing what I was looking for. (This was it.)
But there are other archived messages as well, including a
few from me about excluding German chars (for instance). You
could search further with relative ease. For example, try
"chars" instead of "characters."
http://www.xray.mpe.mpg.de/cgi-bin/w3glimpse2html/procmail/2001-09/msg00281.html?53#mfs
Basically, this should do it (untested, however):
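# TAB must contain one literal tab character between the quotes.
SPACE = ' '
TAB = '	'
# The leading $ makes procmail expand $TAB and $SPACE in the condition.
:0
* $ BH ?? [^$TAB$SPACE-~]
{ HIBIT = TRUE }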
--
dman
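For the original question (Win1251 mail that is spam only when it
actually carries 8-bit characters, plus mail with no charset tag at
all), the charset test and the high-bit test could be combined. What
follows is an untested sketch: the windows-1251 pattern, the use of
/dev/null as the destination, and the choice to search both headers
and body are assumptions, and TAB must contain a literal tab character.

```procmail
# Untested sketch; the charset pattern and the destination are assumptions.
# TAB must hold one literal tab character between the quotes.
SPACE=' '
TAB='	'

# Declared windows-1251 AND at least one character outside
# tab, space..~ somewhere in the headers or body.
:0
* ^Content-Type:.*charset="?windows-1251
* $ HB ?? [^$TAB$SPACE-~]
/dev/null

# No charset declared at all AND at least one such character.
:0
* ! ^Content-Type:.*charset=
* $ HB ?? [^$TAB$SPACE-~]
/dev/null
```

A Win1251 message containing only ASCII characters 32-127 fails the
second condition of the first recipe and is delivered normally.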
____________________________________________________________
procmail mailing list Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail