On Tue, 26 Nov 2002, James Clark wrote:
JC>
JC> Now I have a question:
JC>
JC> Is there a way to filter out non-english messages using procmail?
JC>
I use:
# Mime header extension in subject
:0
* ^Subject: =\?(gb2312|big5|ks_c_5601|2022-kr|euc-kr).*\?=
{
nl
nl=${SPAMREASON+"$NL"}
SPAMREASON="${SPAMREASON}${nl}${SPAMREASON_HEADER}mime header extension charset"
}
#
# # Charset. But not gb2312
# :0
# * ^Content-Type:.*charset=.*(big5|ks_c_5601|2022-kr|euc-kr)
# {
# nl
# nl=${SPAMREASON+"$NL"}
# SPAMREASON="${SPAMREASON}${nl}${SPAMREASON_HEADER}charset"
# }
#
# Mime format with charset
# Multiline headers are grepped.
:0
* ^Content-Type:.*boundary
* B ?? ^Content-Type:(.|$)*charset=.?(big5|ks_c_5601|2022-kr|euc-kr)
{
nl
nl=${SPAMREASON+"$NL"}
SPAMREASON="${SPAMREASON}${nl}${SPAMREASON_HEADER}mime charset"
}
# 5% gagabuggee subject
# avoid empty subject
:0
* ^Subject: \/.+
{
:0 D
* -1^1 MATCH ?? .
* 2^1 MATCH ?? =[0-9A-F][0-9A-F]
* 20^1 MATCH ?? [ ¡¢£€¥Š§š©ª«¬®¯°±²³Žµ¶·ž¹º»ŒœŸ¿]
* 20^1 MATCH ?? [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
* 20^1 MATCH ?? [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
* 20^1 MATCH ?? =[A-F][0-9A-F]
{
nl
nl=${SPAMREASON+"$NL"}
SPAMREASON="${SPAMREASON}${nl}${SPAMREASON_HEADER}subject gagabuggee score = $="
# LOG="SCORE = $=$NL"
}
}
# B Mime header extension in subject?
:0
* ^Subject:.*=\?.*\?b\?\/.+\?=
{
## LOG="B mime header $MATCH $NL"
MIMESUBJECT=`echo $MATCH | mimencode -u -b`
## LOG="B mime header $MIMESUBJECT $NL"
# 5% gagabuggee subject
:0 D
* -1^1 MIMESUBJECT ?? .
* 2^1 MIMESUBJECT ?? =[0-9A-F][0-9A-F]
* 20^1 MIMESUBJECT ?? [ ¡¢£€¥Š§š©ª«¬®¯°±²³Žµ¶·ž¹º»ŒœŸ¿]
* 20^1 MIMESUBJECT ?? [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
* 20^1 MIMESUBJECT ?? [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
* 20^1 MIMESUBJECT ?? =[A-F][0-9A-F]
{
nl
nl=${SPAMREASON+"$NL"}
SPAMREASON="${SPAMREASON}${nl}${SPAMREASON_HEADER}B mime subject gagabuggee score = $="
# LOG="SCORE = $=$NL"
}
}
# 5% gagabuggee body
:0 BD
* -1^1 .
* 2^1 =[0-9A-F][0-9A-F]
* 20^1 [ ¡¢£€¥Š§š©ª«¬®¯°±²³Žµ¶·ž¹º»ŒœŸ¿]
* 20^1 [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
* 20^1 [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
* 20^1 =[A-F][0-9A-F]
{
nl
nl=${SPAMREASON+"$NL"}
SPAMREASON="${SPAMREASON}${nl}${SPAMREASON_HEADER}body gagabuggee score = $="
# LOG="SCORE = $=$NL"
}
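The "5%" in the scoring recipes above appears to come from the weights: every character costs 1 point, and every high-bit character (or quoted-printable escape for one) earns 20, so the total only goes positive when such characters make up more than about 1 in 20. The Python sketch below just reproduces that arithmetic for illustration. It is not part of the procmail setup: the name gibberish_score is hypothetical, the byte range 0xA0-0xFF is an assumption about what the three bracketed character lists are meant to cover, and it only approximates how procmail counts matches.

```python
import re

# Patterns mirroring the recipe conditions (case-sensitive, like the D flag).
# ASSUMPTION: the three bracketed lists together stand for bytes 0xA0-0xFF.
ANY_CHAR = re.compile(rb".")                  # "." does not match newline
QP_ANY = re.compile(rb"=[0-9A-F][0-9A-F]")    # any quoted-printable escape
QP_HIGH = re.compile(rb"=[A-F][0-9A-F]")      # escape for a byte >= 0xA0
HIGH_BIT = re.compile(rb"[\xa0-\xff]")        # raw high-bit byte


def gibberish_score(text: bytes) -> int:
    """Approximate the recipe's weighted score for a subject or body."""
    score = -1 * len(ANY_CHAR.findall(text))   # * -1^1 .
    score += 2 * len(QP_ANY.findall(text))     # * 2^1 =[0-9A-F][0-9A-F]
    score += 20 * len(HIGH_BIT.findall(text))  # * 20^1 [...] (three lists)
    score += 20 * len(QP_HIGH.findall(text))   # * 20^1 =[A-F][0-9A-F]
    return score


def would_flag(text: bytes) -> bool:
    """A scoring recipe fires only when the final score is positive."""
    return gibberish_score(text) > 0
```

With these weights, 95 ASCII characters plus 5 high-bit ones break even at 0 and are not flagged, while 94 plus 6 score 20 and are.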
I commented out the plain charset recipe because I was getting false
negatives. Also, I am not yet checking base64-encoded attachments; that
still needs to be done.
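As a rough idea of what the missing base64 check could look like, here is a small Python sketch that decodes base64 parts and applies the same "more than 5% high-bit bytes" test. It is only an illustration under several assumptions: the function names are hypothetical, it looks at text/* parts only (binary attachments such as images would always trip the threshold), and it would have to be run as an external filter rather than inside procmail itself.

```python
import email


def high_bit_ratio(data: bytes) -> float:
    """Fraction of bytes in the range 0xA0-0xFF (0.0 for empty input)."""
    if not data:
        return 0.0
    return sum(1 for b in data if b >= 0xA0) / len(data)


def base64_text_parts_over_threshold(raw: bytes, threshold: float = 0.05) -> bool:
    """Return True if any base64-encoded text part exceeds the threshold."""
    msg = email.message_from_bytes(raw)
    for part in msg.walk():
        # ASSUMPTION: only text/* parts are worth scoring.
        if part.get_content_maintype() != "text":
            continue
        cte = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
        if cte != "base64":
            continue
        # decode=True undoes the transfer encoding and returns bytes.
        payload = part.get_payload(decode=True) or b""
        if high_bit_ratio(payload) > threshold:
            return True
    return False
```

The 0.05 default mirrors the 20-to-1 weighting in the scoring recipes; it is a starting point, not a tuned value.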
Alan
( Please do not email me AS WELL as replying to the list. Personal
email is welcome but may invoke a password autoresponder. )
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail