procmail

Re: W32.Klez.H@mm recipe?

2002-11-26 13:36:30
On Tue, 26 Nov 2002, James Clark wrote:

JC>
JC> Now I have a question:
JC>
JC> Is there a way to filter out non-English messages using procmail?
JC>


I use:


# Mime header extension in subject
:0
* ^Subject: =\?(gb2312|big5|ks_c_5601|iso-2022-kr|euc-kr).*\?=
{
  nl
  nl=${SPAMREASON+"$NL"}
  SPAMREASON="${SPAMREASON}${nl}${SPAMREASON_HEADER}mime header extension charset"
}

#
# # Charset.  But not gb2312
# :0
# * ^Content-Type:.*charset=.*(big5|ks_c_5601|2022-kr|euc-kr)
# {
#   nl
#   nl=${SPAMREASON+"$NL"}
#   SPAMREASON="${SPAMREASON}${nl}${SPAMREASON_HEADER}charset"
# }
#


# Mime format with charset
# (.|$)* lets the match run across line breaks, so folded
# (multiline) Content-Type headers in the body are grepped too.
:0
* ^Content-Type:.*boundary
* B ?? ^Content-Type:(.|$)*charset=.?(big5|ks_c_5601|iso-2022-kr|euc-kr)
{
  nl
  nl=${SPAMREASON+"$NL"}
  SPAMREASON="${SPAMREASON}${nl}${SPAMREASON_HEADER}mime charset"
}
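
To see what the body test above is looking for, here is a rough Python
sketch (not part of the procmail setup) that walks the MIME parts of a
message and reports any part whose charset contains one of the same
strings. The names `suspect_part_charsets` and `SUSPECT` are made up for
illustration; the standard `email` package handles folded headers, which
the `(.|$)*` pattern only approximates.

```python
from email import message_from_bytes

# Substrings taken from the recipe's alternation; "2022-kr" also
# matches "iso-2022-kr" because this is a substring test.
SUSPECT = ("big5", "ks_c_5601", "2022-kr", "euc-kr")


def suspect_part_charsets(raw: bytes) -> list:
    """Return the charset of every MIME part that names a suspect charset."""
    msg = message_from_bytes(raw)
    found = []
    for part in msg.walk():
        # get_content_charset() returns the charset lowercased, or None.
        charset = part.get_content_charset()
        if charset and any(s in charset for s in SUSPECT):
            found.append(charset)
    return found
```
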


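# 5% gagabuggee subject: the score is positive when more than 1
# character in 20 is 8-bit (raw, or quoted-printable encoded).
# The outer condition avoids scoring an empty subject.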
:0
* ^Subject: \/.+
{
  :0 D
  * -1^1 MATCH ?? .
  *  2^1 MATCH ?? =[0-9A-F][0-9A-F]
  * 20^1 MATCH ?? [ ¡¢£€¥Š§š©ª«¬­®¯°±²³Žµ¶·ž¹º»ŒœŸ¿]
  * 20^1 MATCH ?? [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
  * 20^1 MATCH ?? [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
  * 20^1 MATCH ?? =[A-F][0-9A-F]
  {
    nl
    nl=${SPAMREASON+"$NL"}
    SPAMREASON="${SPAMREASON}${nl}${SPAMREASON_HEADER}subject gagabuggee score = $="
    # LOG="SCORE = $=$NL"
  }
}


# B Mime header extension in subject?
:0
* ^Subject:.*=\?.*\?b\?\/.+\?=
{
  ## LOG="B mime header $MATCH $NL"
  MIMESUBJECT=`echo $MATCH | mimencode -u -b`
  ## LOG="B mime header $MIMESUBJECT $NL"

  # 5% gagabuggee subject
  :0 D
  * -1^1 MIMESUBJECT ?? .
  *  2^1 MIMESUBJECT ?? =[0-9A-F][0-9A-F]
  * 20^1 MIMESUBJECT ?? [ ¡¢£€¥Š§š©ª«¬­®¯°±²³Žµ¶·ž¹º»ŒœŸ¿]
  * 20^1 MIMESUBJECT ?? [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
  * 20^1 MIMESUBJECT ?? [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
  * 20^1 MIMESUBJECT ?? =[A-F][0-9A-F]
  {
    nl
    nl=${SPAMREASON+"$NL"}
    SPAMREASON="${SPAMREASON}${nl}${SPAMREASON_HEADER}B mime subject gagabuggee score = $="
    # LOG="SCORE = $=$NL"
  }
}
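
For testing outside procmail, the decode step can be imitated in Python.
This is only a sketch under assumptions: `decode_b_words` is a
hypothetical helper, and it returns the raw decoded bytes without any
charset conversion, which is roughly what piping the payload through
`mimencode -u -b` gives.

```python
import base64
import re

# One RFC 2047 "B" encoded word: =?charset?B?payload?=
ENCODED_WORD = re.compile(rb"=\?[^?]+\?[bB]\?([^?]+)\?=")


def decode_b_words(subject: bytes) -> bytes:
    """Concatenate the raw decoded bytes of every B-encoded word."""
    return b"".join(
        base64.b64decode(m.group(1)) for m in ENCODED_WORD.finditer(subject)
    )
```

The result can then be run through the same 8-bit character tests as a
plain subject.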



# 5% gagabuggee body
:0 BD
* -1^1 .
*  2^1 =[0-9A-F][0-9A-F]
* 20^1 [ ¡¢£€¥Š§š©ª«¬­®¯°±²³Žµ¶·ž¹º»ŒœŸ¿]
* 20^1 [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
* 20^1 [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
* 20^1 =[A-F][0-9A-F]
{
  nl
  nl=${SPAMREASON+"$NL"}
  SPAMREASON="${SPAMREASON}${nl}${SPAMREASON_HEADER}body gagabuggee score = $="
  # LOG="SCORE = $=$NL"
}
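
The three scoring recipes share the same arithmetic. Every character
costs 1, and every 8-bit character earns 20, whether it appears raw or
as a quoted-printable escape from =A0 to =FF. The +2 for any =XX escape
makes a three-character escape cost the same as one character. The total
is therefore positive only when more than 1 character in 20 is 8-bit.
Below is a small Python sketch of that arithmetic, assuming the
bracketed character lists cover bytes 0xA0 through 0xFF;
`gagabuggee_score` is a made-up name.

```python
import re

QP_ANY = re.compile(rb"=[0-9A-F][0-9A-F]")   # any quoted-printable escape
QP_HIGH = re.compile(rb"=[A-F][0-9A-F]")     # escapes for bytes 0xA0-0xFF
HIGH_BYTE = re.compile(rb"[\xa0-\xff]")      # raw 8-bit bytes (assumed range)


def gagabuggee_score(data: bytes) -> int:
    """Return the weighted score; a value above 0 would trigger the recipe."""
    score = -1 * len(re.findall(rb".", data))   # -1^1 for each character
    score += 2 * len(QP_ANY.findall(data))      #  2^1 for each =XX
    score += 20 * len(HIGH_BYTE.findall(data))  # 20^1 for each raw 8-bit byte
    score += 20 * len(QP_HIGH.findall(data))    # 20^1 for each =A0..=FF
    return score
```

For example, `gagabuggee_score(b"caf\xe9")` is 16 (4 characters, one of
them 8-bit), while 19 ASCII characters plus one 8-bit character score
exactly 0 and would not trigger.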



I commented out the plain charset recipe because I was getting false
positives.  Also, I am not checking base64-encoded attachments; that
still needs to be done.
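
One possible way to cover the base64 case, sketched in Python rather
than procmail: decode each base64 part first, then apply the same
character tests to the decoded bytes. `decoded_base64_parts` is a
hypothetical helper and not an existing tool; it relies only on the
standard `email` package.

```python
from email import message_from_bytes


def decoded_base64_parts(raw: bytes) -> list:
    """Return the decoded payload (bytes) of every base64-encoded part."""
    msg = message_from_bytes(raw)
    decoded = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        cte = (part.get("Content-Transfer-Encoding") or "").strip().lower()
        if cte == "base64":
            # decode=True undoes the Content-Transfer-Encoding.
            decoded.append(part.get_payload(decode=True))
    return decoded
```
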




Alan

( Please do not email me AS WELL as replying to the list. Personal
  email is welcome but may invoke a password autoresponder. )


_______________________________________________
procmail mailing list
procmail@lists.RWTH-Aachen.DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail