procmail

Re: character set spec bypassing filter?

2002-11-07 16:26:55
On Thu, 7 Nov 2002, Charles Gregory wrote:

CG>
CG> However, just today I received two e-mails that specify these strings
CG> clearly in the 'from' and 'subject' headers, but still got past the
CG> filter. It would appear to do so because of some effect of the surrounding
CG> characters on the line. If I use Pine's "full headers" command, I see
CG> expanded strings consisting of:
CG> From: "=?EUC-KR?B?sbnBpiCxs8ivx9C7/SC8vsXN?="
CG> Subject: =?EUC-KR?B?ucyxubGzyK/H0Lv9uPDB/VuxpLDtXQ==?=
CG>
CG> Interestingly enough, if I use Pine's *bounce* command, the 'Resent
CG> Subject' turns up as:
CG> Resent-Subject: =?X-UNKNOWN?B?ucyxubGzyK/H0Lv9uPDB/VuxpLDtXQ==?=
CG>
CG> So I suspect that some sort of processing is occurring, and that my
CG> procmail filter never really 'sees' the 'euc-kr' string, because of some
CG> 'handling' done on the control characters(?). My question is, how would I
CG> get procmail to ignore control characters so that it 'sees' the euc-kr
CG> that is obviously there?
CG>


I use this:

# Mime header extension in subject
:0
* ^Subject: =\?(gb2312|big5|ks_c_5601|2022-kr|euc-kr).*\?=
{
 # action
}

for MIME-encoded headers that contain undesirable charsets, although I am
going off filtering on charsets because I have had some false negatives
(not false positives; in my terminology, I let through desirable mail
rather than reject undesirable mail).
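The charset label in an RFC 2047 encoded-word (=?charset?encoding?text?=) is plain ASCII in the raw header. Procmail can therefore see EUC-KR without decoding anything, and its regexes ignore case unless the D flag is given. Below is a rough Python sketch of the same test, handy for checking a header by hand. The function name blocked_charsets is made up, and the charset list is copied from the recipe above:

```python
import re

# Charsets copied from the Subject recipe above.
BLOCKED = ("gb2312", "big5", "ks_c_5601", "2022-kr", "euc-kr")

# RFC 2047 encoded-word: =?charset?B-or-Q?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([bq])\?([^?]*)\?=", re.IGNORECASE)


def blocked_charsets(header_value: str) -> list[str]:
    """Return the lower-cased blocked charsets named in encoded-words."""
    found = []
    for charset, _encoding, _text in ENCODED_WORD.findall(header_value):
        name = charset.lower()
        # The procmail regex is not anchored after the charset name, so
        # "ks_c_5601" also covers "ks_c_5601-1987"; a prefix test mimics that.
        if name.startswith(BLOCKED):
            found.append(name)
    return found


print(blocked_charsets("=?EUC-KR?B?ucyxubGzyK/H0Lv9uPDB/VuxpLDtXQ==?="))  # ['euc-kr']
```

Because the sketch is not anchored to the start of the header, it also finds the encoded-word inside the quoted display name of the From: line in the question. That line begins with a double quote, so a pattern anchored as ^From: =\? would miss it.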


I also test for the charsets in the MIME attachments:

# Mime format with charset
# Multiline headers are grepped.
:0
* ^Content-Type:.*boundary
* B ?? ^Content-Type:(.|$)*charset=.?(big5|ks_c_5601|2022-kr|euc-kr)
{
 # action
}
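The B ?? condition greps every Content-Type: line in the body, so nested parts are covered as well. For checking a saved message by hand, here is a Python sketch that walks all MIME parts with the standard email package and compares each declared charset against the list in that recipe. The names part_charsets and has_blocked_charset and the test message are all made up:

```python
from email import message_from_string

# Charsets from the attachment recipe above; prefix match, like the
# unanchored procmail regex.
BLOCKED = ("big5", "ks_c_5601", "2022-kr", "euc-kr")


def part_charsets(raw: str) -> list[str]:
    """Return the charset parameter of every MIME part, nested ones included."""
    msg = message_from_string(raw)
    found = []
    for part in msg.walk():  # depth-first, descends into nested multiparts
        charset = part.get_content_charset()  # lower-cased, or None
        if charset:
            found.append(charset)
    return found


def has_blocked_charset(raw: str) -> bool:
    return any(cs.startswith(BLOCKED) for cs in part_charsets(raw))


# Invented test message: an EUC-KR part inside a multipart inside a multipart.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain; charset=us-ascii

hello
--inner
Content-Type: text/plain; charset="EUC-KR"

annyeong
--inner--
--outer--
"""

print(part_charsets(RAW))        # ['us-ascii', 'euc-kr']
print(has_blocked_charset(RAW))  # True
```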



I look for non-ASCII characters in the subject:


# 5% gagabuggee subject
# avoid empty subject
:0
* ^Subject: \/.+
{
  :0 D
  * -1^1 MATCH ?? .
  *  2^1 MATCH ?? =[0-9A-F][0-9A-F]
  * 20^1 MATCH ?? [ ¡¢£€¥Š§š©ª«¬­®¯°±²³Žµ¶·ž¹º»ŒœŸ¿]
  * 20^1 MATCH ?? [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
  * 20^1 MATCH ?? [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
  * 20^1 MATCH ?? =[A-F][0-9A-F]
  {
    # action
  }
}
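The scoring works like this: with an exponent of 1, procmail adds the weight once per match, and the recipe fires if the total is positive. Every character costs 1 point and every 8-bit character earns 20. Raw 8-bit text therefore triggers once more than 1 character in 20 (5%) is 8-bit. Below is a Python sketch of the arithmetic. It assumes that the three bracketed classes stand for the bytes 0xA0-0xFF and that matches are counted without overlap. The names gagabuggee_score and recipe_fires are made up:

```python
import re


def gagabuggee_score(data: bytes) -> int:
    """Add up the weights of the '5% gagabuggee' conditions for one string."""
    # -1^1 .                : every character except newline costs 1
    chars = len(data.replace(b"\n", b""))
    # 2^1 =[0-9A-F][0-9A-F] : any quoted-printable escape earns 2
    # (the D flag makes A-F case-sensitive, hence no re.IGNORECASE)
    qp_any = len(re.findall(rb"=[0-9A-F][0-9A-F]", data))
    # 20^1 [...] (x3)       : assumed to cover the 8-bit bytes 0xA0-0xFF
    high = sum(1 for byte in data if byte >= 0xA0)
    # 20^1 =[A-F][0-9A-F]   : a QP escape for a byte 0xA0-0xFF earns 20 more
    qp_high = len(re.findall(rb"=[A-F][0-9A-F]", data))
    return -chars + 2 * qp_any + 20 * high + 20 * qp_high


def recipe_fires(data: bytes) -> bool:
    # procmail runs the action when the total score is positive
    return gagabuggee_score(data) > 0


print(gagabuggee_score(b"Hello world"))       # -11
print(recipe_fires(b"a" * 95 + b"\xb9" * 5))  # False: exactly 5%
print(recipe_fires(b"a" * 94 + b"\xb9" * 6))  # True: 6%
```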




And I pre-process the subject if it is a base64 (B-encoded) header:


# B Mime header extension in subject?
# (capture only the encoded text, without the closing ?=)
:0
* ^Subject:.*=\?[^?]*\?b\?\/[^?]+
{
  ## LOG="B mime header $MATCH $NL"
  MIMESUBJECT=`echo $MATCH | mimencode -u -b`
  ## LOG="B mime header $MIMESUBJECT $NL"

  # 5% gagabuggee subject
  :0 D
  * -1^1 MIMESUBJECT ?? .
  *  2^1 MIMESUBJECT ?? =[0-9A-F][0-9A-F]
  * 20^1 MIMESUBJECT ?? [ ¡¢£€¥Š§š©ª«¬­®¯°±²³Žµ¶·ž¹º»ŒœŸ¿]
  * 20^1 MIMESUBJECT ?? [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
  * 20^1 MIMESUBJECT ?? [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
  * 20^1 MIMESUBJECT ?? =[A-F][0-9A-F]
  {
    # action
  }
}
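Here is what the mimencode step produces. Base64-decoding the B-encoded text gives raw bytes in the sender's charset, which is exactly what the 8-bit character classes look for. The Python equivalent below is for experimenting: base64.b64decode stands in for mimencode -u -b, and decode_b_subject is a made-up name. Unlike the recipe, it decodes every B-encoded word, not just one:

```python
import base64
import re

# Capture the text between "?B?" and "?=" of each B-encoded word.
B_WORD = re.compile(r"=\?[^?]*\?b\?([^?]+)\?=", re.IGNORECASE)


def decode_b_subject(subject: str) -> bytes:
    """Base64-decode every B-encoded word in a Subject value.

    The result is raw bytes in whatever charset the sender declared.
    """
    return b"".join(base64.b64decode(payload) for payload in B_WORD.findall(subject))


raw = decode_b_subject("=?EUC-KR?B?ucyxubGzyK/H0Lv9uPDB/VuxpLDtXQ==?=")
print(len(raw), raw[:3])  # 22 b'\xb9\xcc\xb1'
```

With the Subject from the question, the first decoded bytes are 0xB9 0xCC 0xB1, all above 0xA0, so the 8-bit classes score heavily.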



and in the body:


# 5% gagabuggee body
:0 BD
* -1^1 .
*  2^1 =[0-9A-F][0-9A-F]
* 20^1 [ ¡¢£€¥Š§š©ª«¬­®¯°±²³Žµ¶·ž¹º»ŒœŸ¿]
* 20^1 [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
* 20^1 [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
* 20^1 =[A-F][0-9A-F]
{
 # action
}


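The last one should match quoted-printable-encoded "Chinese" characters,
but not base64-encoded ones: base64 output is plain ASCII, so neither the
8-bit character classes nor the =[A-F][0-9A-F] condition can match it.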
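You can see the difference by encoding the same bytes both ways. Quoted-printable turns each byte from 0xA0 up into =XX, where the first hex digit is an uppercase A-F, and the =[A-F][0-9A-F] condition scores that. Base64 output uses only letters, digits, + and /, plus trailing = padding. The short Python check below uses the standard quopri and base64 modules, with four bytes taken from the decoded Subject in the question:

```python
import base64
import quopri
import re

HIGH_QP = re.compile(rb"=[A-F][0-9A-F]")  # the recipe's 20^1 =[A-F][0-9A-F]

# First four bytes of the base64-decoded EUC-KR Subject from the question.
korean = b"\xb9\xcc\xb1\xb9"

qp = quopri.encodestring(korean)  # quoted-printable: =B9=CC=B1=B9
b64 = base64.b64encode(korean)    # base64: ASCII letters, digits, padding

print(HIGH_QP.findall(qp))   # [b'=B9', b'=CC', b'=B1', b'=B9']
print(HIGH_QP.findall(b64))  # []
print(b64)                   # b'ucyxuQ=='
```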



CG> Also, while I'm here, I've noticed another spammer trick, of late, is to
CG> send spam encoded as base64. I can capture this by looking for
CG> 'Content-Type: text/html
CG> Content-Transfer-Encoding: BASE64'
CG> (BASE64 is still legitimate for attachments)
CG>
CG> Is there a tool/module to DECODE the base64 so that procmail filtering
CG> checks on the message body can be performed? This would be preferable to
CG> treating all BASE64 text as spam.......
CG>


I did have a think about this one but have not done anything, as it looked
rather complicated: MIME attachments can themselves contain attachments,
i.e. the message body can be made of several parts, and each of those parts
can in turn be made of several parts (if I read the RFC correctly).
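For the record, Python's standard email package does handle the nesting. Message.walk() visits every part, including multiparts within multiparts, and get_payload(decode=True) undoes base64 or quoted-printable. Below is a minimal sketch that returns the decoded text of each text/* part. The name decoded_text_parts and the test message are made up, and hooking this into procmail (for example as a filter) is not shown:

```python
import base64
from email import message_from_string


def decoded_text_parts(raw: str) -> list[tuple[str, bytes]]:
    """Return (content type, decoded bytes) for every text/* part."""
    msg = message_from_string(raw)
    parts = []
    for part in msg.walk():  # descends into nested multiparts
        if part.get_content_maintype() == "text":
            # decode=True reverses the Content-Transfer-Encoding (case-insensitive)
            parts.append((part.get_content_type(), part.get_payload(decode=True)))
    return parts


# Invented test message: a BASE64 text/html part nested two levels deep.
html = base64.b64encode(b"<p>Buy cheap pills now</p>").decode("ascii")
RAW = f"""\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: BASE64

{html}
--inner--
--outer--
"""

for ctype, body in decoded_text_parts(RAW):
    print(ctype, body)  # text/html b'<p>Buy cheap pills now</p>'
```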



Alan

( Please do not email me AS WELL as replying to the list. Personal
  email is welcome but may invoke a password autoresponder. )



_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail