On Thu, 7 Nov 2002, Charles Gregory wrote:
CG>
CG> However, just today I received two e-mails that specify these strings
CG> clearly in the 'from' and 'subject' headers, but still got past the
CG> filter. It would appear to do so because of some effect of the surrounding
CG> characters on the line. If I use Pine's "full headers" command, I see
CG> expanded strings consisting of:
CG> From: "=?EUC-KR?B?sbnBpiCxs8ivx9C7/SC8vsXN?="
CG> Subject: =?EUC-KR?B?ucyxubGzyK/H0Lv9uPDB/VuxpLDtXQ==?=
CG>
CG> Interestingly enough, if I use Pine's *bounce* command, the 'Resent
CG> Subject' turns up as:
CG> Resent-Subject: =?X-UNKNOWN?B?ucyxubGzyK/H0Lv9uPDB/VuxpLDtXQ==?=
CG>
CG> So I suspect that some sort of processing is occurring, and that my
CG> procmail filter never really 'sees' the 'euc-kr' string, because of some
CG> 'handling' done on the control characters(?). My question is, how would I
CG> get procmail to ignore control characters so that it 'sees' the euc-kr
CG> that is obviously there?
CG>
I use this:
# Mime header extension in subject
:0
* ^Subject: =\?(gb2312|big5|ks_c_5601|2022-kr|euc-kr).*\?=
{
# action
}
for MIME-encoded headers that contain undesirable charsets. I am going off
filtering on charsets, though, as I have had some false negatives (not
false positives: in my terminology, I let through desirable mail rather
than reject undesirable mail).
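For reference, the strings Charles quotes are RFC 2047 encoded-words of the
form =?charset?encoding?payload?=, which is what the recipe above keys on.
A rough Python sketch (not part of my procmail setup; the function name
parse_encoded_words is made up for illustration) that pulls out the charset
and decodes the payload might look like this:

```python
import base64
import quopri
import re

# =?charset?encoding?payload?=  (encoding is B for base64, Q for quoted-printable)
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def parse_encoded_words(header: str):
    """Return (charset, raw_bytes) for each encoded-word found in a header line."""
    found = []
    for charset, encoding, payload in ENCODED_WORD.findall(header):
        if encoding.upper() == "B":
            raw = base64.b64decode(payload)
        else:
            # header=True also turns "_" into a space, as the Q encoding requires
            raw = quopri.decodestring(payload.encode("ascii"), header=True)
        found.append((charset.lower(), raw))
    return found


# The Subject line quoted earlier in this message.
subject = "Subject: =?EUC-KR?B?ucyxubGzyK/H0Lv9uPDB/VuxpLDtXQ==?="
for charset, raw in parse_encoded_words(subject):
    print(charset, len(raw), "bytes")
```

The charset is lower-cased before it is returned, so a later comparison
against a list of unwanted charsets does not depend on how the sender
capitalised it.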
I test for the charsets in the mime attachments:
# Mime format with charset
# Multiline headers are grepped.
:0
* ^Content-Type:.*boundary
* B ?? ^Content-Type:(.|$)*charset=.?(big5|ks_c_5601|2022-kr|euc-kr)
{
# action
}
I look for non-ASCII chars in the subject:
# 5% gagabuggee subject
# avoid empty subject
:0
* ^Subject: \/.+
{
:0 D
* -1^1 MATCH ?? .
* 2^1 MATCH ?? =[0-9A-F][0-9A-F]
* 20^1 MATCH ?? [ ¡¢£€¥Š§š©ª«¬®¯°±²³Žµ¶·ž¹º»ŒœŸ¿]
* 20^1 MATCH ?? [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
* 20^1 MATCH ?? [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
* 20^1 MATCH ?? =[A-F][0-9A-F]
{
# action
}
}
And pre-process if base64 header:
# B Mime header extension in subject?
:0
* ^Subject:.*=\?.*\?b\?\/.+\?=
{
## LOG="B mime header $MATCH $NL"
MIMESUBJECT=`echo $MATCH | mimencode -u -b`
## LOG="B mime header $MIMESUBJECT $NL"
# 5% gagabuggee subject
:0 D
* -1^1 MIMESUBJECT ?? .
* 2^1 MIMESUBJECT ?? =[0-9A-F][0-9A-F]
* 20^1 MIMESUBJECT ?? [ ¡¢£€¥Š§š©ª«¬®¯°±²³Žµ¶·ž¹º»ŒœŸ¿]
* 20^1 MIMESUBJECT ?? [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
* 20^1 MIMESUBJECT ?? [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
* 20^1 MIMESUBJECT ?? =[A-F][0-9A-F]
{
# action
}
}
and in the body:
# 5% gagabuggee body
:0 BD
* -1^1 .
* 2^1 =[0-9A-F][0-9A-F]
* 20^1 [ ¡¢£€¥Š§š©ª«¬®¯°±²³Žµ¶·ž¹º»ŒœŸ¿]
* 20^1 [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
* 20^1 [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
* 20^1 =[A-F][0-9A-F]
{
# action
}
The last condition should match quoted-printable-encoded "chinese"
characters, but not base64-encoded ones.
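To make the weights in the scoring recipes above concrete: every character
costs 1, every high-bit character (or quoted-printable escape of one) earns
20, so the total goes positive once roughly 5% of the text is non-ASCII.
Here is a rough Python sketch of the same arithmetic. It is only an
approximation of procmail's scoring; the names are made up, the three
bracketed classes are collapsed into the single range \xa0-\xff (my
assumption about what they cover), and the text is assumed to be decoded
as Latin-1 so that one character is one byte.

```python
import re

# (weight, pattern) pairs mirroring the w^1 conditions in the recipes.
# With an exponent of 1, every match contributes the full weight.
WEIGHTED_PATTERNS = [
    (-1, re.compile(r".")),                 # each character costs 1
    (2, re.compile(r"=[0-9A-F][0-9A-F]")),  # any quoted-printable escape
    (20, re.compile(r"[\xa0-\xff]")),       # raw high-bit character (assumed range)
    (20, re.compile(r"=[A-F][0-9A-F]")),    # quoted-printable escape of a high byte
]


def gibberish_score(text: str) -> int:
    """Sum weight * number of non-overlapping matches for every pattern."""
    return sum(weight * len(pattern.findall(text))
               for weight, pattern in WEIGHTED_PATTERNS)


def looks_like_gibberish(text: str) -> bool:
    """A recipe with weighted conditions succeeds when the total is positive."""
    return gibberish_score(text) > 0


print(gibberish_score("hello"))
print(looks_like_gibberish("a" * 10 + "\xe9"))
```

Note that an escape such as =E9 is counted by both quoted-printable
patterns, so it earns 22 against a cost of 3 for its three characters.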
CG> Also, while I'm here, I've noticed another spammer trick, of late, is to
CG> send spam encoded as base64. I can capture this by looking for
CG> 'Content-Type: text/html
CG> Content-Transfer-Encoding: BASE64'
CG> (BASE64 is still legitimate for attachments)
CG>
CG> Is there a tool/module to DECODE the base64 so that procmail filtering
CG> checks on the message body can be performed? This would be preferable to
CG> treating all BASE64 text as spam.......
CG>
I did have a think about this one but have not done anything, as it looked
rather complicated: MIME attachments can themselves contain attachments,
i.e. the message body can be made of several parts, and each part can in
turn be made of several parts, if I read the RFC correctly.
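One possible way around the nesting is to let a MIME library walk the part
tree instead of doing it by hand. The following is only a minimal sketch
using Python's standard email package, not something I run myself; the
function name decoded_text_parts is hypothetical, and the Latin-1 fallback
for missing or unknown charsets is an assumption.

```python
import email


def decoded_text_parts(raw_message: bytes) -> list:
    """Return the decoded text of every text/* part, however deeply nested."""
    msg = email.message_from_bytes(raw_message)
    texts = []
    # walk() visits the message itself and then every sub-part, depth first.
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and binary attachments
        # decode=True undoes base64 or quoted-printable transfer encoding
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "latin-1"
        try:
            texts.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: fall back to a byte-for-byte decoding.
            texts.append(payload.decode("latin-1"))
    return texts
```

A script built around something like this could sit in front of the body
checks, so that they see decoded text instead of treating every BASE64
text part as spam.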
Alan
( Please do not email me AS WELL as replying to the list. Personal
email is welcome but may invoke a password autoresponder. )