procmail

Re: Simplifying Mime messages

2002-11-22 12:07:42
On 22 Nov, Lars Poulsen wrote:
|  > =>  :0  # 021109 () base-64-encoded html head is shrouding more than charset
|  > =>   * Content-Type:(.*\<)?text/(html|plain)
|  > =>   * ^Content-Transfer-Encoding:(.*\<)?base64
|  > =>   | your action goes here
| 
| dman> I haven't had any non-spam caught by it since I initiated it
| dman> a couple of months ago.
| 
| Assuming that it is the BODY that you are scanning for these strings
| (and as I mentioned, I am already dealing with the ones that have these
| in the header), it would seem that this would yield a false positive on any
| message of the structure:
| 
|         Content-Type: multipart/mixed;
| 
|         Content-Type: text/plain;
|         Content-Transfer-Encoding: 8bit;
| 
|         Content-Type: application/octet-stream;
|         Content-Transfer-Encoding: base64;
| 
| This is a valid encoding, which I don't want to lose! My concern is with the
| ones that are structured as follows, and where the "multipart" actually
| has only one part:
| 
|         Content-Type: multipart/mixed;
| 
|         Content-Type: text/plain;
|         Content-Transfer-Encoding: base64;
| 

I don't want to speak for Dallman, but the recipe is not searching the
body: procmail conditions look only at the header unless you give the B
flag (or use "B ??"). The thinking behind the filter seems to be that if
the message is plain text or HTML AND base64-encoded, it's spam. I can't
think of a reason to base64-encode plain text other than to slip past
content filters. Who does that? The bottom feeders.
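
For illustration, here is the quoted recipe with the header-only search
spelled out. This is an untested sketch: H is procmail's default anyway,
the ^ anchors are my addition, and the folder name b64-text-spam is only
a placeholder for whatever action you actually use.

  # H: search the top-level header only (same as the default)
  :0H:
  * ^Content-Type:(.*\<)?text/(html|plain)
  * ^Content-Transfer-Encoding:(.*\<)?base64
  b64-text-spam

Because only the top-level header is searched, a multipart/mixed message
whose base64 part sits in the body never triggers it.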

I haven't seen what you're describing, but then I haven't looked
either. You seem to be looking for something like (untested):

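  :0
  * ^Content-Type: multipart/mixed;
  { }
  :0A:
  * 2^0
  * -1^1 B ?? ^Content-Type:
  # this message said it was multipart, but there was at most one part
  # do whatever you want here; the folder below is only a placeholder
  single-part-multipart

The first recipe checks the headers for an indication of a multipart
message; its empty { } block does nothing except record whether the
conditions matched. If they did, the second recipe (the A flag) starts
with a score of 2, then searches the body, subtracting 1 for each line
that begins with "Content-Type:". Procmail only counts a recipe as
matching when its final score is positive, so this one fires only when
the body has at most one such line. If the first recipe doesn't match,
the second one is bypassed.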
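
As a worked check of procmail's scoring for this kind of recipe: with a
starting condition "* S^0" and a counting condition
"* W^1 B ?? ^Content-Type:", a body with n matching lines scores S + W*n,
and the recipe matches only if that total is above zero. Two choices of
S and W behave in opposite ways:

  score = S + W*n          (matches only if score > 0)

  S = -1, W = +1:   n = 1 -> 0 (no match)    n = 2 -> 1 (match)
  S =  2, W = -1:   n = 1 -> 1 (match)       n = 2 -> 0 (no match)

So the first pair fires on messages with two or more parts, and the
second on messages with at most one.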

If I've misunderstood and you specifically want to match the body for:

  Content-Type: text/plain;
  Content-Transfer-Encoding: base64;
  
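then add these conditions to the first recipe (the patterns leave off the
trailing semicolons, since a real Content-Transfer-Encoding header
normally has none):

  * B ?? ^Content-Type: text/plain
  * B ?? ^Content-Transfer-Encoding: base64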
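
Putting the pieces together, a combined recipe might look like the
sketch below. It is untested and assumes the goal is to catch a
multipart/mixed message whose body holds a single base64-encoded
text/plain part. The folder name single-part-b64 is a placeholder, and
the space-tolerant " *" in the patterns is my own choice.

  # top-level header says multipart/mixed, and the body contains a
  # text/plain part and a base64 transfer encoding somewhere
  :0
  * ^Content-Type: *multipart/mixed
  * B ?? ^Content-Type: *text/plain
  * B ?? ^Content-Transfer-Encoding: *base64
  { }

  # only if the recipe above matched: start at 2 and subtract 1 per
  # Content-Type line in the body; a positive score means at most one part
  :0A:
  * 2^0
  * -1^1 B ?? ^Content-Type:
  single-part-b64

With your legitimate example (an 8bit text/plain part plus a base64
application/octet-stream part), the first recipe matches, but the body
has two Content-Type lines. The score therefore drops to 0, and the
message is left alone.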

If I still don't get it, then maybe this'll get you started anyway.

-- 
Reply to list please, or append "8" to "procmail" in address if you must.
Spammers' unrelenting address harvesting forces me to this...reluctantly.



_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail