Re: File Attachments

2002-12-11 16:18:39
At 10:52 2002-12-11 -0700, Dave Cook did say:
I use the following recipe to filter out file attachments that may carry a
virus.  For some reason, this also filters out file attachments of type .msg
and .html.  Does anyone know why this recipe is doing this?

Have you tried setting VERBOSE=ON and checking the procmail log? Have you inspected the messages to ensure that there are not multiple attachments or possibly matches WITHIN the content?
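
In case it helps, a minimal logging setup looks something like the following. The log path is only an assumption; point it at any file you can write to.

```procmail
# Assumed log location; adjust to taste.
LOGFILE=$HOME/procmail.log
VERBOSE=on

# ... the recipe under test goes here ...

# Switch verbose logging back off afterwards, or the log grows quickly.
VERBOSE=off
```

With VERBOSE on, the log records a "Match on" or "No match on" entry for each condition, which should show which expression is catching the .msg and .html attachments.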

:0B
* ^[ \t]*name.*\.(vbs|exe|hta|scr|pif|js|bat|com|wma|chm)|\
  ^.*name.*\".*\.(exe|vbs|hta|scr|pif|js|bat|com|wma|chm)\"|\
  ^Content-.*\".*\.(hta|vbs|exe|scr|pif|js|bat|com|wma|chm)\"|\
  ^filename=.*\".*\.(hta|vbs|exe|scr|pif|js|bat|com|wma|chm)\"|\
   ^name=.*\".*\.(hta|vbs|exe|scr|pif|bat|mp3|com|wma|chm)\"|\
   ^name=.*.*\.(hta|vbs|exe|scr|pif|bat|mp3|com|wma|chm)|\
   ^name=*.\.(hta|vbs|exe|scr|pif|bat|mp3|wma|chm)|\
   ^.*name=.*\.(vbs|exe|hta|scr|pif|bat|mp3|wma|chm)|\
   ^filename=.*\"worms.zip\"
{

Note that if you're not doing anything else, bracing a no-condition recipe doesn't buy you anything except for added complexity. Also, when writing to a file, you should use a LOCKFILE (the trailing ':' on the flags line).
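
To make those two points concrete, here is a minimal sketch. The single condition is a stand-in rather than your full expression, and the mailbox name is the one used in the cleaned-up recipe further down; both are assumptions for illustration only.

```procmail
# Braced form: works, but the nested recipe adds nothing here.
:0 B
* ^Content-.*name=.*\.exe
{
  :0:
  viruscontrol
}

# Equivalent flat form.  The trailing ':' asks procmail for a local
# lockfile while it appends to the mailbox.
:0 B:
* ^Content-.*name=.*\.exe
viruscontrol
```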

I'm not quite sure why you omit some extensions from some of the conditions. I suspect the erratic use of extensions stems from having multiple lines with a bunch of extensions listed on each, and changes to the list not being propagated to all of them.

I trust that you're using \t to represent a tab for the purpose of posting your recipe for review, even though procmail doesn't support that syntax - a hard tab would actually exist in the rc file. As I indicate in my disclaimer (see .sig), unless expressly indicated otherwise, any time you see a [ ] in a recipe I've written, it can generally be assumed that it contains a space and a hard tab (even if the tab is not rendered in your mail client as such), because there's little logic to bracketing a _single_ character, or bracketing multiple occurrences of the same character (since the brackets define a class). If, in fact, you use \t believing that it works as in C and Perl, you need to break yourself of the habit.

Now, let me rewrite this, in an easier to read (and maintain) format. I'm not going to go hog-wild about analysing it to compare each condition against something which would actually be encountered. Re-ordering the extensions provides us with the consistent portion of the extensions list:


           ^[ \t]*name.*\.(hta|vbs|exe|scr|pif|wma|chm|bat|com|js)|\
           ^.*name.*\".*\.(hta|vbs|exe|scr|pif|wma|chm|bat|com|js)\"|\
         ^Content-.*\".*\.(hta|vbs|exe|scr|pif|wma|chm|bat|com|js)\"|\
        ^filename=.*\".*\.(hta|vbs|exe|scr|pif|wma|chm|bat|com|js)\"|\
            ^name=.*\".*\.(hta|vbs|exe|scr|pif|wma|chm|bat|com|mp3)\"|\
              ^name=.*.*\.(hta|vbs|exe|scr|pif|wma|chm|bat|com|mp3)|\
                ^name=*.\.(hta|vbs|exe|scr|pif|wma|chm|bat|mp3)|\
              ^.*name=.*\.(hta|vbs|exe|scr|pif|wma|chm|bat|mp3)|\
        ^filename=.*\"worms.zip\"

Thus, the exact conditions dramatically shrink, even when we include them all without redundant conditions removed:

EXTS="hta|vbs|exe|scr|pif|wma|chm|bat"

1          ^[ \t]*name.*\.(${EXTS}|com|js)|\
2          ^.*name.*\".*\.(${EXTS}|com|js)\"|\
3        ^Content-.*\".*\.(${EXTS}|com|js)\"|\
4       ^filename=.*\".*\.(${EXTS}|com|js)\"|\
5           ^name=.*\".*\.(${EXTS}|com|mp3)\"|\
6             ^name=.*.*\.(${EXTS}|com|mp3)|\
7               ^name=*.\.(${EXTS}|mp3)|\
8             ^.*name=.*\.(${EXTS}|mp3)|\
9       ^filename=.*\"worms.zip\"

(lines are numbered for below reference)

On line 6, I don't understand why you have a double ".*". Surely this is a typo? The same goes for line 7, where you have "=*.\.": zero or more equals signs, any char, then a dot? No, again, this appears to be a typo (. and * reversed). Correcting these two lines makes the two expressions overlap - line 6 encompasses everything which line 7 would match, making line 7 unnecessary.

The expression on line 8 ALSO overlaps, except that it doesn't include .com. I'm not sure why you wouldn't want that executable extension included; I suspect any difference in extensions is an oversight caused by the jumble of expressions you're using, which is why placing the common extensions into a variable simplifies the expression so much. Zero or more of anything in front of "name=" still matches when there's nothing in front of it, so lines 6 and 7 can be removed and "com" added to the extensions list on line 8 (assuming that you didn't exclude it from that condition for a reason; otherwise, retain line 6).

I'm not really sure why you include mp3 in your executable extension list. I'll assume you have a reason. Similarly, I don't know why you omit .com and .js from some of the conditions.

Lines 3 and 4 can be combined easily.  Line 9 should have that dot escaped.

Lines testing for a quoted filename are probably _really_ expecting zero or more WHITESPACE characters preceding the opening quote, not zero or more of *ANYTHING*.

More consolidation could be performed, though not understanding why you have different extension criteria prevents me from doing that effectively without brutalizing your logic. Also, quoted versus unquoted strings _could_ be handled with a simple optional group such as (\"|), but that doesn't ensure that every OPENING quote has a closing quote, since each quote would be independently optional rather than handled as a pair. For your purposes, this might not be critical.
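
If pairing does matter, one way to get it is to spell out the quoted and unquoted forms as separate alternatives rather than making each quote optional. The following is an untested sketch: it assumes a single extension list in EXTS, and each [ ] follows the usual convention of holding a space and a hard tab.

```procmail
EXTS="hta|vbs|exe|scr|pif|wma|chm|bat"

# First alternative: opening quote, filename, closing quote.
# Second alternative: no quotes at all, ending at a word boundary.
:0 B:
* $ ^[ 	]*(file|)name=[ 	]*(\"[^\"]*\.(${EXTS})\"|[^\" 	]*\.(${EXTS})\>)
viruscontrol
```

The cost is that the extension list appears twice in the expression, which is one more reason to keep it in a variable.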

After cleanup, what I'm left with is:

EXTS="hta|vbs|exe|scr|pif|wma|chm|bat"

:0B:
* $ ^[ \t]*name.*\.(${EXTS}|com|js)|\
        ^.*name[ 	]*\".*\.(${EXTS}|com|js)\"|\
        ^(Content-|filename=).*[ 	]*\".*\.(${EXTS}|com|js)\"|\
        ^name=[ 	]*\".*\.(${EXTS}|com|mp3)\"|\
        ^.*name=.*\.(${EXTS}|com|mp3)|\
        ^filename=[ 	]*\"worms\.zip\"
viruscontrol

This probably doesn't do a thing for your mismatch problem, but should make some of the matches somewhat more "correct", and the expression isn't nearly as wasteful.

I then ran the above filter against a testbox of spam, and sure enough, it matched several HTML spams, each of which included embedded forms of the type:

<input type="hidden" name="fromemail" value="me@me.com">
<input type="hidden" name="from" value="me@me.com">

If you missed it, that sort of thing is matched by the second condition of your original filter:

  ^.*name.*\".*\.(exe|vbs|hta|scr|pif|js|bat|com|wma|chm)\"|\

I'd seriously rethink preceding any expression like that with '.*', and evaluate what led you to include it. Perhaps:

        ^[      ]*(file|)name[  ]*(=|)[         ]*(\"|).*\.($EXTS)\>|\

might do the trick (that rolls SEVERAL of your original conditions into one, and also requires that the _final_ extension be trailed by a wordbreak character). The above expression is untested however.
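
Dropped into a complete recipe, it would look roughly like this. Folding com and js into EXTS is an assumption on my part (the original conditions disagree about them), and the '$' at the start of the condition is what makes procmail expand $EXTS. As before, each [ ] holds a space and a hard tab, and this remains untested.

```procmail
# Assumption: one extension list shared by every case.
EXTS="hta|vbs|exe|scr|pif|wma|chm|bat|com|js"

:0 B:
* $ ^[ 	]*(file|)name[ 	]*(=|)[ 	]*(\"|).*\.($EXTS)\>
viruscontrol
```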

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

