procmail

Re: Hard one for the gurus - Extracting files from a uuencoded message

2002-08-20 01:09:23
On 15 Aug, Odhiambo G. Washington wrote:
| * Don Hammond <procmail(_at_)tradersdata(_dot_)com> [20020814 22:53]: wrote:
| > On 14 Aug, Odhiambo G. Washington wrote:
| > | * Don Hammond <procmail(_at_)tradersdata(_dot_)com> [20020814 21:57]: wrote:
| > | [...]
| > | > 
| > | > :0b:uue.lock
| > | > * ^From:(_dot_)*subscribe(_at_)drweb(_dot_)ru
| > | > *     B ?? ^begin [0-7][0-7][0-7] \/.*
| > | > *     B ?? ^end$
| > | > * 1^3 B ?? ^M
| > | > |uudecode >"/path/to/final/destination/$MATCH"
| > | 
| > | Hello Don,
| > | 
| > | This recipe gives me a zero-sized zipped file.
| > | 
| > | I could do better if it could also unzip the file , in case the recipe
| > | can get some facelift ;-)
| > | 
| > 
| > Yes, sorry. As Udi pointed out, uudecode has no standard output. Change
| > the action line to:
| > 
| > |uudecode -o "/path/to/final/destination/$MATCH"
| 
| 
| 
| Cool ;-)
| 
| This works, only I don't know how to tell it to go ahead and unzip the $MATCH.

There are (at least) a couple of aspects of this I didn't think
through very well.

1. If it were me, until I was sure there were no problems, and probably
even longer, I'd keep a copy of the original.

2. The scored condition is nonsense. It will succeed with a single match
on ^M (that's M at the beginning of the line, not Ctrl-M), so why keep
going with the scoring?  There are a couple possible fixes.

You could omit it altogether if all messages from subscribe(_at_)drweb(_dot_)ru
will have uuencoded zip files. This is probably reasonable since the
other 3 conditions should indicate existence of the attachment with
reliability approaching 100%. So, adding an unzip to the mix [1][2][3]:

(The From: condition has been tightened up a bit, and a copy of the
original is delivered to $DEFAULT)

  :0bwc:uue.lock
  * ^From:.*\<subscribe(_at_)drweb\(_dot_)ru\>
  * B ?? ^begin [0-7][0-7][0-7] \/.+
  * B ?? ^end$
  * B ?? ^`$
  | uudecode && unzip "$MATCH" -d /extract/dir && rm "$MATCH"
  :0A:
  $DEFAULT

[1] My unzip doesn't support reading from standard input. If yours
does, your action can be: "|uudecode -o - |unzip -d /extract/dir",
skipping the rm and mitigating the path concerns described below.

[2] If the file to be unzipped already exists, then unzip will probably
want to prompt you - which obviously won't be any good. I assume
procmail just hangs at that point. You can add the -o option to the
unzip invocation to force writing over the existing file, but that may
not be what you want either. If the files as unzipped are uniquely
named, this isn't an issue. If not, you'll either need to have some
other process dispose of or archive each one before the next comes in,
or you'll have to munge the filename (e.g. add epoch seconds or some
such to $MATCH).
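
As a rough sketch of the filename munging mentioned in [2], a small
shell function along these lines could add epoch seconds (plus the
shell's PID) to a name. The function name unique_name is made up for
illustration, and "date +%s" is assumed to be available (it is on GNU
and BSD systems, but it is not required by POSIX).

```shell
#!/bin/sh
# unique_name NAME: print NAME with "<epoch seconds>.<pid>" inserted
# before the last extension, or appended if there is no extension.
# Hypothetical helper; "date +%s" is an assumption about the platform.
unique_name() {
    name=$1
    stamp="$(date +%s).$$"
    case $name in
        # At least one character, then a dot: treat the part after the
        # last dot as the extension.
        ?*.*) printf '%s.%s.%s\n' "${name%.*}" "$stamp" "${name##*.}" ;;
        # No usable extension: just append the stamp.
        *)    printf '%s.%s\n' "$name" "$stamp" ;;
    esac
}
```

Inside a helper script this could name the decoded archive or a
per-message extraction directory, for example
uudecode -o "/extract/dir/$(unique_name "$MATCH")". Whether that
actually avoids the unzip prompt depends on where the name clash is.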

If you still want to check for lines beginning with "M" to ensure it's
really a uuencoded attachment, you could use the following to identify
so with certainty. (Pretty sure, anyway.) ;-)
 
  :0bwc:uue.lock
  * ^From:.*\<subscribe(_at_)drweb\(_dot_)ru\>
  * B ?? ^begin [0-7][0-7][0-7] \/.+($M.+)+$.+$`$end$
  * MATCH ?? ^^\/.+
  * MATCH ?? \.zip^^
  | uudecode && unzip "$MATCH" -d /extract/dir && rm "$MATCH"
  :0A:
  $DEFAULT
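
To make the long body condition above easier to read, here is a sketch
of roughly the same structural check done outside procmail with awk.
It is an approximation of the regex, not a replacement for it, and the
function name looks_uuencoded is invented for this example. It accepts
input that has a "begin MODE NAME" line, one or more lines starting
with "M", one more non-empty line, a line holding only "`", and then
"end".

```shell
#!/bin/sh
# looks_uuencoded: read a message body on stdin; succeed if it contains
# a block shaped like the procmail condition expects. Hypothetical
# helper, meant only to illustrate what the regex asserts.
looks_uuencoded() {
    awk '
        found { next }                      # already matched; drain the rest
        /^begin [0-7][0-7][0-7] ./ {        # start (or restart) a block
            n = 0; inblk = 1; next
        }
        inblk && $0 == "end" {
            # buf[n] must be the lone backquote, buf[n-1] a non-empty
            # line, and every line before that must start with "M".
            ok = (n >= 3 && buf[n] == "`" && buf[n-1] != "")
            for (i = 1; ok && i <= n - 2; i++)
                if (buf[i] !~ /^M./) ok = 0
            if (ok) found = 1
            inblk = 0
            next
        }
        inblk { buf[++n] = $0 }             # collect lines inside the block
        END { exit !found }
    '
}
```

Note that some encoders end the data with a line containing a single
space instead of "`"; neither this sketch nor the recipe above allows
for that.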

If the attachments are large, and you want to save scanning the whole
thing, you might want to settle for a "begin" line, an "end" line, a "`"
line, and check that the "begin" line is immediately followed by at
least 4 more lines (adjust as necessary) starting with "M":

  :0bwc:uue.lock
  * ^From:.*\<subscribe(_at_)drweb\(_dot_)ru\>
  * B ?? ^begin [0-7][0-7][0-7] \/.+
  * B ?? ^end$
  * B ?? ^`$
  * B ?? ^begin [0-7][0-7][0-7] .+($M.+)($M.+)($M.+)($M.+)
  | uudecode && unzip "$MATCH" -d /extract/dir && rm "$MATCH"
  :0A:
  $DEFAULT

I believe uudecode will always run error free on any message that
matches the conditions, so the "w" flag probably wouldn't have helped on
the original. But it's been added to these because unzip will report an
error. You could skip the "c" flag and the "A" extra delivery, because
an error will cause the original to be "recovered" anyway. But I figured
the additional delivery serves as notice that the file has arrived. You
must have write permission to the working directory as procmail knows
it, otherwise you'll have to add the -o /extract/dir/$MATCH option to
uudecode and need to add the path to the rm command also.

There are some choices. Choose your poison.

This does nothing to protect against a malicious path in the attachment,
as Udi pointed out. If something like that were in the uuencoding, it
could seemingly be mitigated by providing the -o /extract/dir/$MATCH
option to uudecode. But you might be subject to the same risk inside the
zip file. (You'll have to check if unzip has an option to strip a
leading "/", or does it automatically.)  I'm presuming the sender is
trusted and that's not of concern, but you should be aware that the
From: header is completely unreliable. This does introduce the
possibility of this kind of compromise from someone pretending to be
subscribe(_at_)drweb(_dot_)ru. That risk might be sufficiently small,
but ...

[3] If it were me, and I wanted it unzipped automatically, I'd do the
whole thing with a shell script invoked from procmail. The script would
check errors suitably, and check for absolute paths in the zipfile
before unzipping it, and notify me of problems by email.
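
A minimal sketch of what such a script might look like follows. It is
only an outline under several assumptions: Info-ZIP unzip (for
"unzip -Z1", which lists member names one per line), a uudecode that
accepts -o, mktemp -d, and a working mail command. The names
has_unsafe_path, notify and process_message, and the EXTRACT_DIR and
NOTIFY variables, are all made up for illustration.

```shell
#!/bin/sh
# Sketch of a helper that procmail could pipe the message body into.
# All names and paths here are illustrative assumptions.

EXTRACT_DIR=${EXTRACT_DIR:-/extract/dir}
NOTIFY=${NOTIFY:-${LOGNAME:-postmaster}}

# Read archive member names on stdin, one per line. Succeed (return 0)
# if any name is absolute or contains a ".." path component.
has_unsafe_path() {
    while IFS= read -r entry; do
        case $entry in
            /*|..|../*|*/..|*/../*) return 0 ;;
        esac
    done
    return 1
}

# Send a one-line problem report by mail.
notify() {
    printf '%s\n' "$1" | mail -s "uue-unzip: $1" "$NOTIFY"
}

# Decode stdin into a private temp dir, vet the member names, extract.
# Returns non-zero on any failure so procmail's "w" flag can see it.
process_message() {
    tmp=$(mktemp -d) || return 1
    zip=$tmp/attachment.zip
    status=0
    if ! uudecode -o "$zip"; then
        notify "uudecode failed"; status=1
    elif unzip -Z1 "$zip" | has_unsafe_path; then
        notify "unsafe path in archive"; status=1
    # -o overwrites existing files without prompting; that may or may
    # not be what you want (see [2]).
    elif ! unzip -o -q "$zip" -d "$EXTRACT_DIR"; then
        notify "unzip failed"; status=1
    fi
    rm -rf "$tmp"
    return $status
}

# process_message    # uncomment when installed as the recipe's action
```

With the last line enabled, the recipe's action line would simply pipe
to the script, and the "w" flag would let procmail notice a non-zero
exit.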

-- 
Reply to list please, or append "8" to "procmail" in address if you must.
Spammers' unrelenting address harvesting forces me to this...reluctantly.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail