nmh-workers

[Nmh-workers] Re: enhancement to mhfinddup

2008-09-09 13:00:12
On Tue, 09 Sep 2008 11:41:43 EDT, bergman(_at_)panix(_dot_)com said:

Below you can find the diff (suitable for feeding to patch(1)) against 
mhfinddup 1.2.

Possible enhancement for 1.3 - I'd code and test but am swimming in other work
today...

                $msgs{$msgid} =~ m|^\+(.*)/(\d+)$|;
                my($f, $m) = ($1, $2);
                if ($folder eq $f || $no_same_folder) {
...
!                       my $sum1=md5_hex(@msgbody);

At this point, you could consider doing something like:

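my %cached;     # declare once, outside the comparison loop
my $sum1;

                if (exists $cached{"$folderpath/$m"}) {
                        # look up with the same key that was used to store
                        $sum1=$cached{"$folderpath/$m"};
                } else {
                        $sum1=md5_hex(@msgbody);
                        $cached{"$folderpath/$m"}=$sum1;
                }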

and similarly for $sum2. Probably should move all the open/read/close
inside the 'else' branch too, so a cache hit skips the file I/O as well...
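
A minimal sketch of that idea, with the open/read/close folded into the
cache-miss path. The helper name cached_sum and the counter $digests_done
are hypothetical (neither is in mhfinddup), and it assumes the whole file
is what gets hashed, which may not match how mhfinddup builds @msgbody:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my %cached;             # path => hex digest; lives outside any loop
my $digests_done = 0;   # illustration only: how many times we really hashed

# Hypothetical helper: return the MD5 of a message file, reading and
# hashing it at most once per path.
sub cached_sum {
    my ($path) = @_;

    # Cache hit: no file I/O and no digest computation.
    return $cached{$path} if exists $cached{$path};

    # Cache miss: read the file, hash it, remember the result.
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    my @msgbody = <$fh>;
    close($fh);

    $digests_done++;
    return $cached{$path} = md5_hex(@msgbody);
}
```

The comparison would then read something like
cached_sum("$folderpath/$m") on each side, keyed by the full path so that
messages with the same number in different folders do not collide.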

Otherwise, if messages 100, 101, 102, 103, and 104
are in fact duplicates, you compute the md5sums for

100, 101, 100, 102, 100, 103, 100, 104, 101, 102, 101, 103,

And so on.  With the cache you only do N md5sums, not something on the order
of N*N (two per pair, going by the sequence above), which is a lot
different for N=4,000.. ;)
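
To put rough numbers on that, here is a small counting sketch. It assumes
every pair of messages is compared and both sides are hashed on each
comparison, as in the sequence listed above; the function count_digests and
its $use_cache switch are made up for this illustration. Under that
assumption the uncached count is two per pair, which is the same quadratic
growth as the figure quoted above:

```perl
use strict;
use warnings;

# Count digest computations when every pair (i, j) with i < j is compared.
# With $use_cache set, each message is only counted the first time it is seen.
sub count_digests {
    my ($n, $use_cache) = @_;
    my %seen;
    my $count = 0;
    for my $i (0 .. $n - 2) {
        for my $j ($i + 1 .. $n - 1) {
            for my $m ($i, $j) {
                next if $use_cache && $seen{$m}++;
                $count++;
            }
        }
    }
    return $count;
}

printf "%d messages: %d uncached, %d cached\n",
    $_, count_digests($_, 0), count_digests($_, 1) for 5, 100;
```

For the five messages above this counts 20 digests without the cache and 5
with it. By the same two-per-pair count, N=4,000 would be 4,000*3,999 =
15,996,000 digests against 4,000.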


_______________________________________________
Nmh-workers mailing list
Nmh-workers(_at_)nongnu(_dot_)org
http://lists.nongnu.org/mailman/listinfo/nmh-workers