On Tue, 09 Sep 2008 11:41:43 EDT, bergman(_at_)panix(_dot_)com said:
Below you can find the diff (suitable for feeding to patch(1)) against
mhfinddup 1.2.
Possible enhancement for 1.3 - I'd code and test but am swimming in other work
today...
$msgs{$msgid} =~ m|^\+(.*)/(\d+)$|;
my($f, $m) = ($1, $2);
if ($folder eq $f || $no_same_folder) {
...
! my $sum1=md5_hex(@msgbody);
At this point, you could consider doing something like:
my %cached;   # declare once, outside the comparison loop
if (exists $cached{"$folderpath/$m"}) {
    $sum1=$cached{"$folderpath/$m"};
} else {
    $sum1=md5_hex(@msgbody);
    $cached{"$folderpath/$m"}=$sum1;
}
and similarly for $sum2. Probably should move all the open/read/close
inside the 'else' branch too, so the file is only read on a cache miss...
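
Pulling the idea above into one place, here is a minimal sketch of a cached
digest helper. The name cached_md5 and the $reads counter are hypothetical and
not part of mhfinddup. For simplicity it hashes the whole file, whereas the
script presumably hashes only the message body:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Cache of digests keyed by message path. Declared outside any loop so
# it survives across comparisons.
my %cached;
my $reads = 0;    # counts real file reads, for illustration only

# Hypothetical helper: return the MD5 of a message file, opening and
# reading the file only on a cache miss.
sub cached_md5 {
    my ($path) = @_;
    return $cached{$path} if exists $cached{$path};

    open(my $fh, '<', $path) or die "cannot open $path: $!";
    my @msgbody = <$fh>;    # simplification: whole file, headers included
    close($fh);
    $reads++;

    return $cached{$path} = md5_hex(@msgbody);
}
```

Both $sum1 and $sum2 could then come from the same helper, e.g.
cached_md5("$folderpath/$m"), assuming that path names the message file.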
Otherwise, if messages 100, 101, 102, 103, and 104
are in fact duplicates, you compute the md5sums for
100, 101, 100, 102, 100, 103, 100, 104, 101, 102, 101, 103,
and so on. With the cache you only do N md5sums, not N*(N-1) (two for each
of the N*(N-1)/2 pairs), which is a lot different for N=4,000 ;)
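
To make the difference concrete, here is a small counting sketch. It assumes
every pair among the N duplicates gets compared and that each comparison needs
the digest of both messages. That is my reading of the sequence above, not
something checked against the script. The count_digests name is hypothetical:

```perl
use strict;
use warnings;

# Count digest computations when every pair among $n duplicates is
# compared. Assumption: each comparison needs both messages' digests.
sub count_digests {
    my ($n, $use_cache) = @_;
    my %seen;
    my $count = 0;
    for my $i (0 .. $n - 2) {
        for my $j ($i + 1 .. $n - 1) {
            for my $m ($i, $j) {
                # With the cache, a message is only digested the first
                # time it is seen.
                next if $use_cache && $seen{$m}++;
                $count++;
            }
        }
    }
    return $count;
}

printf "N=%d: %d without cache, %d with cache\n",
    $_, count_digests($_, 0), count_digests($_, 1) for (5, 100);
```

Under those assumptions the uncached count grows quadratically while the
cached count stays at N.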
_______________________________________________
Nmh-workers mailing list
Nmh-workers(_at_)nongnu(_dot_)org
http://lists.nongnu.org/mailman/listinfo/nmh-workers