nmh-workers

Re: Sort and delete duplicate messages

2020-05-04 09:23:25
I know that 'sortm -textfield Subject' will sort messages according to
the subject field. Having run that command, is there a way to then
delete the first duplicate of each message in the list such that if 1
and 2 are duplicates and 6 and 7 are duplicates you would delete messages
2 and 7 leaving 1 and 6?

The attached might be a useful starting point: run it with a directory
name (e.g. ~/Mail) and it'll find everything that looks like an MH mail
file (i.e. its name is a number) and delete any messages with an
already-seen Message-ID — i.e. the second and subsequent copies of any
emails.

I'd run it on a copy of your Mail folder and see what the diffs
look like: the biggest issue is that I'm not sure what order it'll visit
files in, so you may find your preferred copy doesn't get kept. Even so,
its output will be a good starting point for any better version.

Conrad
#!/usr/bin/perl

use strict;
use warnings;

use Email::Simple;
use File::Find;

die "Syntax: $0 <dir> [..]" unless @ARGV >= 1;

my %ids;
find(sub {
  my $file = $_;
  return unless $file =~ /^\d+$/ && -f $file;
  # Three-argument open with a lexical filehandle; report the OS error on failure.
  open my $fh, '<', $file or die "couldn't read \"$File::Find::name\": $!\n";
  my $email = do { local $/; <$fh> };
  close $fh;
  my $msg = Email::Simple->new($email);
  if(my $id = $msg->header("Message-id")) {
    if($ids{$id}) {
      unlink $file or warn "couldn't delete \"$File::Find::name\": $!\n";
      print "Seen $File::Find::name ($ids{$id})\n";
    } else {
      $ids{$id} = $File::Find::name;
    }
  } else {
    warn "No ID in \"$File::Find::name\"!";
  }
}, @ARGV);
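
On the ordering concern mentioned above, here is a rough sketch of one possible approach, not a tested replacement for the script. File::Find accepts a "preprocess" callback that can reorder a directory's entries before they are visited, so sorting the numeric names in ascending order should mean the lowest-numbered copy in each folder is the one kept. The helper name find_duplicates is made up for this illustration, and the sketch only reports what it would delete rather than calling unlink. Across different folders, which copy counts as "first" still depends on the order in which File::Find walks the directories.

```perl
#!/usr/bin/perl
# Sketch only: dry-run duplicate finder that visits MH files in numeric order.
use strict;
use warnings;

use Email::Simple;
use File::Find;

# Hypothetical helper (not part of the original script): returns a hash ref
# mapping each duplicate file to the first-seen file with the same Message-ID.
sub find_duplicates {
    my @dirs = @_;
    my (%first, %dupes);
    find({
        # Reorder each directory's entries: numeric names ascending, then the rest.
        preprocess => sub {
            my @num   = sort { $a <=> $b } grep { /^\d+$/ } @_;
            my @other = grep { !/^\d+$/ } @_;
            return (@num, @other);
        },
        wanted => sub {
            my $file = $_;
            return unless $file =~ /^\d+$/ && -f $file;
            open my $fh, '<', $file or do {
                warn "couldn't read \"$File::Find::name\": $!\n";
                return;
            };
            my $text = do { local $/; <$fh> };
            close $fh;
            my $id = Email::Simple->new($text)->header('Message-ID');
            return unless defined $id && length $id;
            if (exists $first{$id}) {
                # Already seen: record it as a duplicate, but do not delete.
                $dupes{$File::Find::name} = $first{$id};
            } else {
                $first{$id} = $File::Find::name;
            }
        },
    }, @dirs);
    return \%dupes;
}

# Dry run: list what would be deleted; nothing is unlinked here.
if (@ARGV) {
    my $dupes = find_duplicates(@ARGV);
    print "would delete $_ (duplicate of $dupes->{$_})\n" for sort keys %$dupes;
} else {
    warn "Syntax: $0 <dir> [..]\n";
}
```

Once the reported list looks right on a copy of the folder, the print could be swapped for an unlink on each key of the returned hash.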