MH Frequently Asked Questions (FAQ) with Answers
Section - Removing duplicate messages (Bourne)

Date: 20 Nov 1995 18:51:24 GMT

  Here's a simple-minded Bourne shell version.  It uses
  "scan" to get the message number and message-id of each message.  If
  a message has the same message-id as the previous message, the
  script adds its message number to the "remove" shell variable.

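	scan -width 300 -format '%(msg) %{message-id}' |
	{ while read msg msgid; do
	    if [ "$msgid" = "$lastmsgid" ]; then
		remove="$remove $msg"
	    fi
	    lastmsgid="$msgid"
	  done
	  # rmm must run here: $remove is lost once the pipeline ends
	  [ -n "$remove" ] && rmm $remove; }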

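  That's pretty simple-minded.  For example, if the $remove variable
  gets too big, the rmm command line may exceed your system's argument
  limit.  It also only catches duplicates that sit next to each other
  in the folder.  And I'm sure there are more efficient ways to find
  the list of duplicate message-ids.  But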
  that's the idea.
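
  One way to address both points, offered as a sketch rather than as
  part of the original posting: let awk remember every message-id it
  has seen, so duplicates need not be adjacent, and print the numbers
  one per line so that xargs can split them across several rmm calls.
  The function name find_dups is made up for this example, and lines
  without a message-id are skipped on the assumption that such
  messages should never count as duplicates of each other.

```shell
#!/bin/sh
# find_dups: hypothetical helper, not an MH command.
# Reads lines of the form "<msg> <message-id>", such as the output of
#   scan -width 300 -format '%(msg) %{message-id}'
# and prints the message number of every second and later occurrence
# of a message-id, one number per line.
find_dups() {
    # NF >= 2 skips lines that carry no message-id at all.
    # seen[$2]++ is zero (false) the first time an id appears.
    awk 'NF >= 2 && seen[$2]++ { print $1 }'
}
```

  With MH installed, it might be used along these lines.  The list is
  checked first because rmm with no arguments acts on the current
  message:

	dups=`scan -width 300 -format '%(msg) %{message-id}' | find_dups`
	[ -n "$dups" ] && echo "$dups" | xargs rmm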

Subject: Removing duplicate messages (Perl)
From: rtor at (Owen Rees)
Date: 20 Nov 1995 12:39:47 GMT

  I wrote a perl script to do this some time ago. All the usual dire
  warnings about destructive technology apply - take a backup, do it on
  a copy, try it on a small test case first etc. Don't use this script
  unless you are prepared to accept the consequences.


$version = "rmmdup 1";

if (@ARGV == 0) { $folder = ""; }
elsif (@ARGV == 1) { $folder = $ARGV[0];
                     unless ( $folder =~ /^\+.+$/ )
                      { die "usage $0 [+folder]\n"; };
                   }
else { die "usage $0 [+folder]\n"; };

$rmmlist = "";

# %msgs maps each message-id to the first message number that carried it.
open (scan, "scan $folder -format '%(msg) %{message-id}'|");
while (<scan>)
 { if ( ($msg,$msgid) = /^(\d+) (<.*>)$/)
    { if ($msgs{$msgid})
       { print "$msg duplicates $msgs{$msgid}\n";
         $rmmlist .= " $msg";
       }
      else { $msgs{$msgid} = $msg; };
    };
 };
if ( $rmmlist ) { exec "rmm $folder $rmmlist"; };
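
  In the spirit of "try it on a small test case first", the matching
  rule can be exercised without touching any mail.  The sketch below
  is not the script above; it is a rough awk rendering of the same
  first-seen-wins rule, under a made-up name (report_dups), and it
  only prints what would be removed.  It assumes message-ids contain
  no spaces, which the Perl pattern does not require.

```shell
#!/bin/sh
# report_dups: hypothetical dry-run helper; it never calls rmm.
# Reads "<msg> <message-id>" lines and, for each repeated id, prints
# "<msg> duplicates <first-msg>".
report_dups() {
    awk '$2 ~ /^<.*>$/ {
        if ($2 in first) {
            print $1 " duplicates " first[$2]
        } else {
            first[$2] = $1      # remember the first message with this id
        }
    }'
}
```

  Feeding it a few hand-written lines, or the output of the scan
  command on a copy of a folder, shows which messages would be kept
  and which would go.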

Subject: Removing duplicate messages (Perl)
From: Bill Wohler <wohler at>
Date: Sun, 17 Oct 2004 13:00:20 -0700

#!/usr/bin/perl -w
# Id: mhfinddup 6593 2004-09-02 16:34:24Z wohler 

=head1 NAME

mhfinddup - find duplicate messages


=head1 SYNOPSIS

mhfinddup [options] [folder ...]

=head1 DESCRIPTION


B<mhfinddup> finds and removes duplicate MH messages in the folders listed on
the command line (default: current folder). By default, you deal with
duplicate messages interactively. You can either remove the duplicate, not
remove the duplicate, or view the original and duplicate message before
deciding.

If you use the B<-msgid> option to B<send>, then you probably don't want to
list any F<+outbox> folders if you are using the B<--no-same-folder> option
and you want to preserve your sent messages as well as your messages to
mailing lists.

Note that if you specify one or more folders, or if you use the B<--all>
option, B<mhfinddup> recursively descends the given folders.

=head1 CONTEXT

Context is per B<flist>(1). That is, if F<+folder> is given, it will become
the current folder. If multiple folders are given, the last one specified will
become the current folder.

=head1 OPTIONS

=over 4

=item --all

Look for duplicates in all folders. If any folders are specified, this option
is ignored.

=item --debug

Turn on debugging messages.

=item --help

Display the usage of this command.

=item --list

List duplicated messages.

=item --no-same-folder

Since it is common to use C<refile -link> to file a message in multiple
folders, this script doesn't consider messages in different folders to be
duplicates. Specify this option to list or remove duplicates across folders.

=item --rmm

Remove messages non-interactively. Use with care! For safety, the B<--list>
option takes precedence if specified and is a good option to use before using
B<--rmm>.

=item --version

Display program version.

=back

=head1 RETURN VALUE

Returns 0 if all is well; non-zero otherwise.

=head1 EXAMPLES

=over 0

=item mhfinddup

Interactively remove duplicates from the current folder.

=item mhfinddup --all --list --no-same-folder

List all duplicates, whether or not they are in different folders.

=item mhfinddup --rmm +lists

Remove all duplicates in F<+lists>, recursively.

=back

=head1 SEE ALSO

B<rmm>(1), B<mhl>(1), B<scan>(1)

=head1 VERSION

Revision: 6593 

=head1 AUTHOR

Bill Wohler <wohler at>

Copyright (c) 2003 Newt Software. All rights reserved.

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, you can find it at or write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.

=head1 METHODS

=cut

# Packages and pragmas.
use Getopt::Long;

use strict;

# Constants.
my $cmd;                                # name by which command called
($cmd = $0) =~ s|^\./||;                # ...minus the leading ./
my $ver = '6593';			# program version with CVS noise

# Variables (may be overridden by arguments).
my $all = 0;				# look in all folders
my $debug = 0;				# verbose mode
my $help = 0;				# display usage
my $version = 0;			# display version
my $list = 0;				# list duplicates
my $no_same_folder = 0;			# consider duplicates across folders
my $rmm = 0;				# remove duplicates without asking

# Constants.
my $mhl = "/usr/lib/mh/mhl";
my $tmp = "/tmp/mhfinddup$$";

# Parse command line.
# The use of the posix_default option is to ensure that folders like +a are
# not confused with --all. I'd really prefer to set prefix_pattern to "(--|-)"
# so that abbreviations of options can be used without being confused with
# folders, but I couldn't make it so.
my %opts;
Getopt::Long::Configure("pass_through", "posix_default");
GetOptions('all'		=> \$all,
	   'debug'		=> \$debug,
	   'help'		=> \$help,
	   'list'		=> \$list,
	   'no-same-folder'	=> \$no_same_folder,
	   'rmm'		=> \$rmm,
	   'version'		=> \$version,
	  ) or usage();

show_version() if ($version);
usage() if ($help || int(@ARGV) != int(map(/^\+/, @ARGV)));

my @folders = expand_folders(@ARGV);
print("Expanded " . join(" ", @ARGV) . " into\n" . join("\n", @folders) . "\n")
    if ($debug);

print("Scanning for duplicate messages...\n");
my %msgs;
foreach my $folder (sort @folders) {
    print("Scanning $folder...\n") if ($debug);
    open (SCAN,
	  "MHCONTEXT=$tmp scan +$folder -format '%(msg) %{message-id}'|");
    while (<SCAN>) {
	if (my ($msg, $msgid) = /^(\d+) (<.*>)$/) {
	    if ($msgs{$msgid}) {
		$msgs{$msgid} =~ m|^\+(.*)/(\d+)$|;
		my($f, $m) = ($1, $2);
		if ($folder eq $f || $no_same_folder) {
		    handle_dup($f, $m, $folder, $msg);
		}
	    } else {
		$msgs{$msgid} = "+$folder/$msg";
	    }
	}
    }
    close(SCAN);
}

sub expand_folders {
    my @folders = @_;

    print("Getting list of folders...");
    open(FOLDERS,
	 "flist -recurse "
	  . (($all == 1 && @folders == 0) ? "-all" : join(" ", @folders))
	  . "|")
	or die("Could not determine folders\n");
    @folders = ();
    chomp(my $current_folder = `mhparam Current-Folder`);
    $current_folder = quotemeta($current_folder);
    while (<FOLDERS>) {
	my ($folder, $a, $b, $c, $d, $e, $f, $g, $count) = split;
	if ($folder =~ /^$current_folder\+$/) {
	    $folder =~ s/\+$//; # remove current folder indication
	}
	next if ($count == 0);
	push(@folders, $folder);
    }
    close(FOLDERS);
    print("\n");			# finish the progress line

    return @folders;
}

sub handle_dup {
    my($f1, $m1, $f2, $m2) = @_;

    my $ans;

  repeat:
    print("+$f2/$m2 duplicate of +$f1/$m1");

    if ($list) {
	print("\n");
    } else {
	if ($rmm) {
	    $ans = "y";
	    print("\n");
	} else {
	    print(", remove? [Yns?] ");
	    chomp($ans = <STDIN>);
	}

	if ($ans eq "y" || $ans eq "") {
	    system("rmm +$f2 $m2");
	} elsif ($ans eq "s") {
	    system("$mhl `mhpath +$f1 $m1` `mhpath +$f2 $m2`");
	    goto repeat;
	} elsif ($ans eq "?") {
	    print("y, remove message (default)\n" .
		  "n, don't remove message\n" .
		  "s, show messages\n" .
		  "?, show this message\n");
	    goto repeat;
	}
    }
}

=head2 usage

Display usage information and exit.

=cut

sub usage {
    print <<EOF;
Usage: $cmd [options] [folder ...]
--all			remove duplicates in all folders
--debug			print actions that program takes
--help			display this message
--list			list duplicates only
--no-same-folder	consider duplicates even if in different folders
--rmm			remove duplicates without asking
--version		display program version
EOF
    exit(1);				# non-zero: also reached on bad arguments
}

=head2 show_version

Display version information and exit.

=cut

sub show_version {
    print("$cmd version $ver\n".
          "Copyright (c) 2003 Bill Wohler <wohler at>\n\n".
          "$cmd comes with ABSOLUTELY NO WARRANTY.\n\n".
          "This is free software, and you are welcome\n".
          "to redistribute it under certain conditions.\n\n".
          "See `' for details.\n");
    exit(0);
}

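  The folder rule in mhfinddup (same folder only, unless
  --no-same-folder is given) can be tried out on plain text.  The
  sketch below is an approximation in awk, not a replacement for the
  script: the name find_dups_folders and the three-column input
  format ("folder msg message-id") are assumptions made for the
  example, and folder names are assumed to contain no spaces.

```shell
#!/bin/sh
# find_dups_folders: hypothetical helper, not part of MH or mhfinddup.
# Usage: find_dups_folders [0|1] < lines of "<folder> <msg> <message-id>"
# Pass 1 to report duplicates across folders (like --no-same-folder);
# the default, 0, reports only duplicates within the same folder.
find_dups_folders() {
    awk -v across="${1:-0}" '
    $3 ~ /^<.*>$/ {
        if ($3 in folder) {
            # Compare against the first place this id was seen.
            if (across || folder[$3] == $1)
                print "+" $1 "/" $2 " duplicate of +" folder[$3] "/" msg[$3]
        } else {
            folder[$3] = $1
            msg[$3] = $2
        }
    }'
}
```

  Like the script, this remembers only the first folder in which an
  id was seen, so in the default mode two copies in +lists go
  unreported if +inbox held that id first.
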
Local Variables:
mode: outline
outline-regexp: "^Subject:"
fill-prefix: "  "



