Find duplicated article titles in my .bib file
You could use Perl to go through the .bib file, store each title as a hash key with the line numbers where it occurs as the hash value, and then loop over the hash and print every title whose value holds more than one line number. To do so, create a file with the following content, e.g. finddupls.pl, change the .bib file name in it, then run perl finddupls.pl in your terminal:
#!/usr/bin/env perl
use strict;
use warnings;

my %seen = ();
my $line = 0;
open(my $B, '<', 'file.bib') or die "Cannot open file.bib: $!";
while (<$B>) {
    $line++;
    # remove everything except letters, digits, spaces, _ and -, because
    # BibTeX may wrap strings in " or { } (this also strips the = sign)
    s/[^a-zA-Z0-9 _-]//g;
    # match lines that start with "title"; lower-case the captured title
    # so that the comparison is case-insensitive
    $seen{lc($1)} .= "$line," if /^\s*title\s*(.+)$/i;
}
close $B;
# loop through the titles and count the number of lines found
foreach my $title (keys %seen) {
    # count the number of elements separated by commas
    my $num = $seen{$title} =~ tr/,//;
    print "title '$title' found $num times, lines: $seen{$title}\n" if $num > 1;
}
# write the sorted list of titles into a file
open(my $S, '>', 'sorted_titles.txt') or die "Cannot write sorted_titles.txt: $!";
print $S join("\n", sort keys %seen), "\n";
close $S;
It prints something like this directly to the terminal:
title 'observation on soil moisture of irrigation cropland by cosmic-ray probe' found 2 times, lines: 99,1350,
title 'multiscale and multivariate evaluation of water fluxes and states over european river basins' found 2 times, lines: 199,1820,
title 'calibration of a non-invasive cosmic-ray probe for wide area snow water equivalent measurement' found 2 times, lines: 5,32,
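It additionally writes a file sorted_titles.txt listing all titles in alphabetical order, which you can go through to detect duplicates manually. Each distinct (normalized) title appears there once, so this is mainly useful for spotting titles that differ only slightly.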
If you can rely on the title field being identical, then a very simple pipeline is enough:
grep -n 'title =' bibliography.bib | sort -s -t: -k2 | uniq -cdf 1
grep -n prints the matching lines of bibliography.bib, each prefixed with its line number. Because uniq only compares adjacent lines, the output is first sorted by everything after the line number (sort -t: -k2); -s requests a stable sort (supported by GNU and BSD sort), so identical titles stay in their original line order. uniq then prints only the repeated lines (-d), each preceded by the number of times it appears (-c); -f 1 tells uniq to ignore the first field, which is the line number.
So if you get a line like:
2 733: title = {Ethica Nicomachea},
you know that title = {Ethica Nicomachea}, appears twice and that the first occurrence is on line 733 of your .bib file.
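To try the pipeline on something small, here is a minimal sketch. The sample file, its path and its three entries are made up for illustration, and the helper name find_dup_titles is hypothetical. It includes a sort step because uniq only collapses adjacent lines, and it assumes a sort that supports -s (GNU and BSD sort do).

```shell
#!/bin/sh
# Hypothetical helper: list title lines that occur more than once in a .bib file.
# $1: path to the .bib file
find_dup_titles() {
    # grep -n   : prefix each matching line with its line number
    # sort -s   : stable sort on everything after the first ':' so that
    #             identical titles become adjacent, lowest line number first
    # uniq -cdf1: print only repeated lines (-d) with their count (-c),
    #             ignoring the first field, i.e. the line number (-f 1)
    grep -n 'title =' "$1" | sort -s -t: -k2 | uniq -cdf 1
}

# Made-up sample data: the title on lines 2 and 10 is duplicated.
bib="${TMPDIR:-/tmp}/sample_dups.bib"
cat > "$bib" <<'EOF'
@book{aristotle1894,
  title = {Ethica Nicomachea},
  year = {1894}
}
@book{plato1903,
  title = {Res Publica},
  year = {1903}
}
@book{aristotle1894b,
  title = {Ethica Nicomachea},
  year = {1894}
}
EOF

find_dup_titles "$bib"
```

For this sample the function reports a single line: a count of 2 for title = {Ethica Nicomachea}, with 2 as the line number of the first occurrence. The exact padding in front of the count depends on the uniq implementation.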