How do I find which files are missing from a list?
You could use stat to determine if a file exists on the file system.
You should use the shell's built-in test command to check whether files exist.
while IFS= read -r f; do
  test -f "$f" || printf '%s\n' "$f"
done < file_list
The test command can also be written as [ -f "$f" ]; I used the test spelling for readability.
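As a sketch, the same check can be wrapped in a small function that takes the list file as its argument. The name list_missing is made up for illustration, and it assumes the list holds one path per line (absolute, or relative to the current directory). It uses read -r and printf so that backslashes and leading dashes in names come out unchanged.

```shell
# Hypothetical helper: print every path listed in file "$1"
# that is not an existing regular file.
list_missing() {
    # "|| [ -n "$path" ]" also handles a last line without a trailing newline.
    while IFS= read -r path || [ -n "$path" ]; do
        [ -f "$path" ] || printf '%s\n' "$path"
    done < "$1"
}
```

Usage would be along the lines of: list_missing file_list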
Edit: If you really have no option but to work from a list of file names without paths, I suggest you build a list of files once with find, then search it with grep for each name to figure out which files are there.
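TMPFILE=$(mktemp)
# The starting directory must come before the find expression.
find /dst -type f > "$TMPFILE"
while IFS= read -r f; do
  grep -q "/$f$" "$TMPFILE" || printf '%s\n' "$f"
done < file_list
rm -f "$TMPFILE"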
Note that:
- the file list only includes files not directories,
- the slash in the grep match pattern is so we compare full file names not partials,
- and the last '$' in the search pattern matches the end of the line, so you don't get directory matches, only full file name matches.
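One caveat with the grep loop is that each name is interpreted as a regular expression, so a listed name like a.txt would also match a file called axtxt. A possible variant, sketched here under the assumption that no file name contains a newline, reduces the find output to bare file names and compares whole lines as fixed strings. The function name list_missing_names and its two arguments are made up for illustration.

```shell
# Hypothetical helper: list_missing_names LIST DIR
# Prints the names in LIST that match no regular file anywhere under DIR.
# The body runs in a subshell so its variables do not leak to the caller.
list_missing_names() (
    names=$(mktemp) || exit
    # Strip everything up to the last slash so each line is a bare file name.
    find "$2" -type f | sed 's|.*/||' | sort -u > "$names"
    # -F: fixed strings, -x: whole-line match, -v: keep lines NOT found in $names.
    grep -Fxv -f "$names" "$1"
    rm -f "$names"
)
```

This runs grep once rather than once per name. If DIR contains no files at all, the pattern file is empty; GNU grep then prints every listed name, but other implementations may differ.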
find considers finding nothing a special case of success (no error occurred). A general way to test whether files match some find criteria is to test whether the output of find is empty. For better efficiency when there are matching files, use -quit on GNU find to make it quit at the first match, or head (head -c 1 if available, otherwise head -n 1, which is standard) on other systems to make it die of a broken pipe rather than produce long output.
while IFS= read -r name; do
[ -n "$(find . -name "$name" -print | head -n 1)" ] || printf '%s\n' "$name"
done <file_list
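For comparison, here is a sketch of the -quit variant mentioned above. It assumes GNU find (or another find that supports -quit); the function name missing_under and its arguments are made up for illustration. As in the head version, each name is passed to -name and is therefore treated as a pattern.

```shell
# Hypothetical helper: missing_under LIST DIR
# Prints each name in LIST for which find locates nothing under DIR.
# Assumes a find that supports -quit (GNU find does).
missing_under() {
    while IFS= read -r name; do
        # -print -quit: print the first match, then stop searching.
        [ -n "$(find "$2" -name "$name" -print -quit)" ] || printf '%s\n' "$name"
    done < "$1"
}
```

Usage would be along the lines of: missing_under file_list .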
In bash ≥4 or zsh, you don't need the external find command for a simple name match: you can use **/$name. Bash version:
shopt -s nullglob globstar   # bash only expands ** recursively when globstar is on
while IFS= read -r name; do
set -- **/"$name"
[ $# -ge 1 ] || printf '%s\n' "$name"
done <file_list
Zsh version on a similar principle:
while IFS= read -r name; do
set -- **/"$name"(N)
[ $# -ge 1 ] || print -r -- "$name"
done <file_list
Or here's a shorter but more cryptic way of testing the existence of a file matching a pattern. The glob qualifier N makes the output empty if there is no match, [1] retains only the first match, and e:REPLY=true: changes each match to expand to true instead of the matched file name. So **/"$name"(Ne:REPLY=true:[1]) false expands to true false if there is a match, or to just false if there is no match.
while IFS= read -r name; do
**/"$name"(Ne:REPLY=true:[1]) false || print -r -- "$name"
done <file_list
It would be more efficient to combine all your names into one search. If the number of patterns is not too large for your system's length limit on a command line, you can join all the names with -o, make a single find call, and post-process the output. If none of the names contain shell metacharacters (so that the names are find patterns as well), here's a way to post-process with awk (untested):
set -o noglob; IFS='
'
set -- $(<file_list sed -e 's/^/-name\
/' -e '2,$s/^/-o\
/')
set +o noglob; unset IFS
find . \( "$@" \) -print | awk -F/ '
BEGIN {while ((getline line <"file_list") > 0) {found[line]=0}}
$NF in found {found[$NF]=1}
END {for (f in found) {if (found[f]==0) {print f}}}
'
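If the list is too long for the command line, a single pass can still work by letting find print everything and doing the name matching entirely in awk. The following is a sketch under the assumption that no file name contains a newline; the function name missing_one_pass and its arguments are made up for illustration. Names are compared literally here, not as find patterns.

```shell
# Hypothetical helper: missing_one_pass LIST DIR
# One find run over DIR; awk drops every listed name it sees as a
# last path component and prints whatever is left.
missing_one_pass() {
    find "$2" -type f | awk -F/ '
        BEGIN {
            # Read the list ourselves, then blank ARGV[1] so that
            # the main input is standard input (the find output).
            list = ARGV[1]; ARGV[1] = ""
            while ((getline line < list) > 0) want[line] = 1
        }
        { delete want[$NF] }              # $NF is the bare file name
        END { for (n in want) print n }   # order is unspecified
    ' "$1"
}
```

Because awk does not guarantee any iteration order, pipe the result through sort if a stable order matters.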
Another approach would be to use Perl and File::Find, which makes it easy to run Perl code for all the files in a directory.
perl -MFile::Find -l -e '
%missing = map {chomp; $_, 1} <STDIN>;
find(sub {delete $missing{$_}}, ".");
print foreach sort keys %missing' <file_list
An alternate approach is to generate a list of file names on both sides and work on a text comparison. Zsh version:
comm -23 <(<file_list sort) <(print -rl -- **/*(:t) | sort)
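For shells without zsh's glob modifiers, roughly the same comparison can be sketched with find, sed and a temporary file. This is an approximation rather than an exact equivalent: it only considers regular files, it does include dot files, and it assumes no file name contains a newline. The function name missing_by_comm and its arguments are made up for illustration. The C locale is forced so that sort and comm agree on ordering.

```shell
# Hypothetical helper: missing_by_comm LIST DIR
# Prints the names in LIST that are not the name of any regular file under DIR.
# The body runs in a subshell so the locale change stays local.
missing_by_comm() (
    LC_ALL=C; export LC_ALL
    names=$(mktemp) || exit
    # Bare file names found under DIR, sorted and de-duplicated.
    find "$2" -type f | sed 's|.*/||' | sort -u > "$names"
    # comm -23 keeps lines that appear only in the first input (the list).
    sort -u "$1" | comm -23 - "$names"
    rm -f "$names"
)
```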