Limit number of matches of find command
As you're not using `find` for very much other than walking the directory tree, I'd suggest instead using the shell directly to do this. See variations for both `zsh` and `bash` below.
Using the `zsh` shell:

```zsh
mv ./**/*(-.D[1,1000]) /path/to/collection1 # move first 1000 files
mv ./**/*(-.D[1,1000]) /path/to/collection2 # move next 1000 files
```
The globbing pattern `./**/*(-.D[1,1000])` would match all regular files (or symbolic links to such files) in or under the current directory, and then return the first 1000 of these. The `-.` qualifier restricts the match to regular files or symbolic links to these, while `D` acts like `dotglob` in `bash` (matches hidden names).
This is assuming that the generated command would not grow too big through expanding the globbing pattern when calling `mv`.
The above is quite inefficient as it would expand the glob for each collection. You may therefore want to store the pathnames in an array and then move slices of that:
```zsh
pathnames=( ./**/*(-.D) )
mv $pathnames[1,1000] /path/to/collection1
mv $pathnames[1001,2000] /path/to/collection2
```
To randomise the `pathnames` array when you create it (you mentioned wanting to move random files):

```zsh
pathnames=( ./**/*(-.Doe['REPLY=$RANDOM']) )
```
You could do a similar thing in `bash` (except that you can't easily shuffle the result of a glob match in `bash`, apart from possibly feeding the results through `shuf`, so I'll skip that bit):
```bash
shopt -s globstar dotglob nullglob

pathnames=()
for pathname in ./**/*; do
    [[ -f $pathname ]] && pathnames+=( "$pathname" )
done

mv "${pathnames[@]:0:1000}" /path/to/collection1
mv "${pathnames[@]:1000:1000}" /path/to/collection2
mv "${pathnames[@]:2000:1000}" /path/to/collection3
```
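For the skipped `shuf` bit, a minimal sketch, assuming GNU `find` and `shuf` plus `bash` 4.4 or later for `mapfile -d ''`: read a NUL-delimited, shuffled file list into the array, then slice it as above.

```shell
# Build the pathnames array in random order (GNU shuf's -z keeps the
# NUL delimiters, so arbitrary filenames survive intact).
mapfile -d '' -t pathnames < <(find . -type f -print0 | shuf -z)
printf '%d files collected in random order\n' "${#pathnames[@]}"
# Then slice exactly as above, e.g.:
# mv "${pathnames[@]:0:1000}" /path/to/collection1
```

The shuffle happens once, up front, so the slices are non-overlapping random samples.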
You can implement new tests for `find` using `-exec`:
```shell
seq 1 1000 |
find . -exec read \; -exec mv -t /path/to/collection1 {} +
```

will move the first 1000 files found to `/path/to/collection1`. (GNU `mv`'s `-t` option is used here because the `-exec ... +` form requires `{}` to come immediately before the `+`.)
This works as follows:
- `seq 1 1000` outputs 1000 lines, piped into `find`;
- `-exec read` reads a line, failing if the pipe is closed (when `seq`'s output has been consumed);
- if the previous `-exec` succeeds, `-exec mv ...` performs the move.
`-exec ... +` works as you’d expect: `read` will run once per iteration, but `find` will accumulate matched files and call `mv` as few times as possible.
This relies on the fact that `find`’s `-exec` succeeds or fails based on the executed command’s exit status: when `read` succeeds, `find` continues processing the actions given above (because the default operator is “and”), and when it fails, `find` skips the remaining actions for that file.
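That “and” gating can be seen in a minimal sketch (a hypothetical temporary directory; `true` and `false` stand in for `read`):

```shell
# A later action runs only when the earlier -exec's command exits 0.
d=$(mktemp -d)
touch "$d/a" "$d/b"
find "$d" -type f -exec true \; -print    # prints both files
find "$d" -type f -exec false \; -print   # prints nothing
rm -r "$d"
```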
If your `find` supports the `-quit` action, you can use that to improve the efficiency:
```shell
seq 1 1000 |
find . \( -exec read \; -o -quit \) -exec mv -t /path/to/collection1 {} +
```
Without that, `find` will test every single file, even though it will only keep 1000 for `mv`.
I’m assuming that `read` is available as an external command, and implements the POSIX specification for `read`; if that’s not the case, `sh -c 'read x'` can be used instead (POSIX `read` requires at least one variable name). In both cases, `find` will start a separate process for each file it checks.
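Here is a sandboxed sketch of the whole pipeline under those assumptions (GNU `mv -t`, with `sh -c 'read x'` standing in for an external `read`; the temporary directories are hypothetical):

```shell
src=$(mktemp -d); dst=$(mktemp -d)
touch "$src"/file1 "$src"/file2 "$src"/file3 "$src"/file4 "$src"/file5
# Allow at most 3 moves: each successful read consumes one of seq's lines;
# once the pipe is drained, read fails and no further files are queued for mv.
seq 1 3 |
find "$src" -type f -exec sh -c 'read x' \; -exec mv -t "$dst" {} +
ls "$dst" | wc -l   # 3
rm -r "$src" "$dst"
```

This works because the shell’s `read` consumes exactly one line from the pipe per invocation, so the line count caps the number of files that pass the test.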
I don't think it can be done with just `find`. You can use something like:

```shell
find [... your parameters ...] -print0 | head -z -n 1000 | xargs -0 mv -t /path/to/collection
```
`-print0`, `-z`, and `-0` work together to make sure everything works even with linefeeds in filenames.
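A sandboxed sketch of that pipeline (GNU `head`, `xargs`, and `mv` assumed; the temporary directories are hypothetical):

```shell
src=$(mktemp -d); dst=$(mktemp -d)
touch "$src"/file1 "$src"/file2 "$src"/file3 "$src"/file4 "$src"/file5
# NUL delimiters end to end: head -z counts NUL-terminated records,
# so only the first 3 pathnames reach xargs/mv.
find "$src" -type f -print0 | head -z -n 3 | xargs -0 mv -t "$dst"
ls "$dst" | wc -l   # 3
rm -r "$src" "$dst"
```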