Convert glob to `find`
If the problem is that you get an "Argument list too long" error, use a loop or a shell builtin. While `command glob-that-matches-too-much` can fail with that error, `for f in glob-that-matches-too-much` does not, so you can just do:
for f in foo*bar/quux[A-Z]{.bak,}/pic[0-9][0-9][0-9][0-9]?.jpg
do
something "$f"
done
The loop might be excruciatingly slow, but it should work.
Or:
printf "%s\0" foo*bar/quux[A-Z]{.bak,}/pic[0-9][0-9][0-9][0-9]?.jpg |
xargs -r0 something
(`printf` being a builtin in most shells, the above works around the limit of the `execve()` system call.)
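To see the batching at work, here is a minimal sketch on a scratch tree. The directory and file names are made up for illustration, and `-n 2` is only there to force several runs with three files; in real use you would let `xargs` size the batches itself. It assumes an `xargs` that supports `-0`, as the GNU and BSD ones do.

```shell
# Scratch tree with made-up names, purely for illustration.
demo=$(mktemp -d)
mkdir -p "$demo/fooXbar/quuxA"
touch "$demo/fooXbar/quuxA/pic00001.jpg" \
      "$demo/fooXbar/quuxA/pic00002.jpg" \
      "$demo/fooXbar/quuxA/pic00003.jpg"

# printf is a builtin, so the expanded glob never goes through execve().
# xargs -n 2 forces small batches here; without -n, xargs sizes the
# batches itself to stay under the system limit.
runs=$(cd "$demo" &&
  printf '%s\0' foo*bar/quux[A-Z]/pic[0-9][0-9][0-9][0-9]?.jpg |
  xargs -0 -n 2 sh -c 'echo "$# file(s) in this run"' sh | wc -l)

echo "runs: $runs"
rm -rf "$demo"
```

Each line printed by the inner `sh -c` corresponds to one separate invocation of the command, so the final count is the number of runs `xargs` needed.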
$ cat /usr/share/**/* > /dev/null
zsh: argument list too long: cat
$ printf "%s\n" /usr/share/**/* | wc -l
165606
Also works with bash. I'm not sure exactly where this is documented though.
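As far as I can tell, the reason is that the limit applies to the arguments handed to `execve()` when an external command is started, and a builtin is run by the shell itself without that call. A quick way to check both halves on your own system (a sketch, assuming `getconf` is available):

```shell
# How the current shell resolves printf; most shells report a builtin.
kind=$(type printf)
echo "$kind"

# Upper bound, in bytes, on the arguments plus environment passed to execve().
limit=$(getconf ARG_MAX)
echo "ARG_MAX: $limit"
```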
Both Vim's `glob2regpat()` and Python's `fnmatch.translate()` can convert globs to regexes, but both also use `.*` for `*`, which matches across `/`.
`find` (for the standard `-name`/`-path` predicates) uses wildcard patterns just like globs (note that `{a,b}` is not a glob operator; after brace expansion, you get two globs). The main difference is the handling of slashes (and that dot files and dot directories are not treated specially by `find`). `*` in a glob won't span several directories: expanding `*/*/*` only lists directories down to 2 levels deep. A `-path './*/*/*'` predicate, by contrast, matches any file that is at least 3 levels deep, and does not stop `find` from listing the contents of any directory at any depth.
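The difference is easy to observe on a small scratch tree. This is only a sketch; the names `a/b/c/d` and `f1`..`f3` are made up for the demonstration.

```shell
# Scratch tree: entries at depth 3, 4 and 5.
demo=$(mktemp -d)
mkdir -p "$demo/a/b/c/d"
touch "$demo/a/b/f1" "$demo/a/b/c/f2" "$demo/a/b/c/d/f3"

# Glob: each * stays within one path component, so only the depth-3
# entries (a/b/c and a/b/f1) match.
glob_count=$(cd "$demo" && printf '%s\n' ./*/*/* | wc -l)

# find -path: * also matches "/", so everything at depth 3 or deeper
# matches, and find walks the whole tree to get there.
find_count=$(cd "$demo" && find . -path './*/*/*' | wc -l)

echo "glob: $glob_count, find -path: $find_count"
rm -rf "$demo"
```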
For that particular couple of globs, `./foo*bar/quux[A-Z]{.bak,}/pic[0-9][0-9][0-9][0-9]?.jpg`, the translation is easy: you want files at depth 3, so you can use:
find . -mindepth 3 -maxdepth 3 \
\( -path './foo*bar/quux[A-Z].bak/pic[0-9][0-9][0-9][0-9]?.jpg' -o \
-path './foo*bar/quux[A-Z]/pic[0-9][0-9][0-9][0-9]?.jpg' \) \
-exec cmd {} +
(or `-depth 3` with some `find` implementations). Or POSIXly:
find . -path './*/*/*' -prune \
\( -path './foo*bar/quux[A-Z].bak/pic[0-9][0-9][0-9][0-9]?.jpg' -o \
-path './foo*bar/quux[A-Z]/pic[0-9][0-9][0-9][0-9]?.jpg' \) \
-exec cmd {} +
That guarantees that those `*` and `?` cannot match `/` characters.
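Here is a small sketch of why the depth limit matters, using a made-up decoy file under `foo/bar/` that sits one level too deep. `-print` stands in for `-exec cmd {} +` so the matches can be counted.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/fooXbar/quuxA" "$demo/fooXbar/quuxB.bak" "$demo/foo/bar/quuxA"
touch "$demo/fooXbar/quuxA/pic00001.jpg" \
      "$demo/fooXbar/quuxB.bak/pic00002.jpg" \
      "$demo/foo/bar/quuxA/pic00003.jpg"   # decoy, one level too deep

# Without a depth limit, the * in foo*bar also matches "foo/bar".
loose=$(cd "$demo" && find . \
  \( -path './foo*bar/quux[A-Z].bak/pic[0-9][0-9][0-9][0-9]?.jpg' -o \
     -path './foo*bar/quux[A-Z]/pic[0-9][0-9][0-9][0-9]?.jpg' \) \
  -print | wc -l)

# Pruning everything at depth 3 keeps find from ever reaching the decoy.
strict=$(cd "$demo" && find . -path './*/*/*' -prune \
  \( -path './foo*bar/quux[A-Z].bak/pic[0-9][0-9][0-9][0-9]?.jpg' -o \
     -path './foo*bar/quux[A-Z]/pic[0-9][0-9][0-9][0-9]?.jpg' \) \
  -print | wc -l)

echo "without depth limit: $loose, with prune at depth 3: $strict"
rm -rf "$demo"
```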
(Contrary to globs, `find` would read the contents of directories other than the `foo*bar` ones in the current directory¹, and would not sort the list of files. But if we leave aside the problem that what `[A-Z]` matches, and how `*`/`?` behave with regard to invalid characters, is unspecified, you'd get the same list of files.)
But in any case, as @muru has shown, there's no need to resort to `find` if it's just for splitting the list of files into several runs to work around the limit of the `execve()` system call. Some shells like `zsh` (with `zargs`) or `ksh93` (with `command -x`) even have builtin support for that.
With `zsh` (whose globs also have the equivalent of `-type f` and most other `find` predicates), for instance:
autoload zargs # if not already in ~/.zshrc
zargs ./foo*bar/quux[A-Z](|.bak)/pic[0-9][0-9][0-9][0-9]?.jpg(.) -- cmd
(`(|.bak)` is a glob operator, contrary to `{,.bak}`; the `(.)` glob qualifier is the equivalent of `find`'s `-type f`; add `oN` in there to skip the sorting, as with `find`, and `D` to include dot files (which doesn't apply to this glob).)
¹ For `find` to crawl the directory tree like globs would, you'd need something like:
find . ! -name . \( \
\( -path './*/*' -o -name 'foo*bar' -o -prune \) \
-path './*/*/*' -prune -name 'pic[0-9][0-9][0-9][0-9]?.jpg' -exec cmd {} + -o \
\( ! -path './*/*' -o -name 'quux[A-Z]' -o -name 'quux[A-Z].bak' -o -prune \) \)
That is, prune all directories at level 1 except the `foo*bar` ones, and all at level 2 except the `quux[A-Z]` or `quux[A-Z].bak` ones, and then select the `pic...` ones at level 3 (and prune all directories at that level).
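To check which entries `find` actually visits with that expression, one option is to put a `-print` right after `! -name .` and drop the `-exec`. This is a sketch for observation only; the extra `-print` and the scratch names (`junk`, `other/deep`, `file`) are not part of the command above.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/fooXbar/quuxA" "$demo/fooXbar/junk" "$demo/other/deep"
touch "$demo/fooXbar/quuxA/pic00001.jpg" \
      "$demo/fooXbar/junk/file" \
      "$demo/other/deep/file"

# Every entry a plain find would visit.
all=$(cd "$demo" && find . ! -name . | wc -l)

# Same pruning logic as above; the leading -print lists each visited entry.
visited=$(cd "$demo" && find . ! -name . -print \( \
  \( -path './*/*' -o -name 'foo*bar' -o -prune \) \
  -path './*/*/*' -prune -name 'pic[0-9][0-9][0-9][0-9]?.jpg' -o \
  \( ! -path './*/*' -o -name 'quux[A-Z]' -o -name 'quux[A-Z].bak' -o -prune \) \) |
  wc -l)

echo "plain find: $all entries, pruned find: $visited entries"
rm -rf "$demo"
```

The pruned run never looks inside `other` or `fooXbar/junk`, which is what a glob expansion would do as well.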
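You could write a regex for `find` matching your requirements (`-regextype` is a GNU `find` extension; note that the glob `?` matches exactly one character, hence `[^/]` and not `[^/]?`):
find . -regextype egrep -regex './foo[^/]*bar/quux[A-Z](\.bak)?/pic[0-9][0-9][0-9][0-9][^/]\.jpg'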
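One detail worth checking when translating by hand: a glob `?` stands for exactly one character, so its regex counterpart is `[^/]`, whereas `[^/]?` would also accept no character at all. A small sketch comparing the two, assuming GNU `find` (for `-regextype`) and made-up file names:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/fooXbar/quuxA"
touch "$demo/fooXbar/quuxA/pic00001.jpg" \
      "$demo/fooXbar/quuxA/pic0000.jpg"   # the glob does not match this one

# [^/] : exactly one character after the four digits, like the glob's "?".
exact=$(cd "$demo" && find . -regextype egrep -regex \
  './foo[^/]*bar/quux[A-Z](\.bak)?/pic[0-9][0-9][0-9][0-9][^/]\.jpg' | wc -l)

# [^/]? : zero or one character, so pic0000.jpg slips through as well.
optional=$(cd "$demo" && find . -regextype egrep -regex \
  './foo[^/]*bar/quux[A-Z](\.bak)?/pic[0-9][0-9][0-9][0-9][^/]?\.jpg' | wc -l)

echo "with [^/]: $exact match(es), with [^/]?: $optional match(es)"
rm -rf "$demo"
```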