Find recursively all archive files of diverse archive formats and search them for file name patterns
If you want something simpler than the AVFS solution, I wrote a Python script called arkfind to do it. You can just run:
$ arkfind /path/to/search/ -g "*vacation*jpg"
It'll do this recursively, so you can look at archives inside archives to an arbitrary depth.
(Adapted from How do I recursively grep through compressed archives?)
Install AVFS, a filesystem that provides transparent access inside archives. First run this command once to set up a view of your machine's filesystem in which you can access archives as if they were directories:
mountavfs
After this, if /path/to/archive.zip is a recognized archive, then ~/.avfs/path/to/archive.zip# is a directory that appears to contain the contents of the archive.
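As a minimal sketch of this path mapping, the hypothetical helper below prints the directory view of an archive. The name avfs_view is illustrative and not an AVFS command, and the sketch assumes mountavfs used the default mount point ~/.avfs.

```shell
# Hypothetical helper (not part of AVFS): print the AVFS directory view
# of an archive. Assumes the default mount point ~/.avfs.
avfs_view() {
    case $1 in
        /*) printf '%s\n' "$HOME/.avfs${1}#" ;;        # absolute path
        *)  printf '%s\n' "$HOME/.avfs$PWD/${1}#" ;;   # relative to the current directory
    esac
}
```

For example, ls "$(avfs_view /path/to/archive.zip)" should list the contents of the archive once AVFS is mounted.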
To list matching files inside every archive under the current directory:
find ~/.avfs"$PWD" \( -name '*.7z' -o -name '*.zip' -o -name '*.tar.gz' -o -name '*.tgz' \) \
     -exec sh -c '
         find "$0#" -name "$1"
     ' {} '*vacation*.jpg' \;
Explanations:
- Mount the AVFS filesystem.
- Look for archive files in ~/.avfs$PWD, which is the AVFS view of the current directory.
- For each archive, execute the specified shell snippet (with $0 = archive name and $1 = pattern to search).
- $0# is the directory view of the archive $0.
- If you add an -exec action to the inner find, write {\} rather than {} there, in case the outer find substitutes {} inside -exec ; arguments (some do it, some don't).
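If you run this search often, the command can be wrapped in a small function. The following is only a sketch: the function name avfs_find_in_archives and its optional second argument are assumptions made for illustration, and the pattern is handed to the inner shell as $1, as the explanation describes.

```shell
# Hypothetical wrapper, not part of AVFS.
# Usage: avfs_find_in_archives 'NAME_PATTERN' [AVFS_ROOT]
avfs_find_in_archives() {
    # $1 = file name pattern to look for inside the archives
    # $2 = directory to scan; defaults to the AVFS view of the current directory
    find "${2:-$HOME/.avfs$PWD}" \
        \( -name '*.7z' -o -name '*.zip' -o -name '*.tar.gz' -o -name '*.tgz' \) \
        -exec sh -c 'find "$0#" -name "$1"' {} "$1" \;
}
```

For example, avfs_find_in_archives '*vacation*.jpg' searches the archives under the current directory. Quote the pattern so the calling shell does not expand it first.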
Or in zsh ≥4.3:
mountavfs
ls -l ~/.avfs$PWD/**/*.(7z|tgz|tar.gz|zip)(e\''
reply=($REPLY\#/**/*vacation*.jpg(.N))
'\')
Explanations:
- ~/.avfs$PWD/**/*.(7z|tgz|tar.gz|zip) matches archives in the AVFS view of the current directory and its subdirectories.
- PATTERN(e\''CODE'\') applies CODE to each match of PATTERN. The name of the matched file is in $REPLY. Setting the reply array turns the match into a list of names.
- $REPLY\# is the directory view of the archive.
- $REPLY\#/**/*vacation*.jpg matches *vacation*.jpg files in the archive.
- The N glob qualifier makes the pattern expand to an empty list if there is no match.
My usual solution:
find -iname '*.zip' -exec unzip -l {} \; 2>/dev/null | grep '\.zip\|DESIRED_FILE_TO_SEARCH'
Example:
find -iname '*.zip' -exec unzip -l {} \; 2>/dev/null | grep '\.zip\|characterize.txt'
Results look like:
foozip1.zip:
foozip2.zip:
foozip3.zip:
DESIRED_FILE_TO_SEARCH
foozip4.zip:
...
If you want only the zip files that contain hits:
find -iname '*.zip' -exec unzip -l {} \; 2>/dev/null | grep '\.zip\|FILENAME' | grep -B1 'FILENAME'
FILENAME is used twice here, so you may want to put it in a variable.
You can also pass find a starting directory (PATH/TO/SEARCH) instead of searching the current directory.
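One way to avoid typing the file name twice is a small shell function. This is a sketch only: the names filter_hits and zip_hits are made up for illustration, and it assumes unzip -l prints an "Archive:  NAME.zip" header line before each listing.

```shell
# Hypothetical helpers illustrating the "use a variable" suggestion.

# Keep archive header lines and hit lines, then keep only each hit
# together with the line just before it.
filter_hits() {
    # -e '\.zip' -e "$1" is the same alternation as '\.zip\|FILENAME'
    grep -e '\.zip' -e "$1" | grep -B1 "$1"
}

# Usage: zip_hits FILENAME [PATH/TO/SEARCH]
zip_hits() {
    find "${2:-.}" -iname '*.zip' -exec unzip -l {} \; 2>/dev/null | filter_hits "$1"
}
```

For example, zip_hits characterize.txt PATH/TO/SEARCH. The name is treated as a grep regular expression, so a dot in it matches any character.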