Get list of subdirectories which contain a file whose name contains a string
find . -type f -name '*f*' | sed -r 's|/[^/]+$||' | sort | uniq
The above finds all files below the current directory (.) that are regular files (-type f) and have f somewhere in their name (-name '*f*'). Next, sed removes the file name, leaving just the directory name. Then, the list of directories is sorted (sort) and duplicates removed (uniq).
The sed command consists of a single substitution. It looks for matches to the regular expression /[^/]+$ and replaces anything matching that with nothing. The dollar sign means the end of the line. [^/]+ means one or more characters that are not slashes. Thus, /[^/]+$ matches all characters from the final slash to the end of the line. In other words, it matches the file name at the end of the full path. The sed command therefore removes the file name, leaving unchanged the name of the directory that the file was in.
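To see the substitution in isolation, you can run sed on a single made-up path:
$ echo './some/dir/file.txt' | sed -r 's|/[^/]+$||'
./some/dir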
Simplifications
Many modern sort commands support a -u flag which makes uniq unnecessary. For GNU sed:
find . -type f -name '*f*' | sed -r 's|/[^/]+$||' | sort -u
And for macOS sed (recent GNU sed versions also accept -E):
find . -type f -name '*f*' | sed -E 's|/[^/]+$||' | sort -u
Also, if your find command supports it, it is possible to have find print the directory names directly; the %h directive expands to all of each found path except the trailing file name, avoiding the need for sed:
find . -type f -name '*f*' -printf '%h\n' | sort -u
More robust version (Requires GNU tools)
The above versions will be confused by file names that include newlines. A more robust solution is to do the sorting on NUL-terminated strings:
find . -type f -name '*f*' -printf '%h\0' | sort -zu | sed -z 's/$/\n/'
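If the trailing sed -z (which appends a newline to each NUL-terminated record) looks opaque, here is a sketch of an equivalent final stage using GNU tr, which simply turns each NUL back into a newline:
$ find . -type f -name '*f*' -printf '%h\0' | sort -zu | tr '\0' '\n'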
Why not try this:
find / -name '*f*' -printf "%h\n" | sort -u
(Note this one searches from the filesystem root, /, and omits -type f; substitute . to stay below the current directory.)
There are essentially two methods you can use to do this. One parses the strings while the other operates on each file. Parsing the strings with a tool such as grep, sed, or awk is obviously going to be faster, but here's an example showing both, as well as how you can "profile" the two methods.
Sample data
For the examples below we'll use the following data (note that the directories must exist before touch can populate them):
$ mkdir -p dir{1..3}/dir{100..112}
$ touch dir{1..3}/dir{100..112}/file{1..5}
$ touch dir{1..3}/dir{100..112}/nile{1..5}
$ touch dir{1..3}/dir{100..112}/knife{1..5}
Delete some of the *f* files from dir1/*:
$ rm dir1/dir10{0..2}/*f*
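As a rough sanity check, each of the 39 leaf directories starts with ten names matching *f* (file1..5 and knife1..5), and 30 of those files were just deleted, so we'd expect 390 - 30 = 360 matches:
$ find . -type f -name '*f*' | wc -l
360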
Approach #1 - Parsing via strings
Here we're going to use the following tools: find, grep, and sort.
$ find . -type f -name '*f*' | grep -o "\(.*\)/" | sort -u | head -5
./dir1/dir103/
./dir1/dir104/
./dir1/dir105/
./dir1/dir106/
./dir1/dir107/
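Note that grep -o keeps the trailing slash on each directory name, as seen above; appending sed 's|/$||' to the pipeline would strip it if that matters. The dirname output below has no trailing slash.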
Approach #2 - Parsing using files
Same tool chain as before, except this time we'll be using dirname instead of grep.
$ find . -type f -name '*f*' -exec dirname {} \; | sort -u | head -5
./dir1/dir103
./dir1/dir104
./dir1/dir105
./dir1/dir106
./dir1/dir107
NOTE: The above examples use head -5 merely to limit the amount of output shown here; remove it to get the full listing!
Comparing the results
We can use time to take a look at the two approaches.
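The invocations were presumably along these lines (an assumption, since they aren't shown here; bash's time keyword times the whole pipeline, and stdout is discarded so only the elapsed time is reported):
$ time find . -type f -name '*f*' -exec dirname {} \; | sort -u > /dev/null
$ time find . -type f -name '*f*' | grep -o "\(.*\)/" | sort -u > /dev/null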
dirname
real 0m0.372s
user 0m0.028s
sys 0m0.106s
grep
real 0m0.012s
user 0m0.009s
sys 0m0.007s
So it's best to deal with the strings where possible: running dirname via -exec ... \; forks a new process for every match, while a single grep parses the whole list in one pass.
Alternative string parsing methods
grep & PCRE
$ find . -type f -name '*f*' | grep -oP '^.*(?=/)' | sort -u
sed
$ find . -type f -name '*f*' | sed 's#/[^/]*$##' | sort -u
awk
$ find . -type f -name '*f*' | awk -F'/[^/]*$' '{print $1}' | sort -u
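For completeness, here is a pure-shell sketch of the same string parsing using parameter expansion (the variable name path is just for illustration): ${path%/*} deletes the shortest trailing /... segment, i.e. the file name. Like the other string-based approaches, it assumes file names contain no newlines.
$ find . -type f -name '*f*' | while IFS= read -r path; do
>   printf '%s\n' "${path%/*}"
> done | sort -u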