Count files in directory with specific string on name?
Do you mean you want to search for snp in the file names? That would be a simple shell glob (wildcard), used like this:
ls -dq *snp* | wc -l
The -q flag makes ls print a ? in place of each non-printable character, so filenames containing "strange" characters (including newlines) still occupy one line each. Omit -q if your version of ls doesn't recognise it.
If you stand quietly in the hallways of Unix&Linux and listen carefully, you’ll hear a ghostly voice, pitifully wailing, “What about filenames that contain newlines?”
ls -d *snp* | wc -l
or, equivalently,
printf "%s\n" *snp* | wc -l
will output all the filenames that contain snp, each followed by a newline, but also including any newlines in the filenames, and then count the number of lines in the output.
If there is a file whose name is
f o o s n p \n b a r . t s v
then that name will be written out as
foosnp
bar.tsv
which, of course, will be counted as two lines.
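The miscount is easy to reproduce in a scratch directory. This is only a sketch: the directory comes from mktemp -d, and S134_tdim.snps.tsv is borrowed from the question purely as a second matching name.

```shell
# Scratch directory so the demonstration cannot touch real files.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Two files match *snp*; the second has a newline inside its name.
touch S134_tdim.snps.tsv
touch "$(printf 'foosnp\nbar.tsv')"

# Counting output lines reports one line too many.
naive=$(printf '%s\n' *snp* | wc -l)
naive=$((naive))   # normalise: some wc implementations pad the number with spaces
echo "lines counted: $naive (real number of files: 2)"

cd / && rm -rf "$dir"
```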
There are a few alternatives that do better in at least some cases:
printf "%s\n" * | grep -c snp
which counts the lines that contain snp, so the foosnp(\n)bar.tsv example from above counts only once.
A slight variation on this is
ls -f | grep -c snp
The above two commands differ in that:
- The ls -f will include files whose names begin with .; the printf … * does not, unless the dotglob shell option is set.
- printf is a shell builtin; ls is an external command. Therefore, the ls might use slightly more resources.
- When the shell processes a *, it sorts the filenames; ls -f does not sort the filenames. Therefore, the ls might use slightly less resources.
But they have something in common:
they will both give wrong results in the presence of filenames
that contain a newline and have snp
both before and after the newline.
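A sketch of that failure case, again in a throwaway directory. The name asnp(\n)bsnp.tsv is made up here for illustration; it is not from the question.

```shell
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

touch "$(printf 'foosnp\nbar.tsv')"    # snp before the newline only
touch "$(printf 'asnp\nbsnp.tsv')"     # snp on both sides of the newline

# Two files exist, but grep sees three lines that contain snp.
lines=$(printf '%s\n' * | grep -c snp)
echo "grep -c reports: $lines (real number of files: 2)"

cd / && rm -rf "$dir"
```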
Another:
filenamelist=(*snp*)
echo ${#filenamelist[@]}
This creates a shell array variable listing all the filenames that contain snp, and then reports the number of elements in the array.
The filenames are treated as strings, not lines,
so embedded newlines are not an issue.
It is conceivable that this approach could have a problem
if the directory is huge,
because the list of filenames must be held in shell memory.
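One thing to watch with the array approach is the case where nothing matches: with bash's default options an unmatched pattern is left in place as literal text, so the array has one element. The sketch below assumes bash with default glob options and uses a made-up pattern, *no-such-string*, to show the effect of nullglob.

```shell
# bash-specific: arrays, $'...' and shopt are not available in plain POSIX sh.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch S134_tdim.snps.tsv $'foosnp\nbar.tsv'

filenamelist=(*snp*)
matched=${#filenamelist[@]}        # the embedded newline does no harm here

# Without nullglob an unmatched pattern stays in the list as literal text.
nomatch=(*no-such-string*)
literal=${#nomatch[@]}

shopt -s nullglob                  # unmatched patterns now expand to nothing
nomatch=(*no-such-string*)
empty=${#nomatch[@]}
shopt -u nullglob

echo "matched=$matched literal=$literal empty=$empty"
cd / && rm -rf "$dir"
```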
Yet another:
Earlier, when we said printf "%s\n" *snp*, the printf command repeated (reused) the "%s\n" format string once for each argument in the expansion of *snp*.
Here, we make a small change in that:
printf "%.0s\n" *snp* | wc -l
This will repeat (reuse) the "%.0s\n" format string once for each argument in the expansion of *snp*.
But "%.0s" means to print the first zero characters of each string, i.e., nothing.
This printf command will output only a newline (i.e., a blank line) for each file that contains snp in its name; and then wc -l will count them.
And, again, you can include the . files by setting dotglob.
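A quick check of the "%.0s\n" trick, as a sketch in a scratch directory. The hidden file .hidden.snps.tsv is a made-up name, added only to show that dot files are skipped when dotglob is not set.

```shell
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch S134_tdim.snps.tsv
touch "$(printf 'foosnp\nbar.tsv')"
touch .hidden.snps.tsv               # not matched by *snp* unless dotglob is set

# One blank line per argument, whatever characters the names contain.
blank=$(printf '%.0s\n' *snp* | wc -l)
blank=$((blank))   # normalise: some wc implementations pad the number with spaces
echo "files counted: $blank"

cd / && rm -rf "$dir"
```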
Abstract:
Works for files with "odd" names (including newlines).
set -- *snp* ; echo "$#" # change positional arguments
count=$(printf 'x%.0s' *snp*); echo "${#count}" # most shells
printf -v count 'x%.0s' *snp*; echo "${#count}" # bash
Description
As a simple glob will match every filename with snp in its name, a simple echo *snp* could be enough for this case, but to really show that there are only three files matching I'll use:
$ ls -Q *snp*
"Codigo-0275_tdim.snps.tsv" "foo * bar\tsnp baz.tsv" "S134_tdim.snps.tsv"
The only issue remaining is to count the files. Yes, grep is a usual solution, and yes, counting newlines with wc -l is also a usual solution. Note that grep -c (count) counts the lines that contain snp, not the files, so if a file name has snp on both sides of an embedded newline, the count will be incorrect.
We can do better.
One simple solution is to set the positional arguments:
$ set -- *snp*
$ echo "$#"
3
To avoid changing the positional arguments we can transform each argument to one character and print the length of the resulting string (for most shells):
$ printf 'x%.0s' *snp*
xxx
$ count=$(printf 'x%.0s' *snp*); echo "${#count}"
3
Or, in bash, to avoid a subshell:
$ printf -v count 'x%.0s' *snp*; echo "${#count}"
3
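Another way to leave the caller's positional arguments alone is to do the counting inside a function, since a function gets its own "$#". The sketch below is only an illustration: count_matches is a hypothetical helper name, and the test on "$1" assumes the shell's default behaviour of passing an unmatched pattern through as literal text.

```shell
# count_matches is a hypothetical helper, not part of any standard tool.
count_matches() {
    # An unmatched pattern arrives as one literal argument that names no file.
    if [ "$#" -eq 1 ] && [ ! -e "$1" ] && [ ! -L "$1" ]; then
        echo 0
    else
        echo "$#"
    fi
}

dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch S134_tdim.snps.tsv Codigo-0275_tdim.snps.tsv
touch "$(printf 'foo * bar\tsnp baz.tsv')"

found=$(count_matches *snp*)
missing=$(count_matches *no-such-string*)
echo "found=$found missing=$missing"

cd / && rm -rf "$dir"
```

The pattern must be passed unquoted, as in count_matches *snp*, so that the shell expands it before the function runs.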
File list
List of files (from the original question, plus one whose name contains a newline):
a='
Codigo-0275_tdim.matches.tsv
Codigo-0275_tdim.snps.tsv
FloragenexTdim_haplotypes_SNp3filter17_single.tsv
FloragenexTdim_haplotypes_SNp3filter17.tsv
FloragenexTdim_SNP3Filter17.fas
S134_tdim.alleles.tsv
S134_tdim.snps.tsv
S134_tdim.tags.tsv'
$ touch $a
$ touch $'foosnp\nbar.tsv'
That will have a file with one newline in the middle:
f o o s n p \n b a r . t s v
And to test glob expansion:
$ touch $'foo * bar\tsnp baz.tsv'
That adds a name containing an asterisk which, if left unquoted, will expand to the whole list of files.