Filter 500 files with awk, then cat results to single file
Your code overwrites the output file in each iteration. You also do not actually call awk.
What you want to do is something like:
awk '$5 >= 0.5' ./*.imputed.*_info >snplist.txt
This would call awk with all your files at once, and it would go through them one by one, in the order that the shell expands the globbing pattern. If the 5th column of any line in a file is greater than or equal to 0.5, that line would be output (into snplist.txt). This works since the default action, if no action ({...} block) is associated with a condition, is to output the current line.
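As a small sketch of that default action, the following uses made-up sample data (the file name sample_info and its contents are hypothetical, not taken from the question). The condition-only form and the form with an explicit { print } action should select the same lines:

```shell
# Hypothetical sample data: five whitespace-separated columns per line.
tmpdir=$(mktemp -d)
printf '%s\n' 'rs1 A G 0.1 0.9' 'rs2 C T 0.2 0.3' 'rs3 G A 0.3 0.5' >"$tmpdir/sample_info"

# Condition only: the default action outputs the matching line.
implicit=$(awk '$5 >= 0.5' "$tmpdir/sample_info")

# The same condition with the action spelled out.
explicit=$(awk '$5 >= 0.5 { print }' "$tmpdir/sample_info")

printf '%s\n' "$implicit"
rm -r "$tmpdir"
```

The rs2 line is dropped because its 5th column (0.3) is below the threshold.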
In cases where you have a large number of files (many thousands), this may generate an "Argument list too long" error. In that case, you may want to loop:
for filename in ./*.imputed.*_info; do
    awk '$5 >= 0.5' "$filename"
done >snplist.txt
Note that the result of awk does not need to be stored in a variable. Here, it is simply written to standard output, and the loop (and therefore every command inside the loop) is redirected into snplist.txt.
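The difference between redirecting inside the loop and redirecting the loop as a whole can be sketched as follows. This assumes the overwriting came from a > redirection in the loop body; the file names and contents below are invented for the illustration:

```shell
tmpdir=$(mktemp -d)
printf 'a 1 1 1 0.7\n' >"$tmpdir/one.imputed.x_info"
printf 'b 1 1 1 0.8\n' >"$tmpdir/two.imputed.x_info"

# Redirecting inside the loop truncates the output file on every iteration,
# so only the matches from the last file survive.
for filename in "$tmpdir"/*.imputed.*_info; do
    awk '$5 >= 0.5' "$filename" >"$tmpdir/overwritten.txt"
done

# Redirecting the loop as a whole opens the output file only once.
for filename in "$tmpdir"/*.imputed.*_info; do
    awk '$5 >= 0.5' "$filename"
done >"$tmpdir/combined.txt"

# Count the lines in each result file.
overwritten_lines=$(( $(wc -l <"$tmpdir/overwritten.txt") ))
combined_lines=$(( $(wc -l <"$tmpdir/combined.txt") ))
echo "overwritten: $overwritten_lines, combined: $combined_lines"
rm -r "$tmpdir"
```

Using >> inside the loop would also avoid the truncation, but it would keep appending to whatever an earlier run left in the file.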
For many thousands of files, this would be quite slow since awk would need to be invoked for each of them individually.
To speed things up in the cases where you have too many files for a single invocation of awk, you may consider using xargs like so:
printf '%s\0' ./*.imputed.*_info | xargs -0 awk '$5 >= 0.5' >snplist.txt
This would create a list of filenames with printf and pass them off to xargs as a nul-terminated list. The xargs utility would take these and start awk with as many of them as possible at once, in batches. The output of the whole pipeline would be redirected to snplist.txt.
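The batching can be made visible with a toy example. This is only a sketch: it assumes an xargs that supports -0, uses echo in place of awk, and adds -n 2 to force small batches (in the real command there is no -n, so each batch is as large as the system allows):

```shell
# -n 2 caps each batch at two arguments so that the batching is visible.
# Each output line corresponds to one invocation of echo.
batches=$(printf '%s\0' one two three four five | xargs -0 -n 2 echo)
printf '%s\n' "$batches"
```

Five arguments with a cap of two give three invocations, the last one with a single argument.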
This xargs alternative assumes that you are using a Unix, like Linux, which has an xargs command that implements the non-standard -0 option to read nul-terminated input. It also assumes that you are using a shell, like bash, that has a built-in printf utility, since an external printf would run into the same argument list limit as awk (ksh, the default shell on OpenBSD, would not work here as it has no such built-in utility).
For the zsh shell (i.e. not bash):
autoload -U zargs
zargs -- ./*.imputed.*_info -- awk '$5 >= 0.5' >snplist.txt
This uses zargs, which is basically a reimplementation of xargs as a loadable zsh shell function. See zargs --help (after loading the function) and the zshcontrib(1) manual for further information about that.
Just do this:
awk '$5 >= .5' *.imputed.*_info > snplist.txt
I have a habit of using find for this kind of thing.
find . -type f -name "*.imputed.*_info" -exec awk '$5 >= 0.5' {} \; > ./snplist.txt
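With \; at the end of -exec, find starts one awk process per file. A possible variation, sketched here with invented file names and contents, ends the -exec with + so that find passes many pathnames to each awk invocation. Note that find also descends into subdirectories, unlike the plain globbing pattern:

```shell
tmpdir=$(mktemp -d)
printf 'a 1 1 1 0.7\n' >"$tmpdir/one.imputed.x_info"
printf 'b 1 1 1 0.2\n' >"$tmpdir/two.imputed.x_info"

# With "+" instead of "\;", find batches the pathnames for awk.
found=$(find "$tmpdir" -type f -name '*.imputed.*_info' -exec awk '$5 >= 0.5' {} +)
printf '%s\n' "$found"
rm -r "$tmpdir"
```

Only the line from the first file is selected, since the 5th column in the second file (0.2) is below 0.5.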