Concatenate multiple files with same header
Another solution, similar to "cat+grep" from above, using tail and head:
Write the header of the first file into the output:
head -2 file1.txt > all.txt
head -2 gets the first 2 lines of the file.
Add the content of all the files:
tail -n +3 -q file*.txt >> all.txt
-n +3 makes tail print lines from the 3rd to the end, -q tells it not to print the header with the file name (read man), and >> appends to the file instead of overwriting it as > does.
And sure you can put both commands in one line:
head -2 file1.txt > all.txt; tail -n +3 -q file*.txt >> all.txt
or, instead of ;, put && between them so that the second command runs only if the first one succeeded.
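The two steps can also be wrapped in a small function so that the header length is not hard-coded. This is only a sketch: the name merge_with_header and the sample data are made up for illustration, and -q is an extension found in GNU and BSD tail rather than something POSIX guarantees.

```shell
#!/bin/sh
# Hypothetical helper: merge files that share an N-line header.
# Usage: merge_with_header N output input...
merge_with_header() {
    n=$1; out=$2; shift 2
    # The header is taken from the first input file only.
    head -n "$n" "$1" > "$out" || return 1
    # Body of every input: start at line N+1; -q suppresses the "==> name <==" banners.
    tail -q -n "+$((n + 1))" "$@" >> "$out"
}

# Demo in a scratch directory with made-up data.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'h1\nh2\nA\nB\n' > file1.txt
printf 'h1\nh2\nC\nD\n' > file2.txt
merge_with_header 2 all.txt file1.txt file2.txt
cat all.txt
```

Passing the input files explicitly, rather than a glob evaluated inside the function, keeps the output file from being picked up as an input by accident.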
If you know how to do it in R, then by all means do it in R. With classical unix tools, this is most naturally done in awk.
awk '
FNR==1 && NR!=1 { while (/^<header>/) getline; }
1 {print}
' file*.txt >all.txt
The first line of the awk script matches the first line of a file (FNR==1) except if it's also the first line across all files (NR==1). When these conditions are met, the expression while (/^<header>/) getline; is executed, which causes awk to keep reading another line (skipping the current one) as long as the current one matches the regexp ^<header>. The second line of the awk script prints everything except for the lines that were previously skipped.
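The same idea can be written without getline, which may be worth considering because a getline loop of this shape would not terminate if the last file consisted only of header lines (getline fails at end of input and leaves the current line unchanged). The following is a sketch of that variant, not the script from the answer above; the flag name in_header and the sample files are assumptions made for the demo.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '<header>x\n<header>y\nA\nB\n' > file1.txt
printf '<header>x\n<header>y\nC\nD\n' > file2.txt
printf '<header>x\n<header>y\n' > file3.txt   # header only, no body

awk '
FNR == 1 { in_header = 1 }                   # each file starts in its header
in_header && /^<header>/ {                   # still inside the leading header block
    if (NR == FNR) print                     # keep it only for the first file
    next
}
{ in_header = 0; print }                     # first body line ends the header block
' file1.txt file2.txt file3.txt > all.txt
cat all.txt
```

Because the flag is cleared at the first body line, a line that happens to start with <header> later in a file is still printed.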
Try this:
$ cat file1.txt; grep -v "^<header" file2.txt
<header>INFO=<ID=DP,Number=1,Type=Integer>
<header>INFO=<ID=DP4,Number=4,Type=Integer>
A
B
C
D
E
F
NOTE
- the -v flag means to invert the match of grep
- ^ in REGEX means beginning of the string
- if you have a bunch of files, you can do:
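array=( files*.txt )
{ cat "${array[@]:0:1}"; grep -hv "^<header" "${array[@]:1}"; } > new_file.txt
It's a bash array slicing technique. The expansions are quoted so that file names containing spaces survive, and -h (available in GNU and BSD grep) suppresses the file name prefix that grep otherwise adds to each line when it is given more than one file.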
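In a shell without arrays, the positional parameters can play the same role. The sketch below is a plain sh equivalent, not part of the original answer; the file names and contents are made up for the demo, and -h is assumed to be supported by the installed grep (it is in GNU and BSD grep).

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '<header>x\nA\nB\n' > file1.txt
printf '<header>x\nC\nD\n' > file2.txt
printf '<header>x\nE\nF\n' > file3.txt

set -- file*.txt          # positional parameters stand in for the array
first=$1
shift                     # "$@" now holds every file except the first
{
    cat "$first"
    # Skip grep when only one file matched; without file arguments it would read stdin.
    if [ "$#" -gt 0 ]; then
        grep -hv '^<header' "$@"
    fi
} > new_file.txt
cat new_file.txt
```

The guard around grep matters for the single-file case, where the remaining list is empty.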