How can I merge the lines of two files that share common headers?
Awk solution:
awk '/^>/{ k=$1 FS $2 }
     NR==FNR{
         if (!/^>/) a[k]=(a[k]!="")? a[k] ORS $0: $0; next
     }
     k in a{
         print $0 ORS a[k]; delete a[k]; next
     }1' file1 file2
/^>/{ k=$1 FS $2 } - on encountering a header line (i.e. >Feature ...), compose a key k from the 1st ($1) and 2nd ($2) fields
NR==FNR{ ... } - processing the 1st input file (file1):
    if (!/^>/) a[k]=(a[k]!="")? a[k] ORS $0: $0 - accumulate non-header lines into array a under the current key k
    next - jump to the next record
k in a - if the current key, set by the latest file2 header, is in array a (built from file1 records):
    print $0 ORS a[k] - print the header followed by the related file1 lines
    delete a[k] - delete the processed item(s)
1 - print every other file2 line as-is (its data lines and any header without a file1 match)
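To see the merge end to end, here is a small sketch that runs the awk program above on two hypothetical files. The asker's actual inputs are not shown, so the sample contents (and the temporary directory) are made up to match the ">Feature <name>" layout of the output. For each shared header, the file1 lines come first, followed by the file2 lines.

```shell
#!/bin/sh
# Hypothetical sample inputs (assumed ">Feature <name>" headers followed by
# data lines); these are NOT the asker's real files.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/file1" <<'EOF'
>Feature scaffold1
1 100 g
101 200 g
>Feature scaffold2
1 100 g
EOF

cat > "$tmp/file2" <<'EOF'
>Feature scaffold1
500 500 r
>Feature scaffold2
200 300 r
>Feature scaffold5
1 1000 r
EOF

# Same awk program as in the answer, run on the sample files.
out=$(awk '/^>/{ k=$1 FS $2 }
     NR==FNR{
         if (!/^>/) a[k]=(a[k]!="")? a[k] ORS $0: $0; next
     }
     k in a{
         print $0 ORS a[k]; delete a[k]; next
     }1' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$out"
```

Because the output is driven by file2, a header that appears only in file1 is dropped, while a header found only in file2 (scaffold5 here) is printed with its own lines unchanged.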
The output:
>Feature scaffold1
1 100 g
101 200 g
201 300 g
500 500 r
900 1000 r
>Feature scaffold2
1 100 g
01 500 g
200 300 r
>Feature scaffold3
10 500 g
100 200 r
>Feature scaffold4
10 300 g
500 600 r
>Feature scaffold5
1 1000 r
Another, simpler approach:
grep -v '^scaffold' <(awk -v RS='>Feature ' \
'NF{s[$1]=s[$1]$0} END{for (x in s)print RS""s[x]}' file[12])
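A sketch of the one-liner on hypothetical inputs follows. It assumes an awk that treats a multi-character RS as a record separator (GNU awk does; POSIX leaves a multi-character RS unspecified), so each ">Feature " block becomes one record whose $1 is the scaffold name. It also swaps the bash-only <(...) process substitution for an ordinary pipe, which is equivalent here and runs in plain sh.

```shell
#!/bin/sh
# Hypothetical sample inputs in the assumed ">Feature <name>" layout.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' '>Feature scaffold1' '1 100 g' '>Feature scaffold2' '1 100 g' > "$tmp/file1"
printf '%s\n' '>Feature scaffold1' '500 500 r' '>Feature scaffold5' '1 1000 r' > "$tmp/file2"

# Concatenate all records sharing a scaffold name, re-add the ">Feature "
# prefix in END, then drop the leftover "scaffoldN" lines from the 2nd file.
out=$(awk -v RS='>Feature ' \
    'NF{s[$1]=s[$1]$0} END{for (x in s)print RS""s[x]}' "$tmp/file1" "$tmp/file2" |
    grep -v '^scaffold')

printf '%s\n' "$out"
```

Since for (x in s) has no defined iteration order, the groups can come out in any order; and because each record keeps its trailing newline, an empty line may follow each group.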