grep output from long to wide
Using GNU datamash
:
$ grep -n -x -F -f fileA.txt fileB.txt | datamash -s -t : -g 2 collapse 1
Germany:4,9
UK:5,6
USA:1,2,11
This first uses grep
to get the lines from fileB.txt
that exactly matches the lines in fileA.txt
, and outputs the matching line numbers along with the lines themselves.
I'm using -x
and -F
in addition to the options that are used in the question. I do this to avoid reading the patterns from fileA.txt
as regular expressions (-F
), and to match complete lines, not substrings (-x
).
The datamash
utility is then parsing this as lines of :
-delimited fields (-t :
), sorting it (-s
) on the second field (-g 2
; the countries) and collapsing the first field (collapse 1
; the line numbers) into a list for each country.
You could then obviously replace the colons and commas with tabs using tr ':,' '\t\t'
, or with spaces in a similar way.
$ grep -n -x -f fileA.txt -F fileB.txt | datamash -s -t : -g 2 collapse 1 | tr ':,' '\t\t'
Germany 4 9
UK 5 6
USA 1 2 11
Use awk
:
awk 'NR==FNR { country[$0]= country[$0]? country[$0] FS NR: NR; next }
($0 in country){ print $0, country[$0] }' fileB fileA
or to report "count: 0" in case there was a countryName in fileA that doesn't appear in fileB, do:
awk 'NR==FNR { country[$0]= country[$0]? country[$0] FS NR: NR; next }
($0 in country){ print $0, country[$0]; next } { print $0, "0" }' fileB fileA
You could couple your grep command output with Miller (https://github.com/johnkerl/miller) and run
grep -nf fileA.txt fileB.txt | \
mlr --c2n --ifs ":" --implicit-csv-header --headerless-csv-output reorder -f 2 then \
nest --implode --values --across-records --nested-fs " " -f 1
You will have
Germany 4 9
USA 1 2 11
UK 5 6