grep output from long to wide

Using GNU datamash:

$ grep -n -x -F -f fileA.txt fileB.txt | datamash -s -t : -g 2 collapse 1
Germany:4,9
UK:5,6
USA:1,2,11

This first uses grep to get the lines from fileB.txt that exactly matches the lines in fileA.txt, and outputs the matching line numbers along with the lines themselves.

I'm using -x and -F in addition to the options that are used in the question. I do this to avoid reading the patterns from fileA.txt as regular expressions (-F), and to match complete lines, not substrings (-x).

The datamash utility is then parsing this as lines of :-delimited fields (-t :), sorting it (-s) on the second field (-g 2; the countries) and collapsing the first field (collapse 1; the line numbers) into a list for each country.

You could then obviously replace the colons and commas with tabs using tr ':,' '\t\t', or with spaces in a similar way.

$ grep -n -x -f fileA.txt -F fileB.txt | datamash -s -t : -g 2 collapse 1 | tr ':,' '\t\t'
Germany 4       9
UK      5       6
USA     1       2       11

Use awk:

awk 'NR==FNR        { country[$0]= country[$0]? country[$0] FS NR: NR; next }
     ($0 in country){ print $0, country[$0] }' fileB fileA

or to report "count: 0" in case there was a countryName in fileA that doesn't appear in fileB, do:

awk 'NR==FNR        { country[$0]= country[$0]? country[$0] FS NR: NR; next }
     ($0 in country){ print $0, country[$0]; next } { print $0, "0" }' fileB fileA

You could couple your grep command output with Miller (https://github.com/johnkerl/miller) and run

grep -nf fileA.txt fileB.txt | \
mlr --c2n --ifs ":" --implicit-csv-header --headerless-csv-output reorder -f 2  then \
nest --implode --values --across-records --nested-fs " " -f 1

You will have

Germany 4 9
USA 1 2 11
UK 5 6

grep output from long to wide

Tags:

Linux

Grep

Awk

Text Processing

Related

Recent Posts