Using AWK to Process Input from Multiple Files

awk 'FNR==NR{a[$1]=$2 FS $3;next}

here we handle the 1st input (file2). say, FS is space, we build an array(a) up, index is column1, value is column2 " " column3 the FNR==NR and next means, this part of codes work only for file2. you could man gawk check what are NR and FNR

{ print $0, a[$1]}' file2 file1

When NR != FNR it's time to process 2nd input, file1. here we print the line of file1, and take column1 as index, find out the value in array(a) print. in another word, file1 and file2 are joined by column1 in both files.

for NR and FNR, shortly,

1st input has 5 lines
2nd input has 10 lines,

NR would be 1,2,3...15
FNR would be 1...5 then 1...10

you see the trick of FNR==NR check.

I found this question/answer on Google and it appears to be referring to a very specific data set found in another question (How to merge two files using AWK?). What follows is the answer I was looking for (and that I think most people would be), i.e., simply to concatenate every line from two different files using AWK. Though you could probably use some UNIX utilities like join or paste, AWK is obviously much more flexible and powerful if your desired output is different, by using if statements, or altering the OFS (which may be more difficult to do depending on the utility; see below) for example, altering the output in a much more expressive way (an important consideration for shell scripters.)

For simple line-by-line concatenation:

awk 'FNR==NR { a[FNR""] = $0; next } { print a[FNR""], $0 }' file1 file2

This emulates the function of a numerically indexed array (AWK only has associative arrays) by using implicit type conversion. It is relatively expressive and easy to understand.

Using two files called test1 and test2 with the following lines:

test1:

line one
line two
line three

test2:

line four
line five
line six

I get this result:

line one line four
line two line five
line three line six

Depending on how you want to join the values between the columns in the output, you can pick the appropriate output field separator. Here's an example with ellipses (...) separating the columns:

awk 'BEGIN { OFS="..."} FNR==NR { a[(FNR"")] = $0; next } { print a[(FNR"")], $0 }' test1 test2

Yielding this result:

line one...line four
line two...line five
line three...line six

I hope at least that this inspires you all to take advantage of the power of AWK!

A while ago I stumbled in a very good solution to handle multiple files at once. The way is to save in memory the files in AWK arrays using the method:

FILENAME==ARGV[1] {  file2array[FNR] = $0 ; next }
FILENAME==ARGV[2] {  file1array[FNR] = $0 ; next }

For post data treatment, is better to save the number of lines, so:

FILENAME==ARGV[1] {  file2array[FNR] = $0 ; f2rows = FNR ; next }
FILENAME==ARGV[2] {  file1array[FNR] = $0 ; f1rows = FNR ; next }

f2rows and f1rows will hold the position of the last row.

It has more code, but if you want more complex data treatment, I think it's the better approach. Besides, the previous approaches treated the inputs sequentially, so if you needed to do some calculations that depended on data from both files simultaneously you wouldn't be able to do it, and with this approach you can do everything with both files.

Using AWK to Process Input from Multiple Files

Tags:

Awk

Related

Recent Posts