Sort and merge 2 files without duplicate lines, based on the first column
Seems like you can achieve this very easily with join
if the files are both sorted.
$ join -a 1 all_tests.txt completed_tests.txt
test1 Passed
test2
test3 Failed
test4
test5 Passed
test6 Passed
-a 1
means print lines from file 1 that had nothing joined to them.
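The input files aren't shown in the question, but judging from the sample output they presumably look like the files created below (a runnable sketch with assumed contents):

```shell
# Assumed input files, reconstructed from the sample output above
printf '%s\n' test1 test2 test3 test4 test5 test6 > all_tests.txt
printf '%s\n' 'test1 Passed' 'test3 Failed' 'test5 Passed' 'test6 Passed' > completed_tests.txt

# Join on the first column; -a 1 also prints lines from file 1
# (all_tests.txt) that have no match in file 2
join -a 1 all_tests.txt completed_tests.txt
```

Without `-a 1`, tests with no recorded status (test2 and test4 here) would be dropped entirely instead of printed on their own.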
If your files are not already sorted, you can use this (thanks terdon!):
join -a 1 <(sort all_tests.txt) <(sort completed_tests.txt)
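For example, with deliberately unsorted files (contents assumed for illustration; note that `<(...)` process substitution requires bash or zsh, not plain sh):

```shell
# Unsorted input files (assumed contents for illustration)
printf '%s\n' test3 test1 test2 > all_tests.txt
printf '%s\n' 'test3 Failed' 'test1 Passed' > completed_tests.txt

# join needs sorted input, so sort each file on the fly
# via process substitution (bash/zsh feature)
join -a 1 <(sort all_tests.txt) <(sort completed_tests.txt)
```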
The right tool here is join
as suggested by @Zanna, but here's an awk
approach:
$ awk 'NR==FNR{a[$1]=$2; next}{print $1,a[$1]}' completed_tests.txt all_tests.txt
test1 Passed
test2
test3 Failed
test4
test5 Passed
test6 Passed
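The `NR==FNR` test is the standard awk idiom for detecting the first file: `FNR` restarts at 1 for each input file while `NR` keeps counting across all of them, so the two are only equal while the first file is being read. A minimal sketch with assumed input files:

```shell
# Assumed input files for illustration
printf '%s\n' 'test1 Passed' 'test3 Failed' > completed_tests.txt
printf '%s\n' test1 test2 test3 > all_tests.txt

# First pass (NR==FNR): store each test's status in array a, keyed by name.
# Second pass: print each test name with its stored status
# (empty string if the test never completed).
awk 'NR==FNR{a[$1]=$2; next}{print $1,a[$1]}' completed_tests.txt all_tests.txt
```

Note the file order matters: the lookup file (completed_tests.txt) must come first so the array is populated before the main file is scanned.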
Perl
Effectively, this is a port of terdon's answer:
$ perl -lane '$t+=1; $h{$F[0]}=$F[1] if $.==$t; print $F[0]," ",$h{$F[0]} if $t!=$.;$.=0 if eof' completed_tests.txt all_tests.txt
test1 Passed
test2
test3 Failed
test4
test5 Passed
test6 Passed
This works by building a hash of test-status pairs from completed_tests.txt
and then looking up lines from all_tests.txt
in that hash. The $t
variable holds the total number of lines processed across both files, while $.
holds the current line number and is reset upon reaching the end of each file; comparing the two lets us keep track of which file is currently being read.