Sort and merge 2 files without duplicate lines, based on the first column
Seems like you can achieve this very easily with join
if the files are both sorted.
$ join -a 1 all_tests.txt completed_tests.txt
test1 Passed
test2
test3 Failed
test4
test5 Passed
test6 Passed
-a 1
means print lines from file 1 that had nothing joined to them.
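The input files aren't shown in the question, but judging from the sample output they presumably look like the files created below (a runnable sketch with assumed contents):

```shell
# Assumed input files, reconstructed from the sample output above
printf '%s\n' test1 test2 test3 test4 test5 test6 > all_tests.txt
printf '%s\n' 'test1 Passed' 'test3 Failed' 'test5 Passed' 'test6 Passed' > completed_tests.txt

# Join on the first column; -a 1 also prints lines from file 1
# (all_tests.txt) that have no match in file 2
join -a 1 all_tests.txt completed_tests.txt
```

Without `-a 1`, tests with no recorded status (test2 and test4 here) would be dropped entirely instead of printed on their own.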
If your files are not already sorted, you can use this (thanks terdon!):
join -a 1 <(sort all_tests.txt) <(sort completed_tests.txt)
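For example, with deliberately unsorted files (contents assumed for illustration; note that `<(...)` process substitution requires bash or zsh, not plain sh):

```shell
# Unsorted input files (assumed contents for illustration)
printf '%s\n' test3 test1 test2 > all_tests.txt
printf '%s\n' 'test3 Failed' 'test1 Passed' > completed_tests.txt

# join needs sorted input, so sort each file on the fly
# via process substitution (bash/zsh feature)
join -a 1 <(sort all_tests.txt) <(sort completed_tests.txt)
```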
The right tool here is join
as suggested by @Zanna, but here's an awk
approach:
$ awk 'NR==FNR{a[$1]=$2; next}{print $1,a[$1]}' completed_tests.txt all_tests.txt
test1 Passed
test2
test3 Failed
test4
test5 Passed
test6 Passed
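The `NR==FNR` test is the standard awk idiom for detecting the first file: `FNR` restarts at 1 for each input file while `NR` keeps counting across all of them, so the two are only equal while the first file is being read. A minimal sketch with assumed input files:

```shell
# Assumed input files for illustration
printf '%s\n' 'test1 Passed' 'test3 Failed' > completed_tests.txt
printf '%s\n' test1 test2 test3 > all_tests.txt

# First pass (NR==FNR): store each test's status in array a, keyed by name.
# Second pass: print each test name with its stored status
# (empty string if the test never completed).
awk 'NR==FNR{a[$1]=$2; next}{print $1,a[$1]}' completed_tests.txt all_tests.txt
```

Note the file order matters: the lookup file (completed_tests.txt) must come first so the array is populated before the main file is scanned.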
Perl
Effectively, this is a port of terdon's answer:
$ perl -lane '$t+=1; $h{$F[0]}=$F[1] if $.==$t; print $F[0]," ",$h{$F[0]} if $t!=$.;$.=0 if eof' completed_tests.txt all_tests.txt
test1 Passed
test2
test3 Failed
test4
test5 Passed
test6 Passed
This works by building a hash of test-status pairs from completed_tests.txt
and then looking up lines from all_tests.txt
in that hash. The $t
variable holds the total number of lines processed across both files, while $.
holds the current line number and is reset upon reaching the end of each file; comparing the two lets us keep track of which file is currently being read.