Bash: Filter rows by line number

Simply with awk:

awk 'NR==FNR{ pos[$1]; next } FNR in pos' positions.txt data.txt
  • NR==FNR{ ... } - true only while processing the 1st input file (positions.txt), since NR (total records read so far) equals FNR (records read from the current file) only there:
    • pos[$1] - accumulate the positions (record numbers) as keys of the pos array
    • next - skip to the next record, so the second pattern is never tested against positions.txt
  • FNR in pos - applies while processing the 2nd input file, data.txt. A record is printed (awk's default action) only if its record number FNR appears among the keys of the pos array.
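
The question's actual files aren't shown here, so for concreteness assume a hypothetical pair consistent with the snippets in this section: positions.txt holds the wanted line numbers (3, 5 and 8, matching the sed script further down), and data.txt is a file whose 3rd and 5th lines are the records seen in the output (the other lines are made-up placeholders):

$ cat positions.txt
3
5
8
$ cat data.txt
aaa foo bar  11
bbb foo bar  22
667 ffg wew  23
ccc foo bar  44
533 jhf qwe  54

Since this data.txt has only five lines, position 8 simply never matches.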

Sample output:

667 ffg wew  23
533 jhf qwe  54
...

Alternatively, with sed. First create a sed script from the positions.txt file:

sed 's/$/p/' positions.txt

This will output

3p
5p
8p

This simple script just prints the indicated lines: an address-command pair like 3p means "at line 3, run the p (print) command".

Then apply this to the data.txt file. If you're using bash (or any shell that understands process substitutions with <( ... )):

sed -n -f <( sed 's/$/p/' positions.txt ) data.txt

The -n stops sed from outputting anything other than what's explicitly printed by the given sed script.

With the examples given, this will yield

667 ffg wew  23
533 jhf qwe  54
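
For contrast (using the hypothetical files above): without -n, sed auto-prints every input line, so the p commands would duplicate the selected lines rather than isolate them:

$ sed -f <( sed 's/$/p/' positions.txt ) data.txt
aaa foo bar  11
bbb foo bar  22
667 ffg wew  23
667 ffg wew  23
ccc foo bar  44
533 jhf qwe  54
533 jhf qwe  54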

If not using bash, then

sed 's/$/p/' positions.txt >filter.sed
sed -n -f filter.sed data.txt
rm -f filter.sed

... will do the same thing.
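
With GNU sed there's a third option, since its -f understands - as "read the script from standard input" (a GNU extension, so check your sed before relying on it):

sed 's/$/p/' positions.txt | sed -n -f - data.txt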


If positions.txt is sorted, it's also possible to do this in a single pass, without storing positions.txt in memory: simply read the next line from positions.txt each time the current target line has been printed:

$ awk -v pos=positions.txt 'function get() { getline num < pos }
     BEGIN { get() } NR==num { print; get() }' data.txt
667 ffg wew  23
533 jhf qwe  54

In practice, this is only useful if both files are really huge or you're really, really low on memory.
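
A small refinement for that huge-file case (a sketch, not part of the original answer): have get() exit once positions.txt is exhausted, so awk stops scanning data.txt after the last wanted line has been printed. getline returns 0 at end of file and -1 on error, so anything <= 0 means there are no positions left:

awk -v pos=positions.txt '
    # read the next wanted line number; quit when none are left
    function get() { if ((getline num < pos) <= 0) exit }
    BEGIN { get() }
    NR==num { print; get() }' data.txt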