Bash: Filter rows by line number
Simply with awk:
awk 'NR==FNR{ pos[$1]; next }FNR in pos' positions.txt data.txt
NR==FNR{ ... }
- while processing the 1st input file (i.e. positions.txt):
pos[$1]
- accumulate the positions (record numbers) as keys of the pos array
next
- jump to the next record
FNR in pos
- while processing the 2nd input file, data.txt (FNR indicates how many records have been read from the current input file): print the record only if the current record number FNR is among the keys of the pos array
Sample output:
667 ffg wew 23
533 jhf qwe 54
...
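To see the one-liner end to end, here is a minimal sketch with made-up file contents (the real positions.txt and data.txt will differ):

```shell
printf '3\n5\n' > positions.txt                    # keep lines 3 and 5
printf 'one\ntwo\nthree\nfour\nfive\n' > data.txt  # five data lines

# First pass stores the wanted line numbers; second pass prints matches.
awk 'NR==FNR{ pos[$1]; next } FNR in pos' positions.txt data.txt
# prints:
# three
# five
```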
Alternatively, with sed: first create a sed script from the positions.txt file:
sed 's/$/p/' positions.txt
This will output
3p
5p
8p
This simple script will just print the indicated lines.
Then apply this to the data.txt file. If you're using bash (or any shell that understands process substitutions with <( ... )):
sed -n -f <( sed 's/$/p/' positions.txt ) data.txt
The -n stops sed from outputting anything other than what's explicitly printed by the given sed script.
With the examples given, this will yield
667 ffg wew 23
533 jhf qwe 54
If not using bash, then
sed 's/$/p/' positions.txt >filter.sed
sed -n -f filter.sed data.txt
rm -f filter.sed
... will do the same thing.
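Putting the portable variant together, a minimal sketch (file contents are made up):

```shell
printf '2\n4\n' > positions.txt          # keep lines 2 and 4
printf 'aa\nbb\ncc\ndd\nee\n' > data.txt

sed 's/$/p/' positions.txt > filter.sed  # filter.sed now contains "2p" and "4p"
sed -n -f filter.sed data.txt            # prints only the selected lines
rm -f filter.sed
# prints:
# bb
# dd
```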
If positions.txt is sorted, it's also possible to do this in a single pass through both files, and without storing positions.txt in full. Simply read the next line off positions.txt when the previous matching line is met:
$ awk -vpos=positions.txt 'function get() { getline num < pos }
BEGIN { get() } NR==num { print; get() }' data.txt
667 ffg wew 23
533 jhf qwe 54
In practice, this is only useful if both files are really huge or you're really, really low on memory.
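A minimal sketch of the single-pass variant, again with made-up inputs (note that positions.txt must be sorted for this to work):

```shell
printf '2\n4\n' > positions.txt      # sorted positions
printf 'a\nb\nc\nd\ne\n' > data.txt

# get() pulls the next wanted line number from positions.txt;
# each match prints the record and advances to the next position.
awk -v pos=positions.txt 'function get() { getline num < pos }
BEGIN { get() } NR==num { print; get() }' data.txt
# prints:
# b
# d
```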