How to print certain columns by name?
Maybe something like this:
$ cat t.awk
NR==1 {
for (i=1; i<=NF; i++) {
ix[$i] = i
}
}
NR>1 {
print $ix[c1], $ix[c2]
}
$ awk -f t.awk c1=id c2=name input
1 ed
2 joe
$ awk -f t.awk c1=age c2=name input
50 ed
70 joe
If you want to specify the columns to print on the command line, you could do something like this:
$ cat t.awk
BEGIN {
split(cols,out,",")
}
NR==1 {
for (i=1; i<=NF; i++)
ix[$i] = i
}
NR>1 {
for(i=1; i <= length(out); i++)
printf "%s%s", $ix[out[i]], OFS
print ""
}
$ awk -f t.awk -v cols=name,age,id,name,id input
ed 1 ed 50 1
joe 2 joe 70 2
(Note the -v
switch to get the variable defined in the BEGIN
block.)
csvkit
Convert the input data to a csv format and use a csv tool such as csvcut
from the csvkit
:
$ cat test-cols.dat
id name age
1 ed 50
2 joe 70
Install csvkit:
$ pip install csvkit
Use tr
with its squeeze option -s
to convert it into a valid csv file and apply csvcut
:
$ cat test-cols.dat | tr -s ' ' ',' | csvcut -c id,age
id,age
1,50
2,70
If you want to return to the old data format, you can use tr ',' ' ' | column -t
$ cat test-cols.dat | tr -s ' ' ',' | csvcut -c id,age | tr ',' ' ' | column -t
id age
1 50
2 70
Notes
csvkit supports also different delimiters (shared option
-d
or--delimiter
), but returns a csv file:If the file uses only spaces to separate columns (no tabs at all), following works
$ csvcut -d ' ' -S -c 'id,age' test-cols.dat id,age 1,50 2,70
If the file uses a tab to separate columns, following works and
csvformat
can be used to get back tsv file:$ csvcut -t -c 'id,age' test-cols.dat | csvformat -T id age 1 50 2 70
As far as I have checked, only a single tab is allowed.
csvlook
can format the table in a markdown table format$ csvcut -t -c "id,age" test-cols.dat | csvlook | id | age | | -- | --- | | 1 | 50 | | 2 | 70 |
UUOC (Useless Use Of Cat): I like it this way to construct the command.
Just trowing a Perl solution into the lot:
#!/usr/bin/perl -wnla
BEGIN {
@f = ('id', 'age'); # field names to print
print "@f"; # print field names
}
if ($. == 1) { # if line number 1
@n = @F; # get all field names
} else { # or else
@v{@n} = @F; # map field names to values
print "@v{@f}"; # print values based on names
}