How to get column index of field in unix shell
Assuming your goal is to say "which column is this value in", you have a number of options, but this works:
sed -n $'1s/,/\\\n/gp' abc.csv | grep -nx 'e'
#output: 5:e
If you want to get just the number out of that:
sed -n $'1s/,/\\\n/gp' abc.csv | grep -nx 'e' | cut -d: -f1
#output: 5
Explanation:
Since the headers are on the first line of the file, we use the -n option to tell sed not to print out all the lines by default. We then give it an expression that starts with 1, meaning it is only executed on the first line, and ends with p, meaning that line gets printed out afterward.
The expression uses ANSI-C quoting ($'...') simply so it's easier to read: you can put a newline in it with \n instead of having to include a literal newline. Regardless, by the time the shell is done with it, the expression $'1s/,/\\\n/gp' gets passed to sed as 1s/,/\ followed by a literal newline and then /gp, which tells it to replace every comma on the first line with a newline and then print out the result. The output of just the sed on your example would be this:
a
b
c
d
e
f
g
h
(If your CSV file has many lines, you may want to add ;q to the end of the sed expression so that it quits after that first line instead of reading the rest of the file and doing nothing with it.)
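As a rough sketch of the ;q variant, the snippet below builds a small throwaway file (the temp file and its data rows are made up for illustration) and runs the same header lookup. The sed expression is written with a literal newline rather than $'...' so that it should also run in shells without ANSI-C quoting; the two forms hand sed the same script.

```sh
# Hypothetical sample file: the header from the example plus a few data rows.
csv=$(mktemp)
printf '%s\n' 'a,b,c,d,e,f,g,h' '1,2,3,4,5,6,7,8' '9,8,7,6,5,4,3,2' > "$csv"

# Split the header into one name per line, then quit without
# reading the remaining lines of the file.
header_lines=$(sed -n '1s/,/\
/gp;q' "$csv")

# Same lookup as before: line number of the exact match.
idx=$(printf '%s\n' "$header_lines" | grep -nx 'e' | cut -d: -f1)
echo "$idx"

rm -f "$csv"
```

On a file of this size the q makes no visible difference; it only matters when the file is large.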
We then pipe that output through a grep command looking for e. We pass the -x option so that it only matches lines consisting of exactly 'e', not just any line containing an 'e' (thanks @Marcel and @Sundeep), plus the -n option, which tells it to include the line number of each matching line in its output. In the example, it outputs 5:e, where the 5: says that the rest of the output is from the 5th line of the input.
We can then pipe that through cut with a field delimiter (-d) of : to extract just the first field (-f1), which is the line number in the sed output and therefore the field number in the original file.
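Once you have the field number, one possible use is to feed it straight back into cut to pull that column out of the data rows. This is only a sketch: the temp file and its contents are invented for the example, and the sed expression uses a literal newline in place of the $'...' form.

```sh
# Hypothetical sample data with the same header as abc.csv in the example.
csv=$(mktemp)
printf '%s\n' 'a,b,c,d,e,f,g,h' '1,2,3,4,5,6,7,8' '9,10,11,12,13,14,15,16' > "$csv"

# Field number of the header "e", found with the sed | grep | cut pipeline.
n=$(sed -n '1s/,/\
/gp;q' "$csv" | grep -nx 'e' | cut -d: -f1)

# Print that column for every data row (tail skips the header line).
column=$(tail -n +2 "$csv" | cut -d, -f"$n")
echo "$column"

rm -f "$csv"
```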
- head selects the first line (the header);
- tr replaces each delimiter with a line break;
- grep selects the line that matches the string exactly (substring matches are ignored) and prints its line number as well. In the example, this gives 5:e;
- cut uses ':' as the delimiter and selects the first field, so only the line number is shown.
head -n1 abc.csv | tr "," "\n" | grep -nx e | cut -d":" -f1
File content:
a,b,c,d,e,f,g,h
String that you want:
e
Output:
5
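If you need this more than once, the head | tr | grep | cut pipeline can be wrapped in a small function. The sketch below is an assumption-laden variant rather than the command above verbatim: the name col_index and the optional delimiter argument are made up here, the delimiter must be a single character, and -F is added so that the header name is matched as a fixed string instead of a regular expression (otherwise a name like a.b would also match axb).

```sh
# col_index FILE NAME [DELIM]
# Hypothetical helper: prints the 1-based column number of NAME in FILE's header.
col_index() {
  delim=${3:-,}   # single-character delimiter, defaults to a comma
  head -n1 "$1" | tr "$delim" '\n' | grep -nxF -e "$2" | cut -d: -f1
}

# Made-up sample file using ';' as the delimiter.
csv=$(mktemp)
printf '%s\n' 'id;a.b;axb;name' '1;2;3;4' > "$csv"

result=$(col_index "$csv" 'a.b' ';')
echo "$result"

rm -f "$csv"
```

If the name appears in several columns, every matching number is printed, one per line; if it does not appear at all, nothing is printed.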
This is a bit of a hack, but it will give you the index of e:
head -n1 abc.csv | grep -oE '^.*(,|^)e(,|$)' | tr -Cd , | wc -c
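It works by extracting the part of the top row up to and including the comma that follows e, then removing all characters except the commas, and finally counting the commas. Because it relies on the comma after e, the result is one too low when e is the last column.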
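The comma-counting idea can be sketched as a function that also copes with the target being the last column, where there is no trailing comma to count. This is an illustrative variant, not the one-liner above: the name comma_index is made up, the field name is still interpreted as an extended regular expression, and field names are assumed not to contain commas. It drops the trailing comma, counts the commas that precede the field, and adds one.

```sh
# comma_index HEADER NAME
# Hypothetical helper: prints the 1-based index of NAME in a comma-separated
# HEADER line, or returns non-zero if NAME is not a field.
comma_index() {
  # Same pattern as the one-liner: everything up to NAME, plus the comma after it.
  match=$(printf '%s\n' "$1" | grep -oE "^.*(,|^)$2(,|\$)") || return 1
  match=${match%,}   # drop the comma that follows the field, if there is one
  # Commas left in the match are exactly the ones before the field.
  commas=$(printf '%s' "$match" | tr -cd ',' | wc -c)
  echo $((commas + 1))
}

comma_index 'a,b,c,d,e,f,g,h' e
comma_index 'a,b,c,d,e' e
```

Both calls should report the same column, since e is the fifth field in each header. As with the original, a name that occurs more than once resolves to its last occurrence, because the leading .* is greedy.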