How can I blank the nth to mth field using the awk command?
To blank all fields from the nth to the mth in an awk
command in a way that scales, don't assign each field by hand; use a "for" loop:
awk 'BEGIN { FS = ","; OFS = ","} {for (i = 3; i <= 4; i++) { $i = "" }; print}' inputfile
If you want to blank out a different range, adjust the values "3" and "4" in the above code.
Explanation:
The BEGIN { ... } block is processed before any of the lines of the file are read. FS sets the input field separator, and OFS sets the output field separator. We want them both to be commas.
The for loop is just like C syntax. In this case it runs the following { code block } with i as 3 and then as 4.
The $i deserves mention because it is entirely unlike shell syntax. In shell scripting, the name of a variable must be prefixed with $ to expand to the value of the variable. Not so in awk. In awk, i by itself expands to its value (3 or 4 in this case), and $ followed by a number means the field in that numbered position. So $i = "" sets the ith field to a blank string.
Then the print command, given without arguments, defaults to printing the entire line. Actually, because a field has been assigned, awk rebuilds the line from all the fields as split by FS and as modified by any previous commands, and prints them all, separated by OFS and followed by a newline at the end.
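To see the role OFS plays in print, here is a minimal sketch. The helper name show_ofs and the sample record a,b,c are made up for illustration, and OFS is deliberately set to a dash so the rebuild is visible.

```shell
# Hypothetical helper: print one record untouched, then after assigning a field.
show_ofs() {
    # No field is assigned, so the line is printed exactly as it was read.
    printf 'a,b,c\n' | awk 'BEGIN { FS = ","; OFS = "-" } { print }'
    # Assigning any field makes awk rebuild the line with OFS between fields.
    printf 'a,b,c\n' | awk 'BEGIN { FS = ","; OFS = "-" } { $2 = ""; print }'
}
show_ofs
```

The first line comes out as a,b,c and the second as a--c, which is why the answer sets OFS to a comma as well as FS.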
An equivalent shorter command:
I feel that the above command is the cleanest and most easily extensible if you are going to include it in a script. It is very explicit about what it is doing and very readable. Plus, the entire thing can be broken out to a standalone awk script without change, something that can't be done automatically when using the -v and -F switches in your awk invocation. (That's no reason not to use them, of course. Just something to be aware of.)
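As a rough sketch of what breaking the command out to a standalone script could look like: the temporary file and the wrapper name blank_with_script are assumptions made for this example, and the sample line is only illustrative.

```shell
# Hypothetical script file holding the awk program exactly as written above.
script=$(mktemp)
cat > "$script" <<'EOF'
BEGIN { FS = ","; OFS = "," }
{
    for (i = 3; i <= 4; i++) { $i = "" }
    print
}
EOF

# Hypothetical wrapper: run the script on the named files, or on stdin if none.
blank_with_script() {
    awk -f "$script" "$@"
}

printf 'P,R,SCRIPT,444,E,9\n' | blank_with_script
# rm -f "$script"   # clean up when finished
```

Because FS and OFS are set inside the BEGIN block, nothing needs to be added on the command line beyond -f and the file names.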
For a one-off usage especially, I would use the following:
awk -F, -v OFS=, '{for (i = 3; i <= 4; i++) { $i = "" }; print}' inputfile
The -F switch sets the value of FS. The -v switch allows you to set values of awk variables on the command line.
On a more general note, the -v switch can be extremely useful for passing shell variables in as awk variables (-v myawkvar="$myshellvar") and for changing the runtime behavior of a standalone awk script that you pull from a script file with the -f scriptname option at the command line.
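For instance, the range itself could be passed in with -v instead of being edited in the program text. This is only a sketch: the shell variables first and last, the awk variables of the same names, and the function blank_range are all hypothetical.

```shell
# Hypothetical shell variables holding the first and last field to blank.
first=3
last=4

# Hypothetical helper: blank fields $1 through $2 of comma-separated stdin.
blank_range() {
    awk -F, -v OFS=, -v first="$1" -v last="$2" \
        '{ for (i = first; i <= last; i++) { $i = "" }; print }'
}

printf 'P,R,SCRIPT,444,E,9\n' | blank_range "$first" "$last"
```

Inside the awk program, first and last are written without a $, for the reason given in the explanation of $i above.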
</path/to/in_file awk -v 'FS=,' -v 'OFS=,' '{$3=$4=""; print}'
Explanation
</path/to/in_file: read the file on standard input.
-v 'FS=,' -v 'OFS=,': set the input field separator and the output field separator to a comma.
'{$3=$4=""; print}': set the 3rd and 4th fields to blank, then print the entire line (shortened form courtesy of jasonwryan).
sed 's/\([^,]*,\)\{2\}/,,/2' <in >out
U,N,,,A,5
N,P,,,B,6
I,M,,,C,7
X,Y,,,D,8
P,R,,,E,9
That replaces the second occurrence of a group of two consecutive comma-delimited fields with two commas.
You could also do it like:
sed 's/[^,]*//4;s///3' <in >out
...which replaces the 4th and then the 3rd occurrence of a sequence of any number of non-comma characters with nothing.
To do it as @Wildcard did - with a scalable loop:
sed -e:t -e'/\n\{2\}/!s/\(\n*\)[^,]*./\n\1/3;/\n$/!tt' -e's///;y/\n/,/'
...or...
sed -e:t -e's/\n$//;s/\n/&/2;to' \
-e's/\(\n*\)[^,]*./\1\n/3;tt' \
-e:o -ey/\\n/,/
...where 3 is the field number at which you would start blanking, , is the delimiter, and 2 is the number of fields you would blank all told.
Either way you write it...
sed "$script" <<""
U
N,P
I,M,UNIX
X,Y,BASH,333
P,R,SCRIPT,444,E,9
U
N,P
I,M,
X,Y,,
P,R,,,E,9
...though you may need to use a literal newline in place of the n in ... /\1\n/3 .