Could sed or awk use NUL character as record separator?
By default, the record separator is the newline character, defining a record to be a single line of text. You can use a different character by changing the built-in variable RS. The value of RS is a string that says how to separate records; the default value is \n, the string containing just a newline character.
awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list
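To try the RS idea without the BBS-list file, here is a minimal sketch that feeds awk a made-up string through printf. The sample text is an assumption for illustration only.

```shell
# Hypothetical input: three records separated by "/" instead of newlines.
out=$(printf 'aaa/bbb/ccc' | awk 'BEGIN { RS = "/" } { print $0 }')

# Each record is printed on its own line.
printf '%s\n' "$out"
```

This works with any awk, because "/" is an ordinary character. The NUL case is harder, since not every awk implementation can handle a NUL byte in its data.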
Since version 4.2.2, GNU sed has the -z or --null-data option to do exactly this. For example:
sed -z 's/old/new/' null_separated_infile
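A small sketch of what -z does in practice, assuming GNU sed 4.2.2 or later. The sample data and the temporary file are made up for illustration.

```shell
# Hypothetical NUL-separated sample file with three items.
tmp=$(mktemp)
printf 'old one\0keep\0old two\0' > "$tmp"

# With -z, each NUL-terminated item is treated as one "line", so the
# substitution is applied once per item. tr turns the NULs into newlines
# only to make the result readable on a terminal.
out=$(sed -z 's/old/new/' "$tmp" | tr '\0' '\n')
printf '%s\n' "$out"

rm -f "$tmp"
```

The output of sed -z is still NUL-separated, so it can be piped straight into other NUL-aware tools such as xargs -0.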
Yes, gawk can do this: set the record separator to \0. For example, the command
gawk 'BEGIN { RS="\0"; FS="=" } $1=="LD_PRELOAD" { print $2 }' </proc/$(pidof mysqld)/environ
will print out the value of the LD_PRELOAD variable:
/usr/lib/x86_64-linux-gnu/libjemalloc.so.1
The /proc/$PID/environ file is a NUL-separated list of environment variables. I'm using it as an example because it's easy to try on a Linux system.
The BEGIN part sets the record separator to \0 and the field separator to =, because I also want to extract the part after = based on the part before =.
The $1=="LD_PRELOAD" pattern runs the block if the first field has the key I'm interested in.
The print $2 block prints out the string after =.
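If there is no mysqld process to inspect, the same command can be tried on a hand-made file. This is only a sketch: the variable names and the library path are invented, and the gawk check is there because gawk may not be installed everywhere.

```shell
# Hypothetical stand-in for /proc/PID/environ: NUL-separated KEY=VALUE pairs.
sample=$(mktemp)
printf 'HOME=/root\0LD_PRELOAD=/tmp/libdemo.so\0LANG=C\0' > "$sample"

if command -v gawk >/dev/null 2>&1; then
    # RS="\0" makes every KEY=VALUE pair one record; FS="=" splits it into fields.
    value=$(gawk 'BEGIN { RS = "\0"; FS = "=" } $1 == "LD_PRELOAD" { print $2 }' "$sample")
    printf '%s\n' "$value"
else
    echo "gawk not found, skipping the example" >&2
fi

rm -f "$sample"
```

One caveat of splitting on =: if the value itself contains an = sign, $2 only holds the part up to the next =.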
But mawk cannot parse input files separated with NUL. This is documented in man mawk:
BUGS
mawk cannot handle ascii NUL \0 in the source or data files.
mawk will stop reading the input after the first \0 character.
You can also use xargs to handle NUL-separated input, a bit non-intuitively, like this:
xargs -0 -n1 </proc/$$/environ
xargs uses echo as the default command.
-0 sets the input to be NUL-separated.
-n1 sets the max arguments to echo to be 1; this way the output will be separated by newlines.
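Here is a sketch of the same trick on made-up input, so it does not depend on /proc. The second variant passes printf explicitly as the command. That is my own suggestion, not part of the original command, and it avoids surprises when an item looks like an option to echo.

```shell
# Hypothetical NUL-separated input; one item contains a space.
out=$(printf 'A=1\0B=two words\0C=3\0' | xargs -0 -n1)
printf '%s\n' "$out"

# Variant: let printf emit one item per line instead of relying on echo.
out2=$(printf 'A=1\0B=two words\0C=3\0' | xargs -0 printf '%s\n')
printf '%s\n' "$out2"
```

Both variants keep the item with a space on one line, because -0 splits only on NUL and not on whitespace.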
And as Graeme's answer shows, sed can do this too.
Using sed to replace the NUL characters with spaces:
sed 's/\x0/ /g' infile > outfile
Or edit the file in place (this keeps a backup of your original file as infile.bak and overwrites the original with the substitutions):
sed -i.bak 's/\x0/ /g' infile
Using tr:
tr -d "\000" < infile > outfile
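Deleting the NULs glues neighbouring items together, so in many cases translating them to newlines is closer to what you want. A quick sketch on made-up data shows the difference:

```shell
# Hypothetical NUL-separated input with three items.
deleted=$(printf 'one\0two\0three\0' | tr -d '\000')
split=$(printf 'one\0two\0three\0' | tr '\000' '\n')

printf '%s\n' "$deleted"   # items run together
printf '%s\n' "$split"     # one item per line
```

The translated form can then be fed to any line-oriented tool, including mawk, as long as the items themselves contain no newlines.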