How to process a multi column text file to get another multi column text file?
Put each field on a line and post-columnate.
Each field on one line
tr
tr -s ' ' '\n' < infile
grep
grep -o '[[:alnum:]]*' infile
sed
sed 's/\s\+/\n/g' infile
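or, more portably (\s and \+ are GNU extensions, and POSIX sed does not accept \n in the replacement text):
sed 's/[[:space:]][[:space:]]*/\
/g' infile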
awk
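awk 'NF { $1 = $1; print }' OFS='\n' infile
or
awk -v OFS='\n' 'NF { $1 = $1; print }' infile
Assigning to $1 makes awk rebuild the record with OFS between the fields. Testing NF (rather than using $1=$1 itself as the pattern) skips blank lines but keeps lines whose first field is 0, which the bare '$1=$1' pattern would silently drop.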
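An alternative that avoids the record-rebuilding trick altogether is to loop over the fields explicitly. This is a minimal sketch assuming any POSIX awk; the function name one_per_line is hypothetical.

```shell
#!/bin/sh
# one_per_line (hypothetical name): print every whitespace-separated field
# of the input on its own line. Blank lines have NF == 0, so they produce
# no output; a first field of 0 is printed like any other field.
one_per_line() {
    awk '{ for (i = 1; i <= NF; i++) print $i }' "$@"
}

# Usage on a small sample with mixed spaces and tabs:
printf 'a aa aaa\nb  bb\tbbb\n' | one_per_line
```

With no file arguments, "$@" expands to nothing and awk reads standard input, so the function works both in a pipeline and as one_per_line infile.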
Columnate
paste
For 2 columns:
... | paste - -
For 3 columns:
... | paste - - -
etc.
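Each - operand makes paste read the next line from standard input, so n dashes give n columns. The sketch below, on made-up sample words, shows the default tab separator and the -d option for choosing another delimiter.

```shell
#!/bin/sh
# Three "-" operands: lines 1-3 form row 1, lines 4-6 form row 2.
# Columns are joined with a tab by default.
printf '%s\n' a aa aaa b bb bbb | paste - - -

# -d sets the delimiter; a single space is reused between every column.
printf '%s\n' a aa aaa b bb bbb | paste -d ' ' - - -
```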
sed
For 2 columns:
... | sed 'N; s/\n/\t/g'
For 3 columns:
... | sed 'N; N; s/\n/\t/g'
etc.
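Rather than typing N; by hand, the script for n columns can be generated by repeating N; n-1 times. A minimal sketch, assuming GNU sed (for \t in the replacement, and because GNU sed prints a pattern space holding fewer than n lines when N hits end of input); join_lines_sed is a hypothetical helper name.

```shell
#!/bin/sh
# join_lines_sed (hypothetical): join every n input lines with tabs.
# For n=3 it runs: sed 'N;N;s/\n/\t/g'
join_lines_sed() {
    n=$1
    script=''
    i=1
    while [ "$i" -lt "$n" ]; do
        script="${script}N;"   # one N per extra column
        i=$((i + 1))
    done
    # Inside double quotes, \n and \t reach sed unchanged.
    sed "${script}s/\n/\t/g"
}

printf '%s\n' a aa aaa b bb bbb | join_lines_sed 3
```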
xargs
... | xargs -n number-of-desired-columns
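As xargs uses /bin/echo to print by default, beware that data that looks like options to echo (such as -n or -e) will be interpreted as such. xargs also gives quotes and backslashes in its input a special meaning, and the resulting columns are separated by spaces rather than tabs.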
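One way around the echo problem is to give xargs an explicit printf as the utility. This is a sketch under the assumption of three columns; the -n 3 count and the three %s conversions must agree. Quotes and backslashes in the input are still interpreted by xargs itself.

```shell
#!/bin/sh
# xargs appends each batch of 3 input words after the format string, so a
# word such as -n is printed literally instead of being taken as an option.
printf '%s\n' a b c -n e f | xargs -n 3 printf '%s\t%s\t%s\n'
```

If the number of words is not a multiple of 3, the last batch is short and printf fills the missing %s conversions with empty strings, leaving trailing tabs.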
awk
... | awk '{ printf "%s", $0 (NR%n==0?ORS:OFS) }' n=number-of-desired-columns OFS='\t'
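The one-liner above ends with a trailing separator and no final newline when the number of lines is not a multiple of n. A minimal sketch of a variant that prints the separator only between columns and closes an incomplete last row; columnate is a hypothetical function name.

```shell
#!/bin/sh
# columnate (hypothetical): $1 is the number of columns, joined with a tab.
columnate() {
    awk -v n="$1" -v OFS='\t' '
        # No separator before the first field of a row.
        { printf "%s%s", ((NR - 1) % n ? OFS : ""), $0 }
        # End the row after every n-th field.
        NR % n == 0 { printf "%s", ORS }
        # Terminate a final, incomplete row.
        END { if (NR % n) printf "%s", ORS }
    '
}

# Five words into three columns: the second row has only two entries.
printf '%s\n' a aa aaa b bb | columnate 3
```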
pr
... | pr -at -number-of-desired-columns
or
... | pr -at -s$'\t' -number-of-desired-columns
columns (from the autogen package)
... | columns -c number-of-desired-columns
Typical output:
a aa aaa
b bb bbb
c cc ccc
d dd ddd
e ee eee
f ff fff
g gg ggg
h hh hhh
i ii iii
j jj jjj
As Wildcard pointed out, the printf approach below only works if your file is nicely formatted: it must not contain characters that the shell will interpret as glob patterns (such as *, ? or [), and you must be happy with the default word-splitting rules. If there is any doubt about whether your files will pass that test, do not use this approach.
One possibility would be to use printf, like this:
printf '%s\t%s\n' $(cat your_file)
That will perform word splitting on the contents of your_file and print the resulting words in pairs with a tab in between, because printf reuses its format string until all arguments are consumed. Add more %s conversions to the printf format string to get extra columns.
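The glob problem, though not the word-splitting one, can be removed by turning off pathname expansion with set -f. In the sketch below the sample contents of your_file are made up for illustration; running the command in a subshell keeps set -f from leaking into the rest of the session.

```shell
#!/bin/sh
# Hypothetical sample file containing a glob character.
printf 'a aa *\nb bb bbb\n' > your_file

# set -f disables pathname expansion, so * stays a literal word; the
# unquoted $(cat ...) is still split on $IFS (space, tab, newline).
( set -f; printf '%s\t%s\t%s\n' $(cat your_file) )
```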
$ sed -E 's/\s+/\n/g' ip.txt | paste - -
a aa
aaa b
bb bbb
c cc
ccc d
dd ddd
e ee
eee f
ff fff
g gg
ggg h
hh hhh
i ii
iii j
jj jjj
$ sed -E 's/\s+/\n/g' ip.txt | paste - - -
a aa aaa
b bb bbb
c cc ccc
d dd ddd
e ee eee
f ff fff
g gg ggg
h hh hhh
i ii iii
j jj jjj