A better paste command
Assuming you don't have any tab characters in your files,
paste file1 file2 | expand -t 13
with the argument to -t chosen to be larger than the longest line in file1.
OP has added a more flexible solution:
I did this so it works without the magic number 13:
paste file1 file2 | expand -t $(( $(wc -L <file1) + 2 ))
It's not easy to type but can be used in a script.
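For use in a script, the one-liner above can be wrapped in a small shell function. This is only a sketch: the name side_by_side is made up for illustration, and it assumes a wc that supports -L (GNU coreutils does; the option is not in POSIX) and input files that contain no tab characters.

```shell
# side_by_side is a hypothetical helper name, not part of the original post.
# Assumes: wc supports -L, and neither file contains tab characters.
side_by_side() {
    # Longest line in the first file, plus a two-column gap.
    width=$(( $(wc -L < "$1") + 2 ))
    # paste joins the lines with a tab; expand turns that tab into spaces
    # up to the next tab stop, which sits just past the longest line.
    paste "$1" "$2" | expand -t "$width"
}
```

It would be called as side_by_side file1 file2, with the first file deciding the column width.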
I thought awk might do it nicely, so I googled "awk reading input from two files" and found a post on Stack Overflow to use as a starting point.
The condensed version comes first, with a fully commented version below it. This took more than a few minutes to work out. I'd be glad of some refinements from smarter folks.
awk '{if(length($0)>max)max=length($0)}
FNR==NR{s1[FNR]=$0;next}{s2[FNR]=$0}
END { format = "%-" max "s\t%-" max "s\n";
numlines=(NR-FNR)>FNR?NR-FNR:FNR;
for (i=1; i<=numlines; i++) { printf format, s1[i], s2[i] }
}' file1 file2
And here is the fully documented version of the above.
# 2013-11-05 [email protected]
# Invoke thus:
# awk -f this_file file1 file2
# The result is what you asked for and the columns will be
# determined by input file order.
#----------------------------------------------------------
# No matter which file we're reading,
# keep track of max line length for use
# in the printf format.
#
{ if ( length($0) > max ) max=length($0) }
# FNR is record number in current file
# NR is record number over all
# while they are equal, we're reading the first file
# and we load the strings into array "s1"
# and then go to the "next" line in the file we're reading.
FNR==NR { s1[FNR]=$0; next }
# and when they aren't, we're reading the
# second file and we put the strings into
# array s2
{s2[FNR]=$0}
# At the end, after all lines from both files have
# been read,
END {
    # use the max line length to create a printf format
    # with the right widths
    format = "%-" max "s\t%-" max "s\n"
    # and figure the number of array elements we need
    # to cycle through in a for loop.
    numlines=(NR-FNR)>FNR?NR-FNR:FNR;
    for (i=1; i<=numlines; i++) {
        # an unset array element prints as "", so the shorter
        # file needs no special-casing
        printf format, s1[i], s2[i]
    }
}
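One possible refinement of the script above is to size the first column from the first file only and leave the second column unpadded. The sketch below is a variation, not the original author's code; the function name paste_aligned and the two-space separator are assumptions chosen for illustration. It keeps the same FNR==NR idiom, so an empty first file is still mistaken for the second.

```shell
# paste_aligned is a hypothetical name used only for this sketch.
paste_aligned() {
    awk '
        # First file: store each line and track the longest line seen.
        FNR == NR {
            s1[FNR] = $0
            if (length($0) > w) w = length($0)
            n1 = FNR
            next
        }
        # Second file: store each line.
        { s2[FNR] = $0; n2 = FNR }
        END {
            n = (n1 > n2) ? n1 : n2
            # Pad column one to the width of the first file;
            # column two is left unpadded.
            fmt = "%-" w "s  %s\n"
            for (i = 1; i <= n; i++)
                printf fmt, s1[i], s2[i]
        }
    ' "$1" "$2"
}
```

Called as paste_aligned file1 file2, it avoids padding both columns to the longest line of either file, which keeps the output narrower when file2 has long lines.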
On Debian and derivatives, column has a -n (nomerge) option that allows column to do the right thing with empty fields. Internally, column uses the wcstok(wcs, delim, ptr) function, which splits a wide-character string into tokens delimited by the wide characters in the delim argument. wcstok starts by skipping wide characters in delim before recognizing the token. The -n option uses an algorithm that doesn't skip initial wide characters in delim.
Unfortunately, this isn't very portable: -n is Debian-specific, and column is not in POSIX; it's apparently a BSD thing.