A better paste command

Assuming you don't have any tab characters in your files,

paste file1 file2 | expand -t 13

with the argument to -t chosen to be larger than the longest line in file1.
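A minimal sketch of why the -t value has to be larger than the longest line in file1. It assumes the inputs contain no tabs. The function name side_by_side and the sample file contents are hypothetical and exist only for illustration: paste joins each pair of lines with a tab, and expand replaces that tab with spaces up to the next multiple of the -t value.

```shell
#!/bin/sh
# Sketch only: side_by_side and the sample data are hypothetical.
# Assumes neither input file contains tab characters.

# side_by_side FILE1 FILE2 TABSTOP
# paste joins the files line by line with a tab; expand then pads
# each tab with spaces up to the next multiple of TABSTOP.
side_by_side() {
    paste "$1" "$2" | expand -t "$3"
}

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
printf 'apple\nkiwi\nwatermelon\n' > "$demo_dir/file1"
printf 'one\ntwo\nthree\n' > "$demo_dir/file2"

# Tab stops every 8 columns: "watermelon" is 10 characters long,
# so its tab jumps to column 16 and the third row is misaligned.
side_by_side "$demo_dir/file1" "$demo_dir/file2" 8

# Tab stops every 11 columns: wider than any line in file1,
# so every entry from file2 starts at the same offset.
side_by_side "$demo_dir/file1" "$demo_dir/file2" 11
```

With -t 8 the rows come out as "apple   one", "kiwi    two" and "watermelon      three"; with -t 11 all three file2 entries line up after 11 columns.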

OP has added a more flexible solution:

I did this so it works without the magic number 13:

paste file1 file2 | expand -t $(( $(wc -L <file1) + 2 ))

It's not easy to type but can be used in a script.
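The -L option of wc is a GNU extension. POSIX wc defines only -c, -l, -m and -w. The sketch below swaps in awk to find the longest line, so the same approach works where wc -L is unavailable. It still assumes no tabs in the input. Note that awk's length() counts characters or bytes depending on the awk implementation and locale. The function names max_width and side_by_side are hypothetical.

```shell
#!/bin/sh
# Sketch only: replaces the GNU-specific `wc -L` with awk.
# Assumes no tab characters in the input files.

# max_width FILE: print the length of the longest line in FILE
# (0 for an empty file, thanks to the "+ 0").
max_width() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$1"
}

# side_by_side FILE1 FILE2: paste the files and set tab stops two
# columns past the longest line of FILE1, as in the wc -L version.
side_by_side() {
    paste "$1" "$2" | expand -t "$(( $(max_width "$1") + 2 ))"
}
```

Usage is simply `side_by_side file1 file2`. If file1's longest line is 10 characters, expand is called with -t 12.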

I thought awk might do it nicely, so I googled "awk reading input from two files" and found a post on Stack Overflow to use as a starting point.

First is the condensed version, then a fully commented version below it. This took more than a few minutes to work out. I'd be glad of some refinements from smarter folks.

awk '{if(length($0)>max)max=length($0)}
FNR==NR{s1[FNR]=$0;next}{s2[FNR]=$0}
END { format = "%-" max "s\t%s\n";
  numlines=(NR-FNR)>FNR?NR-FNR:FNR;
  for (i=1; i<=numlines; i++) { printf format, s1[i], s2[i] }
}' file1 file2
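Testing a stored line for truth, as in a `cond ? value : ""` expression, is a common awk pitfall. Input that looks numeric is compared as a number, so a line containing just 0 counts as false, and so does an empty line. Printing an array element directly avoids the test entirely, because a missing element is simply the empty string. The sketch below demonstrates both behaviours. The helper names truthiness and show_lines are hypothetical.

```shell
#!/bin/sh
# Sketch only: truthiness and show_lines are hypothetical helpers.

# truthiness: print whether awk treats each input line as true.
# A line of "0" looks numeric, so it is false; "" is false too.
truthiness() {
    awk '{ print ($0 ? "true" : "false") }'
}

# show_lines: store every line, then print one element past the end.
# Referencing a missing element yields "", so no ?: test is needed.
show_lines() {
    awk '{ s[NR] = $0 } END { for (i = 1; i <= NR + 1; i++) printf "[%s]\n", s[i] }'
}

printf '0\n\nabc\n' | truthiness   # false, false, true
printf '0\n\nabc\n' | show_lines   # [0], [], [abc], []
```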

And here is the fully documented version of the above.

# 2013-11-05 mike@diehn.net
# Invoke thus:
#   awk -f this_file file1 file2
# The result is what you asked for and the columns will be
# determined by input file order.
#----------------------------------------------------------
# No matter which file we're reading,
# keep track of max line length for use
# in the printf format.
#
{ if ( length($0) > max ) max=length($0) }

# FNR is record number in current file
# NR is record number over all
# while they are equal, we're reading the first file
#   and we load the strings into array "s1"
#   and then go to the "next" line in the file we're reading.
#   (Caveat: if file1 is empty, FNR also equals NR for every
#   line of file2, so file2's lines end up in "s1".)
FNR==NR { s1[FNR]=$0; next }

# and when they aren't, we're reading the
#   second file and we put the strings into
#   array s2
{s2[FNR]=$0}

# At the end, after all lines from both files have
# been read,
END {
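  # use the max line length to create a printf format
  # with the right width for the first column; the second
  # column ends the line, so it needs no padding
  format = "%-" max "s\t%s\n"
  # and figure the number of array elements we need
  # to cycle through in a for loop: file1 has NR-FNR
  # lines and file2 has FNR lines, so take the larger.
  numlines=(NR-FNR)>FNR?NR-FNR:FNR;
  for (i=1; i<=numlines; i++) {
     # a missing element is just the empty string, so print
     # it directly; testing s1[i] for truth would also drop
     # lines that look like the number 0
     printf format, s1[i], s2[i]
  }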
}
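The FNR==NR test assumes file1 is not empty. If file1 is empty, FNR equals NR for every line of file2 as well, so file2 is treated as the first file. One common alternative is to compare FILENAME with ARGV[1], the first file operand. This alternative assumes that the first operand is a file name rather than a var=value assignment, and that the same file is not passed twice. The sketch below counts lines per file both ways. The function names count_fnr and count_argv are hypothetical.

```shell
#!/bin/sh
# Sketch only: count_fnr and count_argv are hypothetical helpers.

# count_fnr FILE1 FILE2: count lines per file using FNR==NR.
count_fnr() {
    awk 'FNR == NR { n1++; next } { n2++ } END { print n1 + 0, n2 + 0 }' "$1" "$2"
}

# count_argv FILE1 FILE2: count lines per file using FILENAME == ARGV[1].
count_argv() {
    awk 'FILENAME == ARGV[1] { n1++; next } { n2++ } END { print n1 + 0, n2 + 0 }' "$1" "$2"
}

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
: > "$demo_dir/empty"
printf 'x\ny\n' > "$demo_dir/file2"

count_fnr  "$demo_dir/empty" "$demo_dir/file2"   # "2 0": file2 counted as file1
count_argv "$demo_dir/empty" "$demo_dir/file2"   # "0 2"
```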

On Debian and derivatives, column has a -n ("no merge") option that lets column handle empty fields correctly. Internally, column uses the wcstok(wcs, delim, ptr) function, which splits a wide-character string into tokens delimited by the wide characters in the delim argument.

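wcstok skips any leading characters that appear in delim before it recognizes a token, so adjacent delimiters are merged and empty fields disappear. For example, when file1 has an empty line, paste emits a line that starts with a tab, and the text from file2 slides into the first column. The -n option uses an algorithm that doesn't skip those leading delimiters.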
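POSIX shell field splitting gives an analogue of the wcstok behaviour. This is not column's code. Tab counts as "IFS white space", so read skips a leading tab and merges runs of tabs, much as wcstok skips leading delimiters. A non-white-space delimiter such as ':' keeps empty fields instead. The helper name first_two is hypothetical.

```shell
#!/bin/sh
# Sketch only: an analogy using `read`, not column's implementation.

tab=$(printf '\t')

# first_two DELIM: read one line from stdin, split it on DELIM,
# and print the first two fields in brackets.
first_two() {
    IFS=$1 read -r f1 f2
    printf '[%s] [%s]\n' "$f1" "$f2"
}

# A line whose first field is empty, like paste's output for an
# empty line in file1: with tab, the text slides into field one.
printf '\tright\n' | first_two "$tab"   # [right] []
printf ':right\n' | first_two ':'       # [] [right]
```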

Unfortunately, this isn't very portable: -n is Debian-specific, and column is not in POSIX; it is apparently a BSD utility.
