Fastest way to convert a tab-delimited file to CSV in Linux
If all you need to do is translate all tab characters to comma characters, tr
is probably the way to go.
Here printf emits a literal tab between the two words:
$ printf 'hello\tworld\n' | tr '\t' ','
hello,world
Of course, if the file has tabs embedded inside quoted fields, this will incorrectly translate those as well; but embedded literal tabs are fairly uncommon.
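To make that caveat concrete, here is a minimal Python sketch of what a character-for-character translation does to a quoted field that contains a tab. The sample row is made up for illustration, and str.replace stands in for tr.

```python
import csv
import io

# Hypothetical sample row: the first field is quoted and contains a tab.
tsv_line = '"New\tYork"\t10\n'

# What the field values are when the line is parsed as real TSV.
original_fields = next(csv.reader(io.StringIO(tsv_line), dialect=csv.excel_tab))

# Character-for-character translation, the same thing tr does.
naive = tsv_line.replace("\t", ",")

# Parse the translated line as CSV and compare the first field.
naive_fields = next(csv.reader(io.StringIO(naive)))

print(repr(original_fields[0]))  # the tab is part of the value
print(repr(naive_fields[0]))     # the tab has silently become a comma
```

The column count survives here, but the data inside the quoted field has changed, which is the kind of silent corruption the caveat is about.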
perl -lpe 's/"/""/g; s/^|$/"/g; s/\t/","/g' < input.tab > output.csv
Perl is generally faster at this sort of thing than sed, awk, or Python.
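For readers who don't read Perl, the three substitutions in the one-liner amount to: double every existing quote character, wrap the whole line in quotes, and turn each tab into `","`. Below is a rough Python sketch of the same rule. The function name tsv_line_to_csv is made up for illustration, and like the one-liner it assumes the input does not use quoting of its own.

```python
def tsv_line_to_csv(line: str) -> str:
    """Quote every field of one TSV line, mirroring the Perl substitutions."""
    fields = line.rstrip("\n").split("\t")
    # Double any embedded quote characters, then wrap each field in quotes.
    quoted = ['"' + field.replace('"', '""') + '"' for field in fields]
    return ",".join(quoted)


# Hypothetical sample line with an embedded quote and an embedded comma.
print(tsv_line_to_csv('say "hi"\ta,b\n'))
```

Because every field ends up quoted, embedded commas are safe, at the cost of a slightly larger output file.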
If you're worried about embedded commas then you'll need to use a slightly more intelligent method. Here's a Python script that takes TSV lines from stdin and writes CSV lines to stdout:
import sys
import csv
# Read tab-separated rows from stdin and write comma-separated rows to stdout.
tabin = csv.reader(sys.stdin, dialect=csv.excel_tab)
commaout = csv.writer(sys.stdout, dialect=csv.excel)
for row in tabin:
    commaout.writerow(row)
Run it from a shell as follows:
python script.py < input.tsv > output.csv
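One detail worth knowing: the csv.excel dialect terminates rows with "\r\n", so on Linux the output will have Windows-style line endings. If that matters, a possible variant is sketched below. It wraps the same reader and writer in a function (tsv_to_csv is a name chosen here for illustration) and passes lineterminator="\n" to the writer.

```python
import csv
import io


def tsv_to_csv(src, dst):
    """Copy rows from a TSV text stream to a CSV text stream."""
    reader = csv.reader(src, dialect=csv.excel_tab)
    # Override the dialect's default "\r\n" terminator with a plain newline.
    writer = csv.writer(dst, dialect=csv.excel, lineterminator="\n")
    writer.writerows(reader)


# Hypothetical sample: the second field contains a comma.
out = io.StringIO()
tsv_to_csv(io.StringIO("a\tb,c\n"), out)
print(out.getvalue(), end="")
```

The writer quotes the field containing the comma on its own, which is the point of using the csv module rather than a plain character translation. To use it as a filter, call tsv_to_csv(sys.stdin, sys.stdout).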
If you want to convert the whole tsv file into a csv file:
$ cat data.tsv | tr "\\t" "," > data.csv
If you want to omit some fields:
$ cat data.tsv | cut -f1,2,3 | tr "\\t" "," > data.csv
The command above converts data.tsv to data.csv, keeping only the first three fields.
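If some fields may contain commas, the cut and tr pipeline has the same quoting problem as plain tr. A rough Python equivalent that selects columns and quotes where needed could look like the sketch below. The function name select_columns and its keep parameter are assumptions made for this example, and the indices are zero-based, unlike cut's one-based field numbers.

```python
import csv
import io


def select_columns(src, dst, keep=(0, 1, 2)):
    """Write only the columns listed in keep from a TSV stream to a CSV stream."""
    reader = csv.reader(src, dialect=csv.excel_tab)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        # Skip indices past the end of short rows instead of raising.
        writer.writerow([row[i] for i in keep if i < len(row)])


# Hypothetical sample data with four columns.
sample = io.StringIO("a\tb\tc\td\n1\t2,5\t3\t4\n")
result = io.StringIO()
select_columns(sample, result)
print(result.getvalue(), end="")
```

For real files, open the input and output with newline="" as the csv module documentation recommends, and pass the file objects in place of the StringIO buffers.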