Substitute every comma outside of double quotes for a pipe
Using csvkit:
$ csvformat -D '|' file.csv
John|Tonny|345.3435,23|56th Street
The tools in csvkit know how to handle the intricacies of CSV files, and here we're using csvformat to replace the delimiting commas with | correctly. The output fields will be quoted as needed.
Example:
$ cat file.csv
John,Tonny,"345.3435,23",56th Street
The | factory,Ltd.,"0,0",meep meep
$ csvformat -D '|' file.csv
John|Tonny|345.3435,23|56th Street
"The | factory"|Ltd.|0,0|meep meep
If your sed supports the -E option (-r in some implementations):
sed -Ee :1 -e 's/^(([^",]|"[^"]*")*),/\1|/;t1' < file
The
:label
s/pattern/replacement/
t label
is a very common sed idiom. It keeps doing the same substitution in a loop as long as it's successful.
Here, we're substituting the leading part of the line made of 0 or more quoted strings or characters other than " and , (captured in \1) followed by a , with that \1 capture and a |, so on your sample that means:
John,Tonny,"345.3435,23",56th Street
->John|Tonny,"345.3435,23",56th Street
John|Tonny,"345.3435,23",56th Street
->John|Tonny|"345.3435,23",56th Street
John|Tonny|"345.3435,23",56th Street
->John|Tonny|"345.3435,23"|56th Street
- and we stop here as the pattern doesn't match any more on that.
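To watch the loop work, the same substitution can be run with automatic printing turned off and an explicit p after each successful pass. This is only an illustrative sketch: the function name trace_commas_to_pipes and the show label are made up here, and it assumes the same -E support as the command above.

```shell
# Sample line taken from the example above.
line='John,Tonny,"345.3435,23",56th Street'

# Hypothetical helper: same substitution as above, but prints the pattern
# space after every successful pass of the loop (-n disables auto-print).
trace_commas_to_pipes() {
  sed -E -n \
    -e ':1' \
    -e 's/^(([^",]|"[^"]*")*),/\1|/' \
    -e 't show' \
    -e 'b' \
    -e ':show' \
    -e 'p' \
    -e 'b 1'
}

printf '%s\n' "$line" | trace_commas_to_pipes
```

Each printed line is the state after one successful substitution; the last one is what the original command would output. When the substitution fails, the bare b ends the cycle without printing anything further.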
With perl, you could do it in a single substitution with the g flag:
perl -pe 's{("[^"]*"|[^",]+)|,}{$1 // "|"}ge'
Here, assuming quotes are balanced in the input, the pattern matches all of the input, breaking it up into either:
- quoted string
- sequences of characters other than , or "
- a comma
Only when the matched string is a comma (that is, when $1 is not defined in the replacement part) is it replaced with a |.
With perl and the Text::CSV module:
perl -MText::CSV -lne '
BEGIN { $p = Text::CSV->new() }
print join "|", $p->fields() if $p->parse($_)
' file.csv
John|Tonny|345.3435,23|56th Street
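One difference between the approaches is worth checking against your own data: csvformat quotes a field that already contains a |, whereas the regex-based substitutions only touch the commas, and joining parsed fields with | does no quoting either. A minimal sketch, with the sed command from above wrapped in a hypothetical commas_to_pipes function and fed the second line of the sample file:

```shell
# Hypothetical wrapper around the sed command shown earlier.
commas_to_pipes() {
  sed -E -e :1 -e 's/^(([^",]|"[^"]*")*),/\1|/;t1'
}

# This field already contains a "|", which sed leaves untouched and unquoted.
printf '%s\n' 'The | factory,Ltd.,"0,0",meep meep' | commas_to_pipes
```

The result keeps the | from the field content and the quotes around 0,0, so a consumer splitting on | would see five fields instead of four. If the data may contain |, the csvformat output is the safer one.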