Remove comma between the quotes only in a comma delimited file
If the quotes are balanced, you want to remove the commas inside every other quote-delimited field. This can be expressed in awk like this:
awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", "", $i) } 1' infile
Output:
123,ABC DEV 23,345,534.202,NAME
Explanation
The -F'"' option makes awk split each line at the double-quote characters, so every other field (the even-numbered ones) is the text between a pair of quotes. The for-loop runs gsub, short for global substitute, on each of those fields, replacing every comma (",") with nothing (""). Because OFS is empty, the quotes themselves are dropped when awk rebuilds the modified line. The 1 at the end invokes the default code block: { print $0 }.
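If the quotes should survive in the output, one possible variation is to set OFS to a double quote instead of the empty string. The sketch below assumes a sample input line that is merely consistent with the output shown above (the original input is not given here), and the function name strip_commas_keep_quotes is made up for illustration.

```shell
# Hypothetical sample line, chosen to be consistent with the output above.
line='123,"ABC, DEV 23",345,534.202,NAME'

# Same loop as in the answer, but OFS is a double quote, so the quotes
# are put back whenever awk rebuilds the record from its fields.
strip_commas_keep_quotes() {
    awk -F'"' -v OFS='"' '{ for (i = 2; i <= NF; i += 2) gsub(",", "", $i) } 1'
}

printf '%s\n' "$line" | strip_commas_keep_quotes
# prints: 123,"ABC DEV 23",345,534.202,NAME
```

Lines without any quotes have a single field, so the loop never runs and they pass through unchanged.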
Here is another good approach, using a single sed invocation with a loop:
echo '123,"ABC, DEV 23",345,534,"some more, comma-separated, words",202,NAME'|
sed ':a;s/^\(\([^"]*,\?\|"[^",]*",\?\)*"[^",]*\),/\1 /;ta'
123,"ABC  DEV 23",345,534,"some more  comma-separated  words",202,NAME
Explanation:
- :a; is a label for the later branch.
- s/^\(\([^"]*,\?\|"[^",]*",\?\)*"[^",]*\),/\1 / consists of three nested parts:
  - First, the inner (2nd) group: [^"]*,\?\|"[^",]*",\? matches either a string containing no double quote, optionally followed by a comma, or a string enclosed in two double quotes that contains no comma, optionally followed by a comma.
  - Then the outer (1st) group consists of any number of repetitions of that inner group, followed by one double quote and some characters that are neither double quotes nor commas.
  - The outer group has to be followed by a comma, which is the one that gets replaced by a space.
  - Note that the rest of the line does not need to be touched.
- ta loops back to :a if the previous s/ command made a change.
Once the loop is done, you could even add s/  */ /g (note the two spaces before the *) to squeeze each run of spaces into one:
echo '123,"ABC, DEV 23",345,534,"some more, comma-separated, words",202,NAME'|
sed ':a;s/^\(\([^"]*,\?\|"[^",]*",\?\)*"[^",]*\),/\1 /;ta;s/  */ /g'
This suppresses the double spaces:
123,"ABC DEV 23",345,534,"some more comma-separated words",202,NAME
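If the commas should simply be deleted rather than turned into spaces, a variation of the same loop can drop the space from the replacement, so no cleanup step is needed. This is only a sketch: it assumes GNU sed (for -E and the one-line label syntax), it rewrites the pattern as an extended regular expression purely for readability, and the function name delete_quoted_commas is made up for illustration.

```shell
# Assumes GNU sed. Same pattern as above, written as an extended regex (-E);
# the replacement is just \1, so the comma is deleted instead of becoming a space.
delete_quoted_commas() {
    sed -E ':a;s/^(([^"]*,?|"[^",]*",?)*"[^",]*),/\1/;ta'
}

echo '123,"ABC, DEV 23",345,534,"some more, comma-separated, words",202,NAME' |
    delete_quoted_commas
# prints: 123,"ABC DEV 23",345,534,"some more comma-separated words",202,NAME
```

Each pass through the loop removes one comma inside a quoted field, and ta repeats the substitution until none is left.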
A general solution that can also handle several commas between balanced quotes needs a nested substitution. I implemented a solution in Perl, which processes every line of the given input and removes commas only inside each pair of quotes:
perl -pe 's/ " (.+? [^\\]) "           # find all non-escaped
                                       # quoting pairs
                                       # in a non-greedy way
           / ($ret = $1) =~ (s#,##g);  # remove all commas within quotes
             $ret                      # substitute the substitution :)
           /gex'
or, in short:
perl -pe 's/"(.+?[^\\])"/($ret = $1) =~ (s#,##g); $ret/ge'
You can either pipe the text you want to process into the command or specify the text file to be processed as the last command-line argument.