How to ask cat (and maybe grep?) not to take into account a newline when it's inside double quotes?
Using csvgrep from the csvkit package to pull out all records that have a codeRegion value containing the string 01:
csvgrep -c codeRegion -m 01 file.csv
This uses a proper CSV parser, so there will be no issues with newlines or commas in properly quoted fields.
The -c option selects the column to investigate, by number or by name, and -m designates the string to match. One could also use -r to match with a regular expression, e.g. -r '^01$' to avoid matching strings where 01 is only a substring (as in 011). See csvgrep --help.
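A small sketch of the difference between -m and -r. It assumes csvkit is installed (e.g. pip install csvkit) so that csvgrep is on the PATH; the file name sample.csv and its comment column are hypothetical, made up for illustration.

```shell
# Hypothetical sample file: the comment field of the 01 record spans two lines
cat > sample.csv <<'EOF'
codeRegion,comment
01,"first line
second line"
011,contains 01 as a substring
04,other
EOF

# -m keeps every row whose codeRegion contains 01, so the 011 row is kept as well
csvgrep -c codeRegion -m 01 sample.csv

# -r with an anchored regex keeps only rows whose codeRegion is exactly 01
csvgrep -c codeRegion -r '^01$' sample.csv
```

In both runs the quoted two-line field of the 01 record comes out intact, because csvgrep works on parsed records rather than on physical lines.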
Alternatively, with awk:
awk '/^01/||n%2{print;n+=gsub(/"/,"&")}' file
For each line:
/^01/||n%2 : if the line begins with 01, or n (initially zero) is odd,
print : print it, and
n+=gsub(/"/,"&") : increment n by the return value of the gsub function.
This replaces every double quote /"/ with itself, "&" (in a gsub replacement, & stands for the matched text). On its own that would be pointless, but gsub also returns the number of substitutions it made, so this is a way of counting the double quotes in the line.
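The counting trick is easy to check in isolation. In this sketch the input lines are made up for illustration (the first three come from the sample data); each output line shows the count returned by gsub followed by the unchanged line.

```shell
# gsub(/"/, "&") replaces every double quote with itself (& is the matched text),
# leaving the line unchanged, and returns the number of replacements made
printf '%s\n' '01,as,"as' 'us"' 'cd' '"a""b"' |
  awk '{ n = gsub(/"/, "&"); print n ": " $0 }'
# prints:
# 1: 01,as,"as
# 1: us"
# 0: cd
# 4: "a""b"
```

The last line shows why escaped quotes are harmless: CSV writes a literal quote inside a quoted field as "", which adds 2 to the count and so never changes whether n is odd or even.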
Notice that if n is odd (n%2), the double quotes seen so far leave a quoted field unclosed, so the command keeps printing lines until n is even again, regardless of whether they match /^01/.
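Written out as a commented awk program, the same filter might look like the sketch below. The heredoc recreates the sample input from the left-hand column of the diff, under the hypothetical name sample.txt so that no real file is overwritten; the awk logic is the one-liner above, only reformatted.

```shell
# Sample input, taken from the left-hand column of the diff
cat > sample.txt <<'EOF'
04,xde
01,abc"
cd
as"
02,dsad
03,1ad"
01,as,"as
us"
02,s
01,a
EOF

awk '
  # Print when the line starts with 01, or when n is odd,
  # meaning a quoted field is still open
  /^01/ || n % 2 {
    print
    # gsub replaces each double quote with itself and returns how many it replaced,
    # so n counts the quotes seen on printed lines
    n += gsub(/"/, "&")
  }
' sample.txt
```

It should print the same six lines as the right-hand column of the diff: the two multi-line 01 records and the final 01,a line.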
A side-by-side diff of the input and the output:
$ diff -yW 30 <(cat file) <(awk '/^01/||n%2{print;n+=gsub(/"/,"&")}' file)
04,xde        <
01,abc"         01,abc"
cd              cd
as"             as"
02,dsad       <
03,1ad"       <
01,as,"as       01,as,"as
us"             us"
02,s          <
01,a            01,a