"Line contains NULL byte" in CSV reader (Python)
If you want to pretend the null bytes don't exist, you can filter them out with an inline generator. This assumes the null bytes are an erroneous artifact or bug and not a genuine part of the file's encoding.
See the (line.replace('\0','') for line in f) generator expression below. You will probably also want to open that file in mode rb.
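import csv

# Collect the keys to exclude, one per line of output.txt.
lines = []
with open('output.txt','r') as f:
    for line in f.readlines():
        # Strip only the trailing newline, so a last line without one stays intact.
        lines.append(line.rstrip('\n'))

with open('corrected.csv','w') as correct:
    writer = csv.writer(correct, dialect = 'excel')
    # Python 2: the csv module expects its input file opened in binary mode.
    with open('input.csv', 'rb') as mycsv:
        # Remove NUL bytes from each line before the csv reader sees it.
        reader = csv.reader( (line.replace('\0','') for line in mycsv) )
        for row in reader:
            if row[0] not in lines:
                writer.writerow(row)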
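For readers on Python 3, here is a minimal sketch of the same filtering idea. It assumes the NUL characters are stray artifacts. The helper names strip_nuls and read_rows_without_nuls and the in-memory sample data are made up for illustration. In Python 3 the csv module wants text-mode input (a file opened with newline=''), so the filter works on str rather than bytes.

```python
import csv
import io


def strip_nuls(lines):
    """Yield each line with NUL characters removed."""
    for line in lines:
        yield line.replace('\0', '')


def read_rows_without_nuls(text_file):
    """Parse CSV rows from an open text-mode file, ignoring NUL characters."""
    return list(csv.reader(strip_nuls(text_file)))


# An in-memory file stands in for input.csv here (hypothetical data).
sample = io.StringIO('a,b\0,c\r\n1,\x002,3\r\n', newline='')
rows = read_rows_without_nuls(sample)
```

With a real file you would pass open('input.csv', newline='') instead of the StringIO object.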
I've solved a similar problem with an easier solution:
import codecs
import csv
csvReader = csv.reader(codecs.open('file.csv', 'rU', 'utf-16'))
The key was using the codecs module to open the file with the UTF-16 encoding. Many other encodings are available; check the documentation.
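On Python 3 the built-in open() accepts an encoding argument, so the same approach might look like the sketch below. The function name read_utf16_csv and the throwaway temp file are assumptions for the demo, not part of the original answer. The raw_has_nul check shows why a UTF-16 file triggers the NUL error when read as plain bytes or as an 8-bit encoding.

```python
import csv
import os
import tempfile


def read_utf16_csv(path):
    """Read a UTF-16 encoded CSV file and return its rows as lists of strings."""
    # newline='' lets the csv module handle line endings itself.
    with open(path, encoding='utf-16', newline='') as f:
        return list(csv.reader(f))


# Build a small UTF-16 file to demonstrate (hypothetical data, temp path).
fd, demo_path = tempfile.mkstemp(suffix='.csv')
os.close(fd)
with open(demo_path, 'w', encoding='utf-16', newline='') as f:
    csv.writer(f).writerows([['id', 'name'], ['1', 'Zoë']])

# ASCII characters encoded as UTF-16 carry a zero byte each.
with open(demo_path, 'rb') as f:
    raw_has_nul = b'\0' in f.read()

demo_rows = read_utf16_csv(demo_path)
os.remove(demo_path)
```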
If you want to replace the nulls with something you can do this:
def fix_nulls(s):
    for line in s:
        yield line.replace('\0', ' ')

r = csv.reader(fix_nulls(open(...)))
I'm guessing you have a NUL byte in input.csv. You can test that with
if '\0' in open('input.csv').read():
    print "you have null bytes in your input file"
else:
    print "you don't"
If you do,
reader = csv.reader(x.replace('\0', '') for x in mycsv)
may get you around that. Alternatively, the NUL bytes may indicate that the .csv file is UTF-16 or some other 'interesting' encoding.
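To tell those two cases apart, one rough heuristic is to look for a byte order mark at the start of the file. The sketch below is Python 3, and the function name diagnose_nuls and its return labels are invented for illustration. It only recognizes files that begin with a BOM, so UTF-16 data written without one would still be reported as stray NULs.

```python
import codecs


def diagnose_nuls(data):
    """Classify raw bytes read from a file opened in binary mode.

    Returns 'utf-32' or 'utf-16' if the data starts with the matching BOM,
    'stray-nuls' if it contains NUL bytes otherwise, and 'clean' if not.
    """
    # Check UTF-32 first: its little-endian BOM begins with the UTF-16 LE BOM.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return 'utf-32'
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'
    if b'\0' in data:
        return 'stray-nuls'
    return 'clean'
```

You could call it as diagnose_nuls(open('input.csv', 'rb').read()) and then either reopen the file with the reported encoding or fall back to stripping the NULs.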