Numpy loadtxt: ValueError: Wrong number of columns
Try np.genfromtxt. It handles missing values; loadtxt does not. Compare their docs.
Missing values can be tricky when the delimiter is whitespace, but with tabs it should be OK. If there are still problems, test it with a comma (,) delimiter.
Oops, you still need the extra delimiter, e.g.:
a, 34,
b, 43, 34
c, 34
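As a rough illustration of the point about the extra delimiter, here is a minimal sketch using made-up numeric rows (the data and variable names are assumptions, not from the question). Each short row ends with a trailing comma, so genfromtxt sees three fields per row and treats the empty one as missing:

```python
import numpy as np

# Hypothetical data: rows 1 and 3 end with a trailing comma, so every
# row has three fields and the last field is empty where a value is missing.
lines = ["1,34,", "2,43,34", "3,34,"]

# With the default float dtype, empty fields come back as nan.
with_nan = np.genfromtxt(lines, delimiter=",")

# filling_values substitutes a chosen value for the empty fields instead.
with_zero = np.genfromtxt(lines, delimiter=",", filling_values=0)

print(with_nan)
print(with_zero)
```

Without the trailing comma the short rows would have only two fields, and genfromtxt would complain about the inconsistent column count.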
Both loadtxt and genfromtxt accept any iterable that delivers the text line by line. So a simple approach is to readlines, tweak the lines that have missing values and delimiters, and pass that list of lines to the loader. Or you can write this as a 'filter' or generator. This approach has been described in a number of previous SO questions.
In [36]: txt=b"""a\t45\t\nb\t45\t55\nc\t66\t""".splitlines()
In [37]: txt
Out[37]: [b'a\t45\t', b'b\t45\t55', b'c\t66\t']
In [38]: np.genfromtxt(txt,delimiter='\t',dtype=str)
Out[38]:
array([['a', '45', ''],
       ['b', '45', '55'],
       ['c', '66', '']],
      dtype='<U2')
I'm using Python3 so the byte strings are marked with a 'b' (for baby and me).
For strings, this is overkill; but genfromtxt makes it easy to construct a structured array with different dtypes for each column. Note that such an array is 1d, with named fields, not numbered columns.
In [50]: np.genfromtxt(txt,delimiter='\t',dtype=None)
Out[50]:
array([(b'a', 45, -1), (b'b', 45, 55), (b'c', 66, -1)],
      dtype=[('f0', 'S1'), ('f1', '<i4'), ('f2', '<i4')])
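To make the "1d with named fields" remark concrete, here is a small sketch of how such a structured array is accessed. It assumes str lines and encoding="utf-8" (my choice, so the first field is text rather than bytes); the field names f0, f1, f2 are the defaults genfromtxt assigns:

```python
import numpy as np

# Same tab-separated rows as in the session above, written as str lines.
txt = ["a\t45\t", "b\t45\t55", "c\t66\t"]

# dtype=None lets genfromtxt pick a dtype per column.
arr = np.genfromtxt(txt, delimiter="\t", dtype=None, encoding="utf-8")

print(arr.shape)        # one entry per row, not (rows, columns)
print(arr.dtype.names)  # default field names
print(arr["f1"])        # select a "column" by field name
print(arr[1])           # select a row by integer index
```

Indexing with arr[:, 1] does not work here, because the array has a single dimension; columns are reached by name.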
To pad the lines I could define a function like:
def foo(astr, delimiter=b',', cnt=3, fill=b' '):
    # split the line into fields, pad with `fill`, keep exactly `cnt` fields
    c = astr.strip().split(delimiter)
    c.extend([fill]*cnt)
    return delimiter.join(c[:cnt])
and use it as:
In [85]: txt=b"""a\t45\nb\t45\t55\nc\t66""".splitlines()
In [87]: txt1=[foo(t,b'\t',3,b'0') for t in txt]
In [88]: txt1
Out[88]: [b'a\t45\t0', b'b\t45\t55', b'c\t66\t0']
In [89]: np.genfromtxt(txt1,delimiter='\t',dtype=None)
Out[89]:
array([(b'a', 45, 0), (b'b', 45, 55), (b'c', 66, 0)],
      dtype=[('f0', 'S1'), ('f1', '<i4'), ('f2', '<i4')])
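The 'filter' or generator variant mentioned earlier could look roughly like the sketch below. The name pad_lines and its parameters are hypothetical, and an in-memory io.StringIO stands in for a real file so the example runs on its own:

```python
import io
import numpy as np

def pad_lines(lines, delimiter="\t", ncols=3, fill="0"):
    """Yield each line padded (or truncated) to exactly ncols fields."""
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        fields.extend([fill] * ncols)          # over-pad, then cut back
        yield delimiter.join(fields[:ncols])

# Stand-in for an open text file with ragged rows.
raw = io.StringIO("a\t45\nb\t45\t55\nc\t66\n")

# genfromtxt consumes the generator one line at a time.
arr = np.genfromtxt(pad_lines(raw), delimiter="\t", dtype=None, encoding="utf-8")
print(arr)
```

Compared with building a list first, the generator never holds all the padded lines in memory at once, which may matter for large files.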
If you have a variable number of columns you can't define a proper np.array shape.
If you want to store them in an np.array anyway, try:
import numpy as np
a = np.loadtxt(r'TEST.txt', delimiter='\n', dtype=str)
Now a is array(['a 45', 'b 45 55', 'c 66']).
But in this case a list is better:
with open(r'TEST.txt') as f:
    a = f.read().splitlines()
Now a is the list ['a 45', 'b 45 55', 'c 66'].
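If the individual fields are needed rather than whole lines, the list approach extends naturally to a list of lists. This is only a sketch: io.StringIO stands in for TEST.txt, and whitespace-separated fields are assumed:

```python
import io

# Stand-in for open(r'TEST.txt'); rows have differing numbers of fields.
f = io.StringIO("a 45\nb 45 55\nc 66\n")

# One inner list per row; rows may have different lengths.
rows = [line.split() for line in f.read().splitlines()]

print(rows)
print([len(r) for r in rows])
```

A plain nested list carries no requirement that the rows be the same length, which is exactly what a regular np.array cannot express.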
If you want all rows to have the same number of columns but some have missing values you can do it easily with pandas. But you have to know the total number of columns.
import pandas as pd
pd.read_csv('foo.txt', sep='\t', names=['col_a','col_b'])
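A self-contained sketch of the pandas route, assuming tab-separated rows with at most three fields (the data, the in-memory io.StringIO source and the third column name col_c are illustrative assumptions). Supplying one name per expected column lets read_csv fill the short rows with NaN:

```python
import io
import pandas as pd

# Stand-in for a file on disk; rows 1 and 3 lack the third field.
data = io.StringIO("a\t45\nb\t45\t55\nc\t66\n")

# One name per expected column, so pandas knows the full width.
df = pd.read_csv(data, sep="\t", names=["col_a", "col_b", "col_c"])

print(df)
print(df["col_c"].isna().tolist())
```

Because NaN is a float, the column with missing values ends up with a floating-point dtype even though the present values are integers.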