numpy, named columns
I know this is an old question, but a more recently available option would be to try using pandas. The DataFrame type is designed for structured data like this, where columns are named and can be of different types.
NumPy structured arrays have named columns:
import numpy as np
a = range(100)
A = np.array(list(zip(*[iter(a)] * 2)), dtype=[('C1', 'int32'),('C2', 'int64')])
print(A.dtype)
[('C1', '<i4'), ('C2', '<i8')]
You can access the columns by name like this:
print(A['C1'])
# [ 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48
# 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98]
Note that using np.array
with zip
causes NumPy to build an array from a temporary list of tuples. Python lists of tuples use a lot more memory than equivalent NumPy arrays. So if your array is very large you may not want to use zip
.
Instead, given a NumPy array A
, you could use ravel()
to make A
a 1D
array, and then use view
to turn it into a structured array, and then use astype
to convert the columns to the desired type:
a = range(100)
A = np.array(a).reshape( len(a)//2, 2)
A = A.ravel().view([('col1','i8'),('col2','i8'),]).astype([('col1','i4'),('col2','i8'),])
print(A[:5])
# array([(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)],
# dtype=[('col1', '<i4'), ('col2', '<i8')])
print(A.dtype)
# dtype([('col1', '<i4'), ('col2', '<i8')])