Creating a structured array from a list
The np.array() function accepts a list of lists as input. So if you want to create a 2 x 2 matrix, for example, this is what you need to do:
X = np.array([[1,2], [3,4]])
Details of how np.array handles various inputs are buried in compiled code. As the many questions about creating object dtype arrays show, it can be complicated and confusing. The basic model is to create a multidimensional numeric array from a nested list.
np.array([[1,2,3],[4,5,6]])
In implementing structured arrays, the developers adopted the tuple as a way of distinguishing a record from just another nested dimension. That is evident in the display of a structured array.
It is also a requirement when defining a structured array, though the list-of-tuples requirement is somewhat buried in the documentation. In the session below, alist is the nested list [[1],[2],[3]].
In [382]: dt=np.dtype([('y',int)])
In [383]: np.array(alist,dt)
TypeError: a bytes-like object is required, not 'int'
This is the error message from my version, 1.12.0. It appears to be different in yours.
As you note, a list comprehension can convert the nested list into a list of tuples.
In [384]: np.array([tuple(i) for i in alist],dt)
Out[384]:
array([(1,), (2,), (3,)],
      dtype=[('y', '<i4')])
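The same conversion works when each record has more than one field. The following is a minimal, self-contained sketch; the input rows and the field names 'a' and 'b' are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical input: a nested list with two values per row
rows = [[1, 2.0], [3, 4.0], [5, 6.0]]

# Hypothetical compound dtype with one integer and one float field
dt2 = np.dtype([('a', int), ('b', float)])

# Convert each inner list to a tuple so that np.array treats it as one record
records = [tuple(row) for row in rows]
arr = np.array(records, dtype=dt2)

print(arr.shape)   # one dimension, one record per row
print(arr['a'])    # values of field 'a'
print(arr['b'])    # values of field 'b'
```

The result is a 1d array with one record per input row, and each field can be read back by name.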
In answering SO questions that's the approach I use most often. Either that or iteratively set fields of a preallocated array (usually there are a lot more records than fields, so that loop is not expensive).
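The field-by-field alternative can be sketched as follows. This is only an illustration: the rows and the field names 'a' and 'b' are hypothetical, and the loop runs once per field rather than once per record.

```python
import numpy as np

# Hypothetical input rows and compound dtype
rows = [[1, 2.0], [3, 4.0], [5, 6.0]]
dt2 = np.dtype([('a', int), ('b', float)])

# Preallocate the structured array, one record per row
out = np.zeros(len(rows), dtype=dt2)

# Transpose the rows into columns, then assign one column per field
columns = list(zip(*rows))
for name, column in zip(dt2.names, columns):
    out[name] = column

print(out)
```

Because the loop is over field names, its cost depends on the number of fields, not on the number of records.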
It looks like wrapping the array in a structured np.array call is equivalent to an astype call:
In [385]: np.array(np.array(alist),dt)
Out[385]:
array([[(1,)],
       [(2,)],
       [(3,)]],
      dtype=[('y', '<i4')])
In [386]: np.array(alist).astype(dt)
Out[386]:
array([[(1,)],
       [(2,)],
       [(3,)]],
      dtype=[('y', '<i4')])
But note the change in the number of dimensions. The list of tuples created a (3,) array. The astype converted a (3,1) numeric array into a (3,1) structured array.
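If the goal is a (3,) structured array starting from a (3,1) numeric array, one option that is not part of the session above is numpy.lib.recfunctions.unstructured_to_structured, which maps the last dimension of the input onto the fields of the dtype. A minimal sketch, assuming a NumPy release recent enough to include that function:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

alist = [[1], [2], [3]]          # same nested list as in the session above
dt = np.dtype([('y', int)])

numeric = np.array(alist)        # plain numeric array, two dimensions
print(numeric.shape)             # (3, 1)

# The last axis (length 1) is consumed by the single field 'y'
structured = rfn.unstructured_to_structured(numeric, dtype=dt)
print(structured.shape)          # (3,)
print(structured['y'])
```

Here the trailing dimension becomes the record, so the result has the same shape as the array built from a list of tuples.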
Part of what the tuples tell np.array is: put the division between array dimensions and records 'here'. It interprets
[(3,), (1,), (2,)]
[record, record, record]
whereas automatic translation of [[1],[2],[3]] might produce
[[record],[record],[record]]
When the dtype is numeric (non-structured), it ignores the distinction between list and tuple:
In [388]: np.array([tuple(i) for i in alist],int)
Out[388]:
array([[1],
       [2],
       [3]])
But when the dtype is compound, developers have chosen to use the tuple layer as significant information.
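A small check of the numeric case, written as a self-contained sketch: with a plain int dtype, lists and tuples at the inner level give the same two-dimensional result.

```python
import numpy as np

# Inner level written as lists
from_lists = np.array([[1], [2], [3]], dtype=int)

# Inner level written as tuples
from_tuples = np.array([(1,), (2,), (3,)], dtype=int)

# Both are ordinary 2d integer arrays with identical contents
print(from_lists.shape, from_tuples.shape)
print(np.array_equal(from_lists, from_tuples))
```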
Consider a more complex structured dtype
In [389]: dt1=np.dtype([('y',int,(2,))])
In [390]: np.ones((3,), dt1)
Out[390]:
array([([1, 1],), ([1, 1],), ([1, 1],)],
      dtype=[('y', '<i4', (2,))])
In [391]: np.array([([1,2],),([3,4],)])
Out[391]:
array([[[1, 2]],
       [[3, 4]]])
In [392]: np.array([([1,2],),([3,4],)], dtype=dt1)
Out[392]:
array([([1, 2],), ([3, 4],)],
      dtype=[('y', '<i4', (2,))])
The display (and input) has lists within tuples within a list. And that's just the start:
In [393]: dt1=np.dtype([('x',dt,(2,))])
In [394]: dt1
Out[394]: dtype([('x', [('y', '<i4')], (2,))])
In [395]: np.ones((2,),dt1)
Out[395]:
array([([(1,), (1,)],), ([(1,), (1,)],)],
      dtype=[('x', [('y', '<i4')], (2,))])
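The nested display can also be used as input. The sketch below builds an array of the same nested dtype from explicit values instead of np.ones; the values 1 to 4 and the variable names are illustrative only.

```python
import numpy as np

dt_inner = np.dtype([('y', int)])
dt_outer = np.dtype([('x', dt_inner, (2,))])

# Outer list: array dimension. Tuple: one record.
# Inner list: the (2,) subarray of field 'x'. Innermost tuples: 'y' records.
data = [([(1,), (2,)],), ([(3,), (4,)],)]
arr = np.array(data, dtype=dt_outer)

print(arr.shape)        # one dimension, two records
print(arr['x'].shape)   # the subarray dimension is appended to the shape
print(arr['x']['y'])    # plain integer values
```

Indexing by field name peels off one layer at a time: arr['x'] is a structured array with dtype dt_inner, and arr['x']['y'] is an ordinary integer array.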