How can I make a numpy ndarray from bytes?
After your edit it seems you are going into the wrong direction!
You can't use ndarray.tobytes() to store a complete array, with all information such as shape and dtype, when you need to reconstruct it from those bytes alone! It only saves the raw data (the cell values), flattened in C or Fortran order.
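Here is a small demonstration of that limitation. Two arrays with different shapes produce identical bytes, and the byte order of the cells depends on the order argument. The arrays a, b and c are made up for illustration.

```python
import numpy as np

a = np.array([[0, 1], [2, 3]], dtype=np.int8)
b = a.reshape(4)            # same values, different shape
c = a.astype(np.int16)      # same values, different dtype

print(a.tobytes())           # b'\x00\x01\x02\x03'
print(b.tobytes())           # b'\x00\x01\x02\x03'  (identical: the shape is not stored)
print(a.tobytes(order='F'))  # b'\x00\x02\x01\x03'  (column-major cell order)
print(len(c.tobytes()))      # 8: the dtype changes the byte count but is not recorded either
```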
We don't know your exact task, but you will need some form of serialization. There are many approaches; the simplest is Python's pickle (example in Python 3):
import pickle
import numpy as np
x = np.array([[0, 1], [2, 3]])
print(x)
x_as_bytes = pickle.dumps(x)
print(x_as_bytes)
print(type(x_as_bytes))
y = pickle.loads(x_as_bytes)
print(y)
Output:
[[0 1]
[2 3]]
b'\x80\x03cnumpy.core.multiarray\n_reconstruct\nq\x00cnumpy\nndarray\nq\x01K\x00\x85q\x02C\x01bq\x03\x87q\x04Rq\x05(K\x01K\x02K\x02\x86q\x06cnumpy\ndtype\nq\x07X\x02\x00\x00\x00i8q\x08K\x00K\x01\x87q\tRq\n(K\x03X\x01\x00\x00\x00<q\x0bNNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tq\x0cb\x89C \x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00q\rtq\x0eb.'
<class 'bytes'>
[[0 1]
[2 3]]
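To check that pickle really keeps more than the cell values, here is a quick sketch that round-trips a float32 array stored in Fortran (column-major) order. The array itself is an arbitrary example.

```python
import pickle

import numpy as np

# A float32 array in Fortran order, so shape, dtype and memory layout can all be checked
x = np.asfortranarray(np.arange(6, dtype=np.float32).reshape(2, 3))
y = pickle.loads(pickle.dumps(x))

print(y.shape, y.dtype, y.flags['F_CONTIGUOUS'])  # (2, 3) float32 True
print(np.array_equal(x, y))                       # True
```

Only unpickle bytes from a source you trust: pickle.loads can execute arbitrary code embedded in the data.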
A better alternative, especially for large arrays, is joblib, whose dump/load functions use specialized pickling for numpy arrays. joblib's functions accept file objects, so they can also be used in memory with byte strings via Python's io.BytesIO.
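A minimal sketch of the in-memory joblib route, assuming joblib is installed. It relies only on joblib.dump and joblib.load accepting a file object in place of a filename.

```python
import io

import joblib
import numpy as np

x = np.array([[0, 1], [2, 3]])

# Serialize into an in-memory buffer instead of a file on disk
buffer = io.BytesIO()
joblib.dump(x, buffer)
x_as_bytes = buffer.getvalue()  # plain bytes, e.g. to store or send elsewhere

# Deserialize: wrap the bytes in a fresh BytesIO (positioned at 0) and let joblib read it
y = joblib.load(io.BytesIO(x_as_bytes))
print(y.shape, y.dtype)
```

Wrapping the bytes in a new BytesIO avoids having to seek(0) on the original buffer before loading.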
To deserialize the bytes you need np.frombuffer(): x.tobytes() serializes the array into bytes, and np.frombuffer() deserializes them.
Bear in mind that once serialized, the shape info is lost (and the dtype is not stored either, so you must pass it to np.frombuffer()). This means that after deserialization you have to reshape the result back to its original shape.
Below is a complete example:
import numpy as np
x = np.array([[0, 1], [2, 3]], np.int8)
x_bytes = x.tobytes()  # avoid the name `bytes`, which would shadow the built-in type
# x_bytes is a raw array, which means it contains no info regarding the shape (or dtype) of x
# let's make sure: we have 4 values with dtype=int8 (one byte per item), therefore the length of x_bytes should be 4 bytes
assert len(x_bytes) == 4, "Ha??? Weird machine..."
deserialized_bytes = np.frombuffer(x_bytes, dtype=np.int8)  # flat 1-D array of 4 items
deserialized_x = np.reshape(deserialized_bytes, (2, 2))  # restore the original shape
assert np.array_equal(x, deserialized_x), "Deserialization failed..."
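One detail the example does not show: because a bytes object is immutable, the array returned by np.frombuffer() is a read-only view that shares its memory. A short sketch, reusing the same 2x2 int8 array:

```python
import numpy as np

x = np.array([[0, 1], [2, 3]], np.int8)
x_bytes = x.tobytes()

view = np.frombuffer(x_bytes, dtype=np.int8).reshape(2, 2)
print(view.flags.writeable)  # False: the array shares memory with the immutable bytes

try:
    view[0, 0] = 10
except ValueError as err:
    print("cannot modify:", err)

# .copy() allocates new, writable memory
writable = view.copy()
writable[0, 0] = 10
print(writable)
```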
If you know the dimensions of the array you are recreating ahead of time, pass the bytes as the buffer argument of the ndarray constructor:
numpy.ndarray(<dimensions>, <dataType>, <bytes (aka buffer)>)
import numpy
x = numpy.array([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3], [3.0, 3.1, 3.2, 3.3]], numpy.float64)
#array([[1. , 1.1, 1.2, 1.3],
# [2. , 2.1, 2.2, 2.3],
# [3. , 3.1, 3.2, 3.3]])
xBytes = x.tobytes()
#b'\x00\x00\x00\x00\x00\x00\xf0?\x9a\x99\x99\x99\x99\x99\xf1?333333\xf3?\xcd\xcc\xcc\xcc\xcc\xcc\xf4?\x00\x00\x00\x00\x00\x00\x00@\xcd\xcc\xcc\xcc\xcc\xcc\x00@\x9a\x99\x99\x99\x99\x99\x01@ffffff\x02@\x00\x00\x00\x00\x00\x00\x08@\xcd\xcc\xcc\xcc\xcc\xcc\x08@\x9a\x99\x99\x99\x99\x99\t@ffffff\n@'
newX = numpy.ndarray((3,4),numpy.float64,xBytes)
#array([[1. , 1.1, 1.2, 1.3],
# [2. , 2.1, 2.2, 2.3],
# [3. , 3.1, 3.2, 3.3]])
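For comparison, a sketch showing that the constructor call gives the same result as np.frombuffer() plus reshape, and that the order argument of numpy.ndarray can be used when the bytes were written in Fortran order:

```python
import numpy as np

x = np.array([[1.0, 1.1, 1.2, 1.3],
              [2.0, 2.1, 2.2, 2.3],
              [3.0, 3.1, 3.2, 3.3]], np.float64)

# Two equivalent reconstructions from C-ordered bytes
a = np.ndarray((3, 4), np.float64, x.tobytes())
b = np.frombuffer(x.tobytes(), dtype=np.float64).reshape(3, 4)
print(np.array_equal(a, b))  # True

# Bytes written in Fortran (column-major) order must be read back with order='F'
f_bytes = x.tobytes(order='F')
c = np.ndarray((3, 4), np.float64, f_bytes, order='F')
print(np.array_equal(c, x))  # True
```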
Another approach: if you have stored your data as byte records rather than as an entire ndarray, and your selection of data varies from ndarray to ndarray, you can aggregate the pre-array data in a Python bytearray. Once it reaches the desired size, you already know the required dimensions and can pass those dimensions and the dtype to the ndarray constructor, with the bytearray as the buffer.
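A minimal sketch of that idea. The records here are hypothetical: each one is assumed to be the raw bytes of one row of 3 int32 values.

```python
import numpy as np

# Hypothetical incoming records: raw bytes of one row of 3 int32 values each
records = [np.array([i, i + 1, i + 2], dtype=np.int32).tobytes() for i in range(4)]

buf = bytearray()
for rec in records:
    buf += rec  # append the raw bytes in place

# At this point the dimensions are known: one row per record, 3 columns
arr = np.ndarray((len(records), 3), np.int32, buf)
print(arr)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]
#  [3 4 5]]
```

Because a bytearray is mutable, arr is writable and shares memory with buf; while arr exists, buf cannot be resized (Python raises BufferError), so append all records first.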