Convert Bitstring (String of 1s and 0s) to numpy array

For a string s = "100100101", you can convert it to a numpy array in at least two ways.

The first is to use numpy's fromstring function. It is a bit awkward, because you have to specify the datatype and subtract out the "base" value of the elements.

import numpy as np

s = "100100101"
a = np.fromstring(s, 'u1') - ord('0')

print(a)  # [1 0 0 1 0 0 1 0 1]

Here 'u1' is the datatype (unsigned 8-bit integer), and ord('0'), which is 48, is the "base" value subtracted from each element, so the character codes for '0' and '1' become 0 and 1.
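On Python 3 with recent NumPy releases, the binary mode of fromstring is deprecated in favor of np.frombuffer. Below is a minimal sketch of the same idea using frombuffer, assuming the string contains only ASCII '0' and '1' characters. The helper name bits_from_string is hypothetical, chosen only for illustration.

```python
import numpy as np

def bits_from_string(s):
    """Convert a string of '0'/'1' characters to a uint8 array (hypothetical helper)."""
    # Encode to bytes so NumPy can read the raw ASCII codes (48 for '0', 49 for '1').
    raw = np.frombuffer(s.encode("ascii"), dtype="u1")
    # Subtracting ord('0') maps 48 -> 0 and 49 -> 1 and returns a new array.
    return raw - ord("0")

a = bits_from_string("100100101")
print(a)  # [1 0 0 1 0 0 1 0 1]
```

The array returned by frombuffer is a read-only view of the bytes, but the subtraction produces a fresh array that can be modified freely.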

The second way is to convert each character to an integer (strings are iterable, so map can walk over them directly), then pass the resulting list to np.array:

import numpy as np

s = "100100101"
# list() is needed on Python 3, where map returns an iterator rather than a list
b = np.array(list(map(int, s)))

print(b)  # [1 0 0 1 0 0 1 0 1]

Either way, the result is a regular numpy array:

# Confirm it's a numpy array:
print(type(a))  # <class 'numpy.ndarray'>
print(a[0])     # 1
print(a[1])     # 0
# ...

Note that the second approach scales significantly worse than the first as the input string s grows. For small strings the two are close, but consider the timeit results for a 90-character string (I just used s * 10):

fromstring:  2.154540959 s
map/array:  49.283392424 s

(These figures use the default timeit.repeat arguments: each is the minimum of 3 runs, and each run times 1M string->array conversions.)
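A comparison along these lines can be sketched with timeit.repeat. This is not the exact script behind the numbers above: it assumes Python 3, swaps in np.frombuffer for fromstring (whose binary mode is deprecated in newer NumPy), passes repeat=3 explicitly because the default differs between Python versions, and uses a much smaller number of iterations so it finishes quickly. The function name benchmark is hypothetical.

```python
import timeit

# Shared setup: a 90-character bit string, as in the comparison above.
setup = "import numpy as np; s = '100100101' * 10"

stmts = {
    "frombuffer": "np.frombuffer(s.encode('ascii'), dtype='u1') - ord('0')",
    "map/array": "np.array(list(map(int, s)))",
}

def benchmark(number=1000, repeat=3):
    """Return the minimum total time in seconds over `repeat` runs per statement."""
    return {
        name: min(timeit.repeat(stmt, setup=setup, repeat=repeat, number=number))
        for name, stmt in stmts.items()
    }

results = benchmark()
for name, seconds in results.items():
    print("%-11s %.6f s" % (name + ":", seconds))
```

Raising number to 1000000 matches the scale described above; absolute timings will vary with hardware, Python version, and NumPy version.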


One pandas method would be to call apply on the df column to perform the conversion:

In [84]:

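import numpy as np
import pandas as pd

df = pd.DataFrame({'bit': ['100100101']})
# Each cell is a string; map(int, x) iterates over its characters directly
t = df.bit.apply(lambda x: np.array(list(map(int, x))))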
t[0]
Out[84]:
array([1, 0, 0, 1, 0, 0, 1, 0, 1])