How to set dtypes by column in pandas DataFrame
I just ran into this, and the pandas issue is still open, so I'm posting my workaround. Assuming df is my DataFrame and dtype is a dict mapping column names to types:

    for k, v in dtype.items():
        df[k] = df[k].astype(v)

(note: use dtype.iteritems() in Python 2)
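On recent pandas versions the loop is unnecessary: DataFrame.astype itself accepts a dict mapping column names to dtypes and converts them all in one call. A minimal sketch (the frame and column names here are made up for illustration):

```python
import pandas as pd

# Columns start out as strings (object dtype)
df = pd.DataFrame({"a": ["1", "2"], "b": ["3.5", "4.5"]})

# astype accepts a dict of column -> dtype, replacing the loop above
dtype = {"a": "int64", "b": "float64"}
df = df.astype(dtype)

print(df.dtypes)
```

This does the same conversions as the loop, column by column, but in a single statement.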
For reference:

- The list of allowed data types (NumPy dtypes): https://docs.scipy.org/doc/numpy-1.12.0/reference/arrays.dtypes.html
- Pandas also supports some other types, e.g. category: http://pandas.pydata.org/pandas-docs/stable/categorical.html
- The relevant GitHub issue: https://github.com/pandas-dev/pandas/issues/9287
As of pandas version 0.24.2 (the current stable release), it is not possible to pass an explicit list of datatypes to the DataFrame constructor, as the docs state:
dtype : dtype, default None
Data type to force. Only a single dtype is allowed. If None, infer
However, the DataFrame class does have a classmethod, from_records, that converts a NumPy structured array to a DataFrame, so you can do:
    >>> myarray = np.random.randint(0, 5, size=(2, 2))
    >>> record = np.array(list(map(tuple, myarray)), dtype=[('a', np.float64), ('b', np.int64)])
    >>> mydf = pd.DataFrame.from_records(record)
    >>> mydf.dtypes
    a    float64
    b      int64
    dtype: object

(In Python 3, map returns an iterator, so it must be wrapped in list() before np.array will accept it. The bare np.float and np.int aliases are also deprecated in newer NumPy; use np.float64 and np.int64, or the plain float and int builtins.)
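The from_records step can also be skipped: the DataFrame constructor accepts a structured array directly and preserves each field's dtype as a column dtype. A sketch that builds the structured array with explicit fields instead of going through map (the field names and values are illustrative):

```python
import numpy as np
import pandas as pd

# Allocate a structured array with explicit per-field dtypes
record = np.zeros(2, dtype=[("a", np.float64), ("b", np.int64)])
record["a"] = [0.0, 1.0]
record["b"] = [2, 3]

# Each structured-array field becomes a column with its declared dtype
mydf = pd.DataFrame(record)
print(mydf.dtypes)
```

This avoids the tuple-conversion dance entirely when you are generating the data yourself rather than converting an existing homogeneous array.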