Convert pandas dataframe to numpy array - which approach to prefer?
The functions you mention serve different purposes.
- pd.to_numeric: Use this to convert types in your dataframe if your data is not currently stored in numeric form, or if you wish to cast to an optimal type via downcast='float' or downcast='integer'.
- pd.DataFrame.to_numpy() (v0.24+) or pd.DataFrame.values: Use this to retrieve the numpy array representation of your dataframe.
- pd.DataFrame.as_matrix: Do not use this. It was kept only for backwards compatibility; it was deprecated in v0.23 and removed in pandas 1.0.
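As a rough illustration of how the first two options fit together, here is a minimal sketch. The frame and its column names are made up for the example. Note that pd.to_numeric accepts 1-D input, so it is applied column by column here.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose numbers were read in as strings.
df = pd.DataFrame({"a": ["1", "2", "3"], "b": ["4.5", "5.5", "6.5"]})

# pd.to_numeric works on 1-D input, so apply it to each column in turn.
numeric = df.apply(pd.to_numeric)

# Optionally shrink a float column to a smaller float dtype.
small_b = pd.to_numeric(numeric["b"], downcast="float")

# Retrieve the numpy representation of the converted frame.
arr = numeric.to_numpy()   # pandas 0.24+
same = numeric.values      # older spelling, same data for this frame

print(numeric.dtypes)
print(small_b.dtype)
print(arr.dtype, arr.shape)
```

Because column a converts to integers and column b to floats, the resulting array uses a single dtype that can hold both.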
Under the hood, a pandas.DataFrame is not much more than a wrapper around numpy.ndarray data. The simplest and possibly fastest way is to use pandas.DataFrame.values. From the documentation:
DataFrame.values
Numpy representation of NDFrame
Notes
The dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.
e.g. If the dtypes are float16 and float32, dtype will be upcast to float32. If dtypes are int32 and uint8, dtype will be upcast to int32. By numpy.find_common_type convention, mixing int64 and uint64 will result in a float64 dtype.
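The upcasting rules quoted above can be checked with a small sketch. The frames below are hypothetical and exist only to reproduce the dtype combinations named in the note, plus one mixed numeric/string case.

```python
import numpy as np
import pandas as pd

# float16 + float32 columns
floats = pd.DataFrame({
    "x": np.array([1.0, 2.0], dtype=np.float16),
    "y": np.array([3.0, 4.0], dtype=np.float32),
})

# int32 + uint8 columns
ints = pd.DataFrame({
    "x": np.array([1, 2], dtype=np.int32),
    "y": np.array([3, 4], dtype=np.uint8),
})

# int64 + uint64 columns: no integer dtype holds both ranges
mixed = pd.DataFrame({
    "x": np.array([1, 2], dtype=np.int64),
    "y": np.array([3, 4], dtype=np.uint64),
})

# numeric + string columns: falls back to a generic object array
text = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})

for name, frame in [("floats", floats), ("ints", ints),
                    ("mixed", mixed), ("text", text)]:
    print(name, frame.values.dtype)
```

The first three printed dtypes should match the ones listed in the note. The last case shows why .values needs care on frames that mix numbers with non-numeric columns: the result is an object array, which loses the speed benefits of a numeric one.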