How do I tell if a column in a pandas dataframe is of type datetime? How do I tell if a column is numerical?
I just encountered this issue and found that @charlie-haley's answer isn't quite general enough for my use case. In particular, np.datetime64 doesn't seem to match datetime64[ns, UTC].
df['date_col'] = pd.to_datetime(df['date_str'], utc=True)
print(df.date_col.dtype) # datetime64[ns, UTC]
You could also extend the list of dtypes to include other types, but that doesn't seem like a good solution for future compatibility, so I ended up using the is_datetime64_any_dtype function from the pandas API instead.
In:
from pandas.api.types import is_datetime64_any_dtype as is_datetime
df[[column for column in df.columns if is_datetime(df[column])]]
Out:
date_col
0 2017-02-01 00:00:00+00:00
1 2017-03-01 00:00:00+00:00
2 2017-04-01 00:00:00+00:00
3 2017-05-01 00:00:00+00:00
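To answer the per-column part of the question directly, here is a minimal sketch that labels each column as datetime, numeric, or other. It uses is_datetime64_any_dtype together with is_numeric_dtype from the same pandas.api.types module. The frame and the column names (date_naive, date_utc, col1, col2) are made up for illustration and are not from the original post.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype

# Hypothetical sample frame; the column names are illustrative only.
df = pd.DataFrame({
    "date_str": ["2017-02-01", "2017-03-01"],
    "col1": [1, 1],
    "col2": [2.5, 2.5],
})
df["date_naive"] = pd.to_datetime(df["date_str"])           # timezone-naive datetimes
df["date_utc"] = pd.to_datetime(df["date_str"], utc=True)   # timezone-aware datetimes

# Label every column by checking the datetime test first, then the numeric one.
kinds = {}
for name in df.columns:
    if is_datetime64_any_dtype(df[name]):
        kinds[name] = "datetime"
    elif is_numeric_dtype(df[name]):
        kinds[name] = "numeric"
    else:
        kinds[name] = "other"

print(kinds)
```

Both the naive and the UTC column should come back as "datetime" here, which is the case np.datetime64 alone missed for me.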
Pandas has a handy function called select_dtypes, which takes include, exclude, or both as parameters and filters the dataframe's columns by dtype. So in this case, you would want to include columns of dtype np.datetime64. To filter by integers, use [np.int64, np.int32, np.int16] (or np.integer to match every integer width); for floats, use [np.float64, np.float32, np.float16] (or np.floating). Leave out the old np.int and np.float aliases, which were removed in NumPy 1.24. To filter by numerical columns only, use [np.number].
In:
df.select_dtypes(include=[np.datetime64])
Out:
date_col
0 2017-02-01
1 2017-03-01
2 2017-04-01
3 2017-05-01
In:
df.select_dtypes(include=[np.number])
Out:
col1 col2
0 1 2
1 1 2
2 1 2
3 1 2
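If the frame may also hold timezone-aware columns, select_dtypes accepts string selectors too. The sketch below passes "datetime" and "datetimetz" together so that both naive and tz-aware columns are picked up, and "number" for the numeric ones. The sample frame and its column names are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical sample frame; the column names are illustrative only.
df = pd.DataFrame({
    "date_str": ["2017-02-01", "2017-03-01"],
    "col1": [1, 1],
    "col2": [2.0, 2.0],
})
df["date_naive"] = pd.to_datetime(df["date_str"])           # timezone-naive
df["date_utc"] = pd.to_datetime(df["date_str"], utc=True)   # timezone-aware

# "datetime" selects naive datetime64 columns, "datetimetz" selects tz-aware ones.
datetime_cols = df.select_dtypes(include=["datetime", "datetimetz"]).columns.tolist()

# "number" is the string equivalent of np.number.
numeric_cols = df.select_dtypes(include="number").columns.tolist()

print(datetime_cols)
print(numeric_cols)
```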
A slightly uglier NumPy alternative:
In [102]: df.loc[:, [np.issubdtype(t, np.datetime64) for t in df.dtypes]]
Out[102]:
date_col
0 2017-02-01
1 2017-03-01
2 2017-04-01
3 2017-05-01
In [103]: df.loc[:, [np.issubdtype(t, np.number) for t in df.dtypes]]
Out[103]:
col1 col2
0 1 2
1 1 2
2 1 2
3 1 2
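One caveat with the NumPy route: np.issubdtype only understands NumPy dtypes, so it can raise a TypeError when a column has a pandas extension dtype such as datetime64[ns, UTC]. A minimal sketch of a guarded version follows. It skips anything that is not a plain np.dtype, which means tz-aware columns are simply left out rather than matched. The frame and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; the column names are illustrative only.
df = pd.DataFrame({
    "date_str": ["2017-02-01", "2017-03-01"],
    "col1": [1, 1],
    "col2": [2.0, 2.0],
})
df["date_naive"] = pd.to_datetime(df["date_str"])
df["date_utc"] = pd.to_datetime(df["date_str"], utc=True)   # pandas extension dtype

def is_np_subdtype(dtype, kind):
    # Only hand real NumPy dtypes to np.issubdtype; extension dtypes count as no match.
    return isinstance(dtype, np.dtype) and np.issubdtype(dtype, kind)

datetime_part = df.loc[:, [is_np_subdtype(t, np.datetime64) for t in df.dtypes]]
numeric_part = df.loc[:, [is_np_subdtype(t, np.number) for t in df.dtypes]]

print(list(datetime_part.columns))
print(list(numeric_part.columns))
```

Because the guard drops tz-aware columns, the pandas helpers above are probably the better choice whenever timezones are involved.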