Check if dataframe column is Categorical

In my pandas version (v1.0.3), a shorter version of joris' answer is available.

import pandas as pd

df = pd.DataFrame({'noncat': [1, 2, 3], 'categ': pd.Categorical(['A', 'B', 'C'])})

print(isinstance(df.noncat.dtype, pd.CategoricalDtype))  # False
print(isinstance(df.categ.dtype, pd.CategoricalDtype))   # True

print(pd.CategoricalDtype.is_dtype(df.noncat)) # False
print(pd.CategoricalDtype.is_dtype(df.categ))  # True

Use the dtype's name property for the comparison instead. It should always work because the name is just a string:

>>> import numpy as np
>>> arr = np.array([1, 2, 3, 4])
>>> arr.dtype.name
'int64'

>>> import pandas as pd
>>> cat = pd.Categorical(['a', 'b', 'c'])
>>> cat.dtype.name
'category'

So, to sum up, you can end up with a simple, straightforward function:

def is_categorical(array_like):
    return array_like.dtype.name == 'category'
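A quick usage sketch of the function above, assuming NumPy and pandas are installed. The sample arrays and Series are made up for illustration and are not from the original answer:

```python
import numpy as np
import pandas as pd


def is_categorical(array_like):
    # The dtype name is a plain string, so this comparison cannot raise.
    return array_like.dtype.name == 'category'


# Illustrative inputs (hypothetical data).
int_array = np.array([1, 2, 3, 4])
cat_values = pd.Categorical(['a', 'b', 'c'])
cat_series = pd.Series(['x', 'y', 'x'], dtype='category')
obj_series = pd.Series(['x', 'y', 'x'])

results = {
    'int_array': is_categorical(int_array),
    'cat_values': is_categorical(cat_values),
    'cat_series': is_categorical(cat_series),
    'obj_series': is_categorical(obj_series),
}
print(results)
```

Only the two categorical inputs should report True; anything exposing a .dtype with a .name works the same way.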

First, the string representation of the dtype is 'category' and not 'categorical', so this works:

In [41]: df.cat_column.dtype == 'category'
Out[41]: True

But indeed, as you noticed, this comparison raises a TypeError for other dtypes, so you would have to wrap it in a try/except block.
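A minimal sketch of such a wrapper, assuming a DataFrame shaped like the one in the question. The helper name dtype_is_category is hypothetical, and newer pandas/NumPy releases may simply return False instead of raising, which this handles as well:

```python
import pandas as pd


def dtype_is_category(series):
    """Return True only if the dtype compares equal to 'category'."""
    try:
        return bool(series.dtype == 'category')
    except TypeError:
        # Some older pandas/NumPy combinations raise here for
        # non-categorical dtypes; treat that as "not categorical".
        return False


# Illustrative frame (hypothetical data).
df = pd.DataFrame({'x': [1, 2, 3], 'cat_column': pd.Categorical(['A', 'B', 'C'])})

print(dtype_is_category(df.cat_column))
print(dtype_is_category(df.x))
```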


Other ways to check, using pandas' own type helpers:

In [42]: isinstance(df.cat_column.dtype, pd.api.types.CategoricalDtype)
Out[42]: True

In [43]: pd.api.types.is_categorical_dtype(df.cat_column)
Out[43]: True

For non-categorical columns, those statements will return False instead of raising an error. For example:

In [44]: pd.api.types.is_categorical_dtype(df.x)
Out[44]: False

For much older versions of pandas, replace pd.api.types in the above snippets with pd.core.common.


Just putting this here because pandas.DataFrame.select_dtypes() is what I was actually looking for:

df['column'].name in df.select_dtypes(include='category').columns
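To make the select_dtypes() check concrete, here is a small self-contained sketch. The DataFrame and its column names are made up for illustration:

```python
import pandas as pd

# Illustrative frame (hypothetical data).
df = pd.DataFrame({
    'noncat': [1, 2, 3],
    'categ': pd.Categorical(['A', 'B', 'C']),
})

# select_dtypes keeps only the columns whose dtype is categorical.
categorical_columns = df.select_dtypes(include='category').columns

print(list(categorical_columns))
print('categ' in categorical_columns)
print('noncat' in categorical_columns)
```

This is handy when you want all categorical columns at once rather than testing them one by one.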

Thanks to @Jeff.

Tags: Python, Pandas