How to rename categories after using pandas.cut with IntervalIndex?
If we have some data:
bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
x = pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)
You can try reassigning the categories directly:
In [7]: x.categories = [1,2,3]
In [8]: x
Out[8]:
[NaN, 1, NaN, 2, 3]
Categories (3, int64): [1 < 2 < 3]
or:
In [9]: x.categories = ["small", "medium", "big"]
In [10]: x
Out[10]:
[NaN, small, NaN, medium, big]
Categories (3, object): [small < medium < big]
UPDATE:
df = pd.DataFrame([0, 0.5, 1.5, 2.5, 4.5], columns = ['col1'])
bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
x = pd.cut(df["col1"].to_list(),bins)
x.categories = [1,2,3]
df['col1'] = x
df.col1
0 NaN
1 1
2 NaN
3 2
4 3
Name: col1, dtype: category
Categories (3, int64): [1 < 2 < 3]
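On pandas versions where assigning to the categories attribute is no longer allowed, the same DataFrame result can probably be reached with rename_categories instead. The following is a minimal sketch, assuming the column is cut directly as a Series (so the rename goes through the .cat accessor) rather than converted to a list first; the variable name binned is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": [0, 0.5, 1.5, 2.5, 4.5]})
bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])

# pd.cut on a Series returns a Series of category dtype,
# so the rename is done through the .cat accessor.
binned = pd.cut(df["col1"], bins)
df["col1"] = binned.cat.rename_categories([1, 2, 3])

print(df["col1"])
```

Values that fall outside every interval (here 0 and 1.5) stay NaN, as in the output above.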
UPDATE 2:
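In newer versions of pandas, instead of reassigning categories using x.categories = [1, 2, 3], rename_categories should be used (x.rename_categories on a Categorical such as x here, or Series.cat.rename_categories when the data is a Series):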
labels = [1, 2, 3]
x = x.rename_categories(labels)
labels can be of any type, and in any case the original categorical order that was set when creating the pd.IntervalIndex will be preserved.
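One way to check the order-preservation claim is to compare the integer codes before and after the rename. This is a small sketch rather than anything from the pandas docs; the name renamed is just an illustrative variable.

```python
import pandas as pd

bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
x = pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)  # a pandas Categorical

renamed = x.rename_categories(["small", "medium", "big"])

# The rename only swaps the category labels; the integer codes that
# point into the categories are unchanged (-1 marks a missing value).
print(list(x.codes))
print(list(renamed.codes))

# The categorical is still ordered, with labels in the interval order.
print(renamed.ordered)
print(list(renamed.categories))
```

Because only the labels change, the new names are matched to the intervals by position, so the list has to be given in the same order as the bins.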
import pandas as pd
series = pd.Series([0, 0.5, 1.5, 2.5, 4.5])
bins = [(0, 1), (2, 3), (4, 5)]
index = pd.IntervalIndex.from_tuples(bins)
intervals = index.values
names = ['small', 'med', 'large']
# Map each interval to the label it should be renamed to
to_name = {interval: name for interval, name in zip(intervals, names)}
# Cut with the IntervalIndex defined above, then rename via the mapping
named_series = pd.Series(
    pd.CategoricalIndex(pd.cut(series, index)).rename_categories(to_name)
)
print(named_series)
0 NaN
1 small
2 NaN
3 med
4 large
dtype: category
Categories (3, object): ['small' < 'med' < 'large']