How to get value counts for multiple columns at once in Pandas DataFrame?
Just call apply
and pass pd.Series.value_counts
:
In [212]:
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))
df.apply(pd.Series.value_counts)
Out[212]:
a b c d
0 4 6 4 3
1 6 4 6 7
There is actually a fairly interesting and advanced way of doing this problem with crosstab
and melt
df = pd.DataFrame({'a': ['table', 'chair', 'chair', 'lamp', 'bed'],
'b': ['lamp', 'candle', 'chair', 'lamp', 'bed'],
'c': ['mirror', 'mirror', 'mirror', 'mirror', 'mirror']})
df
a b c
0 table lamp mirror
1 chair candle mirror
2 chair chair mirror
3 lamp lamp mirror
4 bed bed mirror
We can first melt the DataFrame
df1 = df.melt(var_name='columns', value_name='index')
df1
columns index
0 a table
1 a chair
2 a chair
3 a lamp
4 a bed
5 b lamp
6 b candle
7 b chair
8 b lamp
9 b bed
10 c mirror
11 c mirror
12 c mirror
13 c mirror
14 c mirror
And then use the crosstab function to count the values for each column. This preserves the data type as ints which wouldn't be the case for the currently selected answer:
pd.crosstab(index=df1['index'], columns=df1['columns'])
columns a b c
index
bed 1 1 0
candle 0 1 0
chair 2 1 0
lamp 1 2 0
mirror 0 0 5
table 1 0 0
Or in one line, which expands the column names to parameter names with **
(this is advanced)
pd.crosstab(**df.melt(var_name='columns', value_name='index'))
Also, value_counts
is now a top-level function. So you can simplify the currently selected answer to the following:
df.apply(pd.value_counts)
To get the counts only for specific columns:
df[['a', 'b']].apply(pd.Series.value_counts)
where df is the name of your dataframe and 'a' and 'b' are the columns for which you want to count the values.
The solution that selects all categorical columns and makes a dataframe with all value counts at once:
df = pd.DataFrame({
'fruits': ['apple', 'mango', 'apple', 'mango', 'mango', 'pear', 'mango'],
'vegetables': ['cucumber', 'eggplant', 'tomato', 'tomato', 'tomato', 'tomato', 'pumpkin'],
'sauces': ['chili', 'chili', 'ketchup', 'ketchup', 'chili', '1000 islands', 'chili']})
cat_cols = df.select_dtypes(include=object).columns.tolist()
(pd.DataFrame(
df[cat_cols]
.melt(var_name='column', value_name='value')
.value_counts())
.rename(columns={0: 'counts'})
.sort_values(by=['column', 'counts']))
counts
column value
fruits pear 1
apple 2
mango 4
sauces 1000 islands 1
ketchup 2
chili 4
vegetables pumpkin 1
eggplant 1
cucumber 1
tomato 4