Pandas, groupby and count

You seem to want to group by several columns at once:

df.groupby(['revenue','session','user_id'])['user_id'].count()

That should give you the number of rows for each combination of the three columns.
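As a minimal sketch of the call above, assuming a small made-up frame (only the column names come from the question; the values are hypothetical):

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    "revenue": [10, 10, 10, 20],
    "session": [1, 1, 2, 1],
    "user_id": ["a", "a", "a", "b"],
})

# One entry per distinct (revenue, session, user_id) combination,
# holding the number of non-null user_id values in that group.
counts = df.groupby(["revenue", "session", "user_id"])["user_id"].count()

# Flatten to a DataFrame. The count column needs its own name ("n" here,
# an arbitrary choice) because "user_id" is already used by an index level.
flat = counts.reset_index(name="n")
print(flat)
```

Passing `name=` to `reset_index` avoids a clash between the counted column and the `user_id` index level.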

pandas >= 1.1: `df.value_counts` is available!

From pandas 1.1 onwards, this is my recommended method for counting the number of rows in each group (i.e., the group size). To count the number of non-NaN rows in a group for a specific column, check out the accepted answer.

Old

df.groupby(['A', 'B']).size()   # df.groupby(['A', 'B'])['C'].count()

New [✓]

df.value_counts(subset=['A', 'B'])

Note that size and count are not identical: the former counts all rows per group, while the latter counts only non-null rows. See this other answer of mine for more.
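The difference only shows up when the counted column has missing values. A small sketch, assuming a hypothetical frame where column C contains one NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: group ("x", 1) has two rows, one of them with a missing C.
df = pd.DataFrame({
    "A": ["x", "x", "y"],
    "B": [1, 1, 2],
    "C": [1.0, np.nan, 3.0],
})

grouped = df.groupby(["A", "B"])

sizes = grouped.size()           # all rows per group, NaN in C included
non_null = grouped["C"].count()  # only rows where C is not null

print(sizes)
print(non_null)
```

For the group `("x", 1)`, `size` reports both rows while `count` skips the row whose `C` is NaN.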

Minimal Example

import pandas as pd

pd.__version__
# '1.1.0.dev0+2004.g8d10bfb6f'

df = pd.DataFrame({'num_legs': [2, 4, 4, 6],
                   'num_wings': [2, 0, 0, 0]},
                  index=['falcon', 'dog', 'cat', 'ant'])
df
        num_legs  num_wings
falcon         2          2
dog            4          0
cat            4          0
ant            6          0

df.value_counts(subset=['num_legs', 'num_wings'], sort=False)

num_legs  num_wings
2         2            1
4         0            2
6         0            1
dtype: int64

Compare this output with the groupby equivalent:

df.groupby(['num_legs', 'num_wings'])['num_legs'].size()

num_legs  num_wings
2         2            1
4         0            2
6         0            1
Name: num_legs, dtype: int64
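Rather than comparing the two printouts by eye, the equivalence can be checked programmatically. This sketch rebuilds the example frame and compares the results as dictionaries, which sidesteps differences in the Series name and row order between the two calls:

```python
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4, 4, 6],
                   'num_wings': [2, 0, 0, 0]},
                  index=['falcon', 'dog', 'cat', 'ant'])

via_value_counts = df.value_counts(subset=['num_legs', 'num_wings'], sort=False)
via_groupby = df.groupby(['num_legs', 'num_wings']).size()

# Each dict maps a (num_legs, num_wings) tuple to its row count.
same = via_value_counts.to_dict() == via_groupby.to_dict()
print(same)
```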

Performance

It's also slightly faster if you don't sort the result (timed here on the small frame above):

%timeit df.groupby(['num_legs', 'num_wings'])['num_legs'].count()
%timeit df.value_counts(subset=['num_legs', 'num_wings'], sort=False)

640 µs ± 28.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
568 µs ± 6.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
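Timings on a four-row frame are dominated by fixed overhead, so it may be worth repeating the comparison on something larger. The following is only a sketch using the standard `timeit` module outside IPython; the frame size, value ranges, and repetition count are arbitrary assumptions, and results will vary with the machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 10,000 rows with a few distinct values per column.
rng = np.random.default_rng(0)
big = pd.DataFrame({
    "num_legs": rng.integers(0, 8, size=10_000),
    "num_wings": rng.integers(0, 4, size=10_000),
})
cols = ["num_legs", "num_wings"]
runs = 20

# Time each approach over the same number of repetitions.
t_groupby = timeit.timeit(lambda: big.groupby(cols)["num_legs"].count(), number=runs)
t_value_counts = timeit.timeit(lambda: big.value_counts(subset=cols, sort=False), number=runs)

print(f"groupby + count:          {t_groupby / runs * 1e3:.3f} ms per call")
print(f"value_counts(sort=False): {t_value_counts / runs * 1e3:.3f} ms per call")
```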

Tags:

Python

Pandas

Pandas Groupby