Number of unique values per column by group
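The question's original DataFrame isn't reproduced here, so for concreteness here is one hypothetical df that matches the unique-value counts shown in the outputs below:
import pandas as pd

# Hypothetical sample data (not the question's actual DataFrame); it just
# reproduces the per-group unique-value counts used in the outputs below.
df = pd.DataFrame({
    'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'flux', 'flux'],
    'B': ['one', 'two', 'three', 'one', 'two', 'one', 'two'],
    'E': [1, 2, 2, 5, 5, 6, 7],
})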
The DataFrame object doesn't have nunique, only Series do. You have to pick out which column you want to apply nunique() on. You can do this with a simple dot operator:
df.groupby('A').apply(lambda x: x.B.nunique())
will print:
A
bar 2
flux 2
foo 3
And doing:
df.groupby('A').apply(lambda x: x.E.nunique())
will print:
A
bar 1
flux 2
foo 2
Alternatively you can do this with one function call using:
df.groupby('A').aggregate({'B': lambda x: x.nunique(), 'E': lambda x: x.nunique()})
which will print:
      B  E
A
bar   2  1
flux  2  2
foo   3  2
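On a reasonably recent pandas you can also pass the built-in 'nunique' name instead of writing lambdas; a sketch, assuming the hypothetical df above:
# Equivalent aggregation using the 'nunique' string (recent pandas):
df.groupby('A').agg({'B': 'nunique', 'E': 'nunique'})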
To answer your question about why your recursive lambda prints the A column as well: when you do a groupby/apply operation, you're iterating through three DataFrame objects, one per group. Each is a sub-DataFrame of the original, and applying an operation to it applies it to each of its Series. So there are three Series per DataFrame that you're applying the nunique() operator to.
The first Series evaluated in each DataFrame is the A Series, and since you've grouped by A, each sub-DataFrame contains only one unique value in its A Series. That's why you ultimately get an A result column that is all 1's.
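Your exact code isn't shown, but a nested apply along these lines reproduces that behaviour (a sketch using the hypothetical df from the top):
# Each group is a sub-DataFrame with three Series (A, B and E); applying
# nunique() column-wise therefore yields an A column that is all 1's,
# because each group contains exactly one distinct value of A.
df.groupby('A').apply(lambda g: g.apply(lambda s: s.nunique()))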
I encountered the same problem. Upgrading pandas to the latest version solved it for me:
df.groupby('A').nunique()
The above code did not work for me in pandas version 0.19.2; after upgrading to pandas 0.21.1, it worked.
You can check the version using the following code:
print('Pandas version ' + pd.__version__)
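If you can't upgrade everywhere, one rough option is to branch on the version; a minimal sketch, with deliberately naive version parsing:
import pandas as pd

# Use the one-liner on pandas >= 0.21, otherwise fall back to the
# lambda-based aggregation shown above. Version parsing here is naive
# and only illustrative.
major, minor = (int(x) for x in pd.__version__.split('.')[:2])
if (major, minor) >= (0, 21):
    counts = df.groupby('A').nunique()
else:
    counts = df.groupby('A').aggregate({'B': lambda x: x.nunique(),
                                        'E': lambda x: x.nunique()})
print(counts)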