Number of unique values per column by group
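The question's original DataFrame isn't reproduced here, so for concreteness here is one hypothetical df that matches the unique-value counts shown in the outputs below:
import pandas as pd

# Hypothetical sample data (not the question's actual DataFrame); it just
# reproduces the per-group unique-value counts used in the outputs below.
df = pd.DataFrame({
    'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'flux', 'flux'],
    'B': ['one', 'two', 'three', 'one', 'two', 'one', 'two'],
    'E': [1, 2, 2, 5, 5, 6, 7],
})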
The DataFrame object doesn't have nunique, only Series do. You have to pick out which column you want to apply nunique() on. You can do this with a simple dot operator:
df.groupby('A').apply(lambda x: x.B.nunique())
will print:
A
bar 2
flux 2
foo 3
And doing:
df.groupby('A').apply(lambda x: x.E.nunique())
will print:
A
bar 1
flux 2
foo 2
Alternatively you can do this with one function call using:
df.groupby('A').aggregate({'B': lambda x: x.nunique(), 'E': lambda x: x.nunique()})
which will print:
      B  E
A
bar   2  1
flux  2  2
foo   3  2
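On a reasonably recent pandas you can also pass the built-in 'nunique' name instead of writing lambdas; a sketch, assuming the hypothetical df above:
# Equivalent aggregation using the 'nunique' string (recent pandas):
df.groupby('A').agg({'B': 'nunique', 'E': 'nunique'})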
To answer your question about why your recursive lambda prints the A column as well: when you do a groupby/apply operation, you're iterating through three DataFrame objects, one per group. Each is a sub-DataFrame of the original, and applying an operation to it applies it to each of its Series. So there are three Series per DataFrame that you're applying the nunique() operator to.
The first Series evaluated in each DataFrame is the A Series, and since you've grouped by A, each sub-DataFrame contains only one unique value in its A Series. That's why you ultimately get an A result column that is all 1's.
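Your exact code isn't shown, but a nested apply along these lines reproduces that behaviour (a sketch using the hypothetical df from the top):
# Each group is a sub-DataFrame with three Series (A, B and E); applying
# nunique() column-wise therefore yields an A column that is all 1's,
# because each group contains exactly one distinct value of A.
df.groupby('A').apply(lambda g: g.apply(lambda s: s.nunique()))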
I encountered the same problem. Upgrading pandas to the latest version solved it for me:
df.groupby('A').nunique()
The above code did not work for me in pandas version 0.19.2; after upgrading to pandas 0.21.1, it worked.
You can check the version using the following code:
print('Pandas version ' + pd.__version__)
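If you can't upgrade everywhere, one rough option is to branch on the version; a minimal sketch, with deliberately naive version parsing:
import pandas as pd

# Use the one-liner on pandas >= 0.21, otherwise fall back to the
# lambda-based aggregation shown above. Version parsing here is naive
# and only illustrative.
major, minor = (int(x) for x in pd.__version__.split('.')[:2])
if (major, minor) >= (0, 21):
    counts = df.groupby('A').nunique()
else:
    counts = df.groupby('A').aggregate({'B': lambda x: x.nunique(),
                                        'E': lambda x: x.nunique()})
print(counts)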