Return most frequent string value for each group

For each value of a, use by() to build a table() of b and extract the names() of the largest entry in that table(). Note that which.max() returns only the first maximum, so ties are dropped:

> with(df, by(b, a, function(xx) names(which.max(table(xx)))))
a: 1
[1] "B"
------------------------
a: 2
[1] "B"

You can wrap this in as.table() to get a prettier output, although it still does not exactly match your desired result:

> as.table(with(df, by(b, a, function(xx) names(which.max(table(xx))))))
a
1 2 
B B
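
To try this approach end to end, here is a minimal sketch. The contents of df are an assumption (the original data is not shown here); the values are chosen so that B is the most frequent b in both groups. It swaps by() for split() plus vapply(), which gives a plain named character vector that is easy to turn into a data frame. The helper name most_frequent is made up for illustration.

```r
# Hypothetical sample data; the real df is not shown in this thread
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2),
  b = c("A", "B", "B", "B", "C", "B"),
  stringsAsFactors = FALSE
)

# Most frequent value of a vector; on ties, which.max() keeps the first table entry
most_frequent <- function(x) names(which.max(table(x)))

# One element per value of a, named by that value
modes <- vapply(split(df$b, df$a), most_frequent, character(1))

# Reshape into a two-column data frame
result <- data.frame(a = names(modes), b = unname(modes), stringsAsFactors = FALSE)
print(result)
```

Because table() sorts its names, a tie between two strings resolves to the one that sorts first, which may or may not be what you want.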

The key is to first count the frequency of each combination of a and b, and then keep only the most frequent b within each group of a, for example like this:

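library(dplyr)

df %>%
  count(a, b) %>%
  group_by(a) %>%        # current dplyr returns an ungrouped result from count(), so regroup by a
  slice(which.max(n))    # first row with the highest count in each group (ties dropped)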

Source: local data frame [2 x 3]
Groups: a

  a b n
1 1 B 2
2 2 B 2

Of course there are other approaches, so this is only one possible "key".

A simpler variant that works for me:

df %>% group_by(a) %>% count(b) %>% top_n(1) # includes ties
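
In dplyr 1.0.0 and later, top_n() is superseded by slice_max(), which makes the tie handling explicit. The following is a minimal sketch of both behaviours, assuming columns named a and b and a made-up df (the original data is not shown here); the object names counts, with_ties and one_per_group are illustrative only.

```r
library(dplyr)

# Hypothetical sample data; group 2 has a tie between "B" and "C"
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2, 2),
  b = c("A", "B", "B", "B", "C", "B", "C")
)

# One row per (a, b) pair, with the frequency in column n
counts <- df %>% count(a, b)

# Keep every value tied for the top count (the default, with_ties = TRUE)
with_ties <- counts %>%
  group_by(a) %>%
  slice_max(n, n = 1) %>%
  ungroup()

# Keep exactly one row per group
one_per_group <- counts %>%
  group_by(a) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup()

print(with_ties)
print(one_per_group)
```

With this sample data, with_ties has three rows (two for the tied group) and one_per_group has two.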

library(data.table)
DT <- as.data.table(df)
DT[, .N, by = .(a, b)][
  order(-N),
  .SD[N == max(N)],
  by = a]                    # includes ties
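
If only one row per group is wanted instead, one option is to sort by descending count and take the first row of each group. This is a minimal sketch with made-up data (the original df is not shown here); counts and top_one are illustrative names.

```r
library(data.table)

# Hypothetical sample data; group 2 has a tie between "B" and "C"
DT <- data.table(
  a = c(1, 1, 1, 2, 2, 2, 2),
  b = c("A", "B", "B", "B", "C", "B", "C")
)

# Frequency of each (a, b) pair, stored in column N
counts <- DT[, .N, by = .(a, b)]

# Sort by descending count, then keep only the first row within each a
top_one <- counts[order(-N), .SD[1L], by = a]
print(top_one)
```

Which of the tied values survives depends on the row order after sorting, so add a second sort key to order() if a deterministic tie-break matters.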
