Combine result from top_n with an "Other" category in dplyr
Instead of top_n, this seems like a good case for the convenience function tally. It uses summarise and sum under the hood (plus arrange when sort = TRUE).
Then use factor to create an "Other" category. Use the levels argument to set "Other" as the last level; "Other" will then be placed last in the table (and in any subsequent plot of the result).
If "Country" is a factor in your original data, you may need to wrap Country[1:3] in as.character() so that c() reliably combines the country names rather than the factor's underlying integer codes.
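library(dplyr)
group_by(df, Country) %>%
  tally(Count, sort = TRUE) %>%                    # total Count per country, largest first
  group_by(Country = factor(c(Country[1:3], rep("Other", n() - 3)),
                            levels = c(Country[1:3], "Other"))) %>%  # top three, then "Other" last
  tally(n)                                         # sum the totals within each new group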
# Country n
# (fctr) (int)
#1 AUS 6
#2 JPN 5
#3 USA 5
#4 Other 7
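A minimal sketch of the two ideas above, using hypothetical toy data rather than the question's data: tally(Count, sort = TRUE) gives the same result as an explicit summarise() plus arrange(), and passing levels to factor() keeps "Other" last instead of in its alphabetical position. The data frame and the names via_tally, via_summarise, default_levels and with_levels are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data (not the data from the question); character, not factor, Country.
df <- data.frame(
  Country = c("AUS", "AUS", "JPN", "USA", "GBR", "FRA", "FRA"),
  Count   = c(4L, 2L, 5L, 5L, 3L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Convenience form: weighted count per country, sorted by the new column n.
via_tally <- df %>% group_by(Country) %>% tally(Count, sort = TRUE)

# The same result spelled out with summarise() and arrange().
via_summarise <- df %>%
  group_by(Country) %>%
  summarise(n = sum(Count)) %>%
  arrange(desc(n))

# Default factor levels are alphabetical, so "Other" lands between "JPN" and "USA".
top3 <- via_tally$Country[1:3]
default_levels <- factor(c(top3, "Other"))

# Explicit levels keep the ranking order and put "Other" last.
with_levels <- factor(c(top3, "Other"), levels = c(top3, "Other"))
```

Here levels(default_levels) is "AUS" "JPN" "Other" "USA", while levels(with_levels) is "AUS" "JPN" "USA" "Other", which is the ordering that tables and plots will follow.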
We could do this in two steps: first create a sorted data.frame, and then rbind the top three rows with a summary of the remaining rows:
library(dplyr)
d <- df %>% group_by(Country) %>% summarise(Count = sum(Count)) %>% arrange(desc(Count))
rbind(slice(d, 1:3),   # top three rows by position (top_n() would also keep ties)
      slice(d, 4:n()) %>% summarise(Country = "other", Count = sum(Count)))
output
Country Count
(fctr) (int)
1 AUS 6
2 JPN 5
3 USA 5
4 other 7
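One caveat when combining top_n() with slice(d, 4:n()): top_n() ranks with ties, so it can return more than three rows, and any extra rows would then be counted twice. A minimal sketch with hypothetical, already-sorted totals that include a three-way tie; slice_max() with with_ties = FALSE assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical per-country totals, already sorted, with a three-way tie for second place.
d <- tibble(Country = c("AUS", "JPN", "USA", "GBR"),
            Count   = c(6L, 5L, 5L, 5L))

# top_n() keeps every row ranked within the top 3, ties included: 4 rows here.
by_rank <- top_n(d, 3, Count)

# slice() selects by position, so the head and the tail never overlap.
head_rows <- slice(d, 1:3)
tail_rows <- slice(d, 4:n())

# slice_max() can drop ties explicitly.
by_max <- slice_max(d, Count, n = 3, with_ties = FALSE)
```

Because d is already sorted, positional selection is the simplest way to guarantee that every row ends up in exactly one of the two parts.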
Here is an option using data.table. We convert the 'data.frame' to a 'data.table' (setDT(dat1)), group by 'Country' to get the sum of 'Count', order by 'Count' (descending), and then rbind the first three observations with a list containing 'Others' and the sum of 'Count' over the remaining observations.
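library(data.table)
setDT(dat1)[, list(Count = sum(Count)), by = Country][          # total Count per Country
  order(-Count),                                                 # sort by descending total
  rbind(.SD[1:3],                                                # keep the first three rows
        list(Country = 'Others', Count = sum(.SD[[2]][4:.N])))]  # collapse the rest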
# Country Count
#1: AUS 6
#2: USA 5
#3: JPN 5
#4: Others 7
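The chained data.table call above can be unpacked into its individual steps. This is a sketch on hypothetical toy data; the intermediate names totals, top, rest and the cut-off k are illustrative only.

```r
library(data.table)

# Hypothetical toy data (not the data from the question).
dat1 <- data.frame(Country = c("AUS", "AUS", "JPN", "USA", "GBR", "FRA"),
                   Count   = c(4L, 2L, 5L, 5L, 3L, 2L),
                   stringsAsFactors = FALSE)
setDT(dat1)  # converts to a data.table in place

# 1. Total Count per Country.
totals <- dat1[, list(Count = sum(Count)), by = Country]

# 2. Sort by descending total.
totals <- totals[order(-Count)]

# 3. Keep the first k rows and collapse everything else into one "Others" row.
k <- min(3L, nrow(totals))
top <- totals[seq_len(k)]
rest <- data.table(Country = "Others", Count = sum(totals$Count[-seq_len(k)]))
res <- rbind(top, rest)
```

Using -seq_len(k) for the remainder, instead of 4:.N, avoids NA rows and an NA sum when there are fewer than four countries; in that case the "Others" row simply gets a total of 0.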
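Or, using base R:
d1 <- aggregate(. ~ Country, dat1, FUN = sum)    # total Count per Country
i1 <- order(-d1$Count)                            # row order by descending total
rbind(d1[i1, ][1:3, ],                            # top three countries
      data.frame(Country = 'Others',
                 Count = sum(d1$Count[i1][4:nrow(d1)])))  # everything else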
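The same base R steps can be wrapped in a reusable function so that the grouping column, weight column and cut-off are parameters. The function top_n_other() below is a hypothetical helper, not part of base R or any package, and the toy data is made up.

```r
# Hypothetical helper: total `weight` per `group`, keep the `n` largest groups,
# and collapse the remaining groups into a single row labelled `other`.
top_n_other <- function(data, group, weight, n = 3, other = "Others") {
  totals <- aggregate(data[weight], by = data[group], FUN = sum)
  totals <- totals[order(-totals[[weight]]), ]
  if (nrow(totals) <= n) return(totals)  # nothing left to collapse
  rest <- data.frame(other, sum(totals[[weight]][-seq_len(n)]),
                     stringsAsFactors = FALSE)
  names(rest) <- c(group, weight)
  out <- rbind(totals[seq_len(n), ], rest)
  rownames(out) <- NULL
  out
}

# Hypothetical toy data.
dat1 <- data.frame(Country = c("AUS", "AUS", "JPN", "USA", "GBR", "FRA", "FRA"),
                   Count   = c(4L, 2L, 5L, 4L, 3L, 2L, 1L),
                   stringsAsFactors = FALSE)

res <- top_n_other(dat1, group = "Country", weight = "Count", n = 3)
```

With this data, res lists AUS, JPN and USA followed by an "Others" row that sums the GBR and FRA totals.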
You can use fct_lump from the forcats package:
library(dplyr)
library(forcats)
dat1 %>%
  group_by(Country = fct_lump(Country, n = 3, w = Count)) %>%  # top 3 by total Count, rest as "Other"
  summarize(Count = sum(Count))
This should do it. You can also change the "Other" label with the other_level parameter of fct_lump.
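A sketch of the other_level option, assuming forcats 0.5.0 or later, where the n-based behaviour of fct_lump() is also available as fct_lump_n(). The toy data is hypothetical. The lumped level is placed at the end of the factor levels, so the summary lists it last.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data (not the data from the question).
dat1 <- data.frame(Country = c("AUS", "AUS", "JPN", "USA", "GBR", "FRA", "FRA"),
                   Count   = c(4L, 2L, 5L, 4L, 3L, 2L, 1L),
                   stringsAsFactors = FALSE)

res <- dat1 %>%
  # Keep the 3 countries with the largest total Count; relabel the rest "Others".
  group_by(Country = fct_lump_n(Country, n = 3, w = Count, other_level = "Others")) %>%
  summarise(Count = sum(Count))
```

Naming the grouping expression (Country = ...) also gives the result a readable column name instead of the full fct_lump_n(...) call.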