Filter the top n largest groups in a data.frame
We can use table to calculate the frequency of each group, sort them in decreasing order, subset the top 2 entries and filter the respective groups.
library(dplyr)
# Keep rows whose group is one of the two most frequent groups
example_data %>%
  filter(group %in% names(sort(table(group), decreasing = TRUE)[1:2]))
# col1 col2 group
#1 1 16 2
#2 3 18 3
#3 4 19 2
#4 5 20 3
#5 7 22 3
#6 9 24 3
#7 11 26 2
#8 12 27 2
#9 13 28 2
#10 14 29 3
#11 15 30 3
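To see what each step of the table() approach produces, here is a minimal sketch on a small hypothetical data frame. The example_data below is made up for illustration and is not the question's data.

```r
# Hypothetical toy data, not the original example_data
example_data <- data.frame(
  col1  = 1:10,
  col2  = 11:20,
  group = c(1, 2, 2, 3, 3, 3, 2, 3, 1, 4)
)

# Step 1: count the rows in each group (names are the group labels)
freq <- table(example_data$group)

# Step 2: sort the counts from largest to smallest
freq_sorted <- sort(freq, decreasing = TRUE)

# Step 3: labels of the two largest groups; table() names are character
# strings, and %in% coerces when comparing them to the numeric column
top_groups <- names(freq_sorted)[1:2]

# Step 4: keep the rows belonging to those groups, in their original order
result <- example_data[example_data$group %in% top_groups, ]

print(freq_sorted)
print(top_groups)  # "3" "2"
print(result)
```

Because the filtering is done with %in% on the original rows, the result keeps the input row order, which matches the output shown above.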
You can also use the same condition directly with base R's subset():
subset(example_data, group %in% names(sort(table(group), decreasing = TRUE)[1:2]))
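If you need this for an arbitrary n, the same base R logic can be wrapped in a small function. This is a sketch; top_n_groups() and its arguments are hypothetical names, not an existing function. Using head() instead of [1:n] avoids NA labels when n is larger than the number of groups (an NA label would also match rows whose group is NA).

```r
# Hypothetical helper: keep the rows of the n most frequent groups
top_n_groups <- function(df, group_col, n = 2) {
  counts <- sort(table(df[[group_col]]), decreasing = TRUE)
  # head() returns at most n labels; [1:n] would pad with NA
  # when n exceeds the number of groups
  keep <- head(names(counts), n)
  df[df[[group_col]] %in% keep, , drop = FALSE]
}

# Hypothetical toy data, not the original example_data
example_data <- data.frame(
  col1  = 1:10,
  col2  = 11:20,
  group = c(1, 2, 2, 3, 3, 3, 2, 3, 1, 4)
)

top2 <- top_n_groups(example_data, "group", n = 2)       # groups 3 and 2, 7 rows
top1 <- top_n_groups(example_data, "group", n = 1)       # group 3 only, 4 rows
all_rows <- top_n_groups(example_data, "group", n = 10)  # n > number of groups: all rows
```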
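We can also use tidyverse methods for this. Create a frequency column with add_count, arrange by that column and filter the rows where 'group' is in the last two unique 'group' values, i.e. the two largest groups. Note that arrange reorders the rows, so the result is sorted by group size.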
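library(dplyr)
example_data %>%
  add_count(group) %>%                           # n = size of each row's group
  arrange(n) %>%                                 # smallest groups first
  filter(group %in% tail(unique(group), 2)) %>%  # last two groups = two largest
  select(-n)                                     # drop the helper column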
# A tibble: 11 x 3
# col1 col2 group
# <int> <int> <int>
# 1 1 16 2
# 2 4 19 2
# 3 11 26 2
# 4 12 27 2
# 5 13 28 2
# 6 3 18 3
# 7 5 20 3
# 8 7 22 3
# 9 9 24 3
#10 14 29 3
#11 15 30 3
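To make the add_count() steps concrete, here is a sketch on hypothetical toy data (made up for illustration). The intermediate with_n shows the n column that add_count() attaches to every row; after arrange(n), the last two unique groups are the two largest.

```r
library(dplyr)

# Hypothetical toy data, not the original example_data
example_data <- data.frame(
  col1  = 1:10,
  col2  = 11:20,
  group = c(1, 2, 2, 3, 3, 3, 2, 3, 1, 4)
)

# add_count() adds a column n holding the size of each row's group
with_n <- example_data %>% add_count(group)

# Sort by group size: groups now appear from smallest (4) to largest (3)
ordered <- with_n %>% arrange(n)
largest_two <- tail(unique(ordered$group), 2)   # 2 then 3

result <- ordered %>%
  filter(group %in% largest_two) %>%
  select(-n)
print(result)
```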
Or, using data.table:
library(data.table)
# setDT() converts example_data to a data.table by reference
setDT(example_data)[group %in% example_data[, .N, group][order(-N), head(group, 2)]]
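Broken into steps, the data.table version looks like the sketch below. It uses hypothetical toy data and as.data.table(), which makes a copy, so the original data.frame is left unchanged (setDT() converts it in place instead).

```r
library(data.table)

# Hypothetical toy data, not the original example_data
example_data <- data.frame(
  col1  = 1:10,
  col2  = 11:20,
  group = c(1, 2, 2, 3, 3, 3, 2, 3, 1, 4)
)

dt <- as.data.table(example_data)   # copy; setDT(example_data) would convert in place

# Row count per group, largest first
counts <- dt[, .N, by = group][order(-N)]

# Labels of the two largest groups
top_groups <- head(counts$group, 2)   # 3 then 2

# Keep the matching rows (original row order is preserved)
result <- dt[group %in% top_groups]
print(result)
```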
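With dplyr, you can also rank the group sizes with dense_rank(). Because tied sizes share a rank, this keeps every group whose size is among the two largest distinct sizes, which can be more than two groups: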
example_data %>%
add_count(group) %>%
filter(dense_rank(desc(n)) <= 2) %>%
select(-n)
# col1 col2 group
# <int> <int> <int>
# 1 1 16 2
# 2 3 18 3
# 3 4 19 2
# 4 5 20 3
# 5 7 22 3
# 6 9 24 3
# 7 11 26 2
# 8 12 27 2
# 9 13 28 2
#10 14 29 3
#11 15 30 3
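The difference between dense_rank() and the table() approach only shows up when group sizes tie. A minimal sketch with hypothetical data where groups "a" and "b" both have 3 rows:

```r
library(dplyr)

# Hypothetical data with a tie: "a" and "b" both have 3 rows
tied <- data.frame(
  id    = 1:9,
  group = c("a", "a", "a", "b", "b", "b", "c", "c", "d"),
  stringsAsFactors = FALSE
)

# dense_rank(): a and b share rank 1, c gets rank 2, so three groups survive
dense_kept <- tied %>%
  add_count(group) %>%
  filter(dense_rank(desc(n)) <= 2) %>%
  select(-n)
print(sort(unique(dense_kept$group)))   # "a" "b" "c"

# table() + [1:2]: exactly two group labels, so only a and b survive
table_kept <- subset(tied, group %in% names(sort(table(group), decreasing = TRUE)[1:2]))
print(sort(unique(table_kept$group)))   # "a" "b"
```

Which behaviour is right depends on whether "top 2" means two groups or the two largest sizes.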
Or, equivalently, with slice():
example_data %>%
add_count(group) %>%
slice(which(dense_rank(desc(n)) <= 2)) %>%
select(-n)
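slice(which(...)) turns the logical condition into row positions, so it selects the same rows as the filter() version. A sketch checking this on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data, not the original example_data
example_data <- data.frame(
  col1  = 1:10,
  col2  = 11:20,
  group = c(1, 2, 2, 3, 3, 3, 2, 3, 1, 4)
)

ranked <- example_data %>% add_count(group)

# which() converts the per-row TRUE/FALSE condition into row positions
keep_rows <- which(dense_rank(desc(ranked$n)) <= 2)   # rows 2 to 8

via_slice <- ranked %>% slice(keep_rows) %>% select(-n)
via_filter <- ranked %>% filter(dense_rank(desc(n)) <= 2) %>% select(-n)
```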