Randomly sample groups
I think this approach makes the most sense if you are using dplyr:
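library(dplyr)
library(tidyr)  # nest() comes from tidyr
iris_grouped <- iris %>%
  group_by(Species) %>%
  nest() %>%
  ungroup()  # recent tidyr keeps the grouping after nest(); drop it so sample_n() picks whole groups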
Which produces:
# A tibble: 3 x 2
  Species    data
  <fct>      <list>
1 setosa     <tibble [50 × 4]>
2 versicolor <tibble [50 × 4]>
3 virginica  <tibble [50 × 4]>
with which you can then use sample_n:
iris_grouped %>%
sample_n(2)
# A tibble: 2 x 2
  Species    data
  <fct>      <list>
1 virginica  <tibble [50 × 4]>
2 versicolor <tibble [50 × 4]>
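The sampled result still holds each group as a nested tibble. To get ordinary rows back, the same pipeline can be finished with unnest(). This is only a sketch, assuming tidyr 1.0 or later (for the cols argument); sampled_rows is a name made up for the example, and set.seed() is there only to make the draw repeatable:

```r
library(dplyr)
library(tidyr)

set.seed(42)  # only so the sketch is reproducible

sampled_rows <- iris %>%
  group_by(Species) %>%
  nest() %>%
  ungroup() %>%          # make sure sample_n() sees one row per group
  sample_n(2) %>%        # keep two whole groups at random
  unnest(cols = data)    # expand the list-column back into rows

# Two species remain, each with all 50 of its rows
count(sampled_rows, Species)
```

Because whole groups are drawn before unnesting, every row of a chosen species is kept.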
Just use sample() to choose some number of groups:
iris %>% filter(Species %in% sample(levels(Species),2))
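Note that levels() only returns something useful when the grouping column is a factor. If it is a character or numeric column, unique() can probably stand in for it. A minimal base R sketch on made-up data (df, g, x, keep and sampled are all hypothetical names used only for illustration):

```r
set.seed(1)  # for a reproducible draw only

# Hypothetical data: `g` is a character column, so levels(df$g) would be NULL
df <- data.frame(
  g = rep(c("a", "b", "c", "d"), each = 3),
  x = 1:12,
  stringsAsFactors = FALSE
)

keep <- sample(unique(df$g), 2)   # pick two group labels at random
sampled <- df[df$g %in% keep, ]   # keep every row from those groups

sampled
```

The same idea should carry over to the dplyr version by swapping levels(Species) for unique(Species).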
Take note that, at least in this benchmark, the dplyr version is considerably slower than the equivalent base R data frame subsetting:
library(dplyr)
library(microbenchmark)
microbenchmark(
  dplyr = iris %>% filter(Species %in% sample(levels(Species), 2)),
  base  = iris[iris[["Species"]] %in% sample(levels(iris[["Species"]]), 2), ]
)
Unit: microseconds
  expr     min      lq     mean  median       uq      max neval cld
 dplyr 660.287 710.655 753.6704 722.629 771.2860 1122.527   100   b
  base  83.629  95.032 110.0936 106.057 119.1715  199.949   100  a
Note that [[ is known to be faster than $, although both work.