How to name the list of the group_split output in dplyr
Not sure if this can be done directly. One way, shown here on a sample of the data frame, is to pass its unique group names to setNames().
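library(dplyr)
df <- iris %>% sample_n(size = 5)
# Name each piece after its group; this relies on unique() returning the
# groups in the same order that group_split() produces them
df %>%
  group_split(Species) %>%
  setNames(unique(df$Species))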
#$setosa
# A tibble: 1 x 5
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# <dbl> <dbl> <dbl> <dbl> <fct>
#1 5 3.4 1.5 0.2 setosa
#$versicolor
# A tibble: 1 x 5
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# <dbl> <dbl> <dbl> <dbl> <fct>
#1 6 3.4 4.5 1.6 versicolor
#$virginica
# A tibble: 3 x 5
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# <dbl> <dbl> <dbl> <dbl> <fct>
#1 7.3 2.9 6.3 1.8 virginica
#2 6.9 3.1 5.1 2.3 virginica
#3 7.7 3 6.1 2.3 virginica
It is odd that group_split() doesn't name the list elements directly, because it is supposed to be an alternative to base::split(), which does name them.
split(df, df$Species)
The documentation says:
group_split() works like base::split() but
- it uses the grouping structure from group_by() and therefore is subject to the data mask
- it does not name the elements of the list based on the grouping as this typically loses information and is confusing.
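Since group_split() leaves the list unnamed, one option is to take the names from group_keys(), which returns one row per group. The following is a minimal sketch, not something the documentation quoted above prescribes; it assumes a single grouping column and uses iris only for illustration, with `grouped` and `named` as hypothetical variable names.

```r
library(dplyr)

# Group once so the keys and the split pieces come from the same grouping
grouped <- iris %>% group_by(Species)

# group_keys() has one row per group; convert the key column to character
# and use it to name the pieces returned by group_split()
named <- grouped %>%
  group_split() %>%
  setNames(as.character(group_keys(grouped)$Species))

names(named)
```

With more than one grouping column, the key columns would need to be combined into a single label first (for example with paste()), which is left out of this sketch.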
For the updated dataset it doesn't work because the names come from unique(), which returns the values in the order they appear, whereas group_split() splits the data in increasing order of the grouping values (so the order of splitting is Cluster1, Cluster11, Cluster2, ...). One way to overcome that is to convert Cluster to a factor and specify its levels in order of appearance using unique().
df <- df %>%
  mutate(Cluster = factor(Cluster, levels = unique(Cluster)))
df %>%
  group_split(Cluster) %>%
  setNames(unique(df$Cluster))
Or, if you don't want them as factors, sort the unique values so they match the split order:
df %>%
  group_split(Cluster) %>%
  setNames(sort(unique(df$Cluster)))
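To see the ordering mismatch described above, here is a small sketch on a made-up tibble (`toy`, with hypothetical values; it is not the dataset from the question). It compares the order returned by unique() with the order of the pieces from group_split(), then names each piece from its own Cluster value so the two cannot drift apart.

```r
library(dplyr)

# Hypothetical data: the clusters deliberately appear out of sorted order
toy <- tibble(
  Cluster = c("Cluster9", "Cluster1", "Cluster11", "Cluster9"),
  value   = c(1, 2, 3, 4)
)

# Order of appearance: Cluster9, Cluster1, Cluster11
appearance_order <- unique(toy$Cluster)

# group_split() returns the pieces sorted by the grouping value
pieces <- toy %>% group_split(Cluster)

# Read the group label off the first row of each piece
split_order <- vapply(pieces, function(p) p$Cluster[1], character(1))

# Naming from the pieces themselves keeps names and contents aligned
named <- setNames(pieces, split_order)
names(named)
```

Here `appearance_order` and `split_order` differ, which is why naming with unique() alone can attach the wrong label to a piece.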
Lots of good answers. You can also just do:
iris %>%
  sample_n(size = 5) %>%
  split(f = as.factor(.$Species))
Which will give you:
$setosa
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
4 5.5 3.5 1.3 0.2 setosa
5 5.3 3.7 1.5 0.2 setosa
$versicolor
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
3 5 2.3 3.3 1 versicolor
$virginica
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 7.7 2.6 6.9 2.3 virginica
2 7.2 3.0 5.8 1.6 virginica
This also works with your data frame above:
df %>%
  split(f = as.factor(.$Cluster))
Gives you:
$Cluster1
# A tibble: 1 x 6
Cluster gene_name p_value morans_test_statistic morans_I q_value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Cluster1 Grhpr 0.00000155 4.66 0.0261 0.00000343
$Cluster11
# A tibble: 2 x 6
Cluster gene_name p_value morans_test_statistic morans_I q_value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Cluster11 Vimp 3.17e-62 16.6 0.0948 1.62e-61
2 Cluster11 Fgfr1op2 2.07e- 8 5.48 0.0310 4.98e- 8
$Cluster12
# A tibble: 1 x 6
Cluster gene_name p_value morans_test_statistic morans_I q_value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Cluster12 Pikfyve 0.0147 2.18 0.0120 0.0245
$Cluster6
# A tibble: 1 x 6
Cluster gene_name p_value morans_test_statistic morans_I q_value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Cluster6 Zfp398 0.000354 3.39 0.0188 0.000684
$Cluster8
# A tibble: 2 x 6
Cluster gene_name p_value morans_test_statistic morans_I q_value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Cluster8 Golga7 4.14e- 6 4.46 0.0251 8.96e- 6
2 Cluster8 Lars2 3.93e-184 28.9 0.165 3.48e-183
$Cluster9
# A tibble: 3 x 6
Cluster gene_name p_value morans_test_statistic morans_I q_value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Cluster9 Tbc1d8 3.47e- 47 14.4 0.0815 1.58e- 46
2 Cluster9 H1f0 9.46e-131 24.3 0.139 7.00e-130
3 Cluster9 Ankrd13a 1.43e- 38 12.9 0.0737 5.96e- 38