R: how to expand a row containing a "list" to several rows...one for each list member?
I've grown to really love data.table for this kind of task. It is so very simple. But first, let's make some sample data (which you should ideally provide yourself!)
# Sample data
set.seed(1)
df <- data.frame(
  pep = replicate(3, paste(sample(999, 3), collapse = ";")),
  pro = sample(3),
  stringsAsFactors = FALSE
)
Now we use the data.table package to do the reshaping in a couple of lines...
# Load data.table package
library(data.table)
# Turn data.frame into data.table, which looks like..
dt <- data.table(df)
#           pep pro
#1: 266;372;572   1
#2: 908;202;896   3
#3: 944;660;628   2
# Transform it in one line like this...
dt[ , list( pep = unlist( strsplit( pep , ";" ) ) ) , by = pro ]
#   pro pep
#1:   1 266
#2:   1 372
#3:   1 572
#4:   3 908
#5:   3 202
#6:   3 896
#7:   2 944
#8:   2 660
#9:   2 628
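If you would rather avoid an extra package, the same split-then-repeat idea can be sketched in base R. This is only an illustration, not part of the data.table approach above: the `toy` data frame and its values are made up, and `lengths()` assumes R 3.2.0 or later.

```r
# Hypothetical toy data with the same shape as the example (values invented)
toy <- data.frame(
  pep = c("1;2;3", "4;5"),
  pro = c(10, 20),
  stringsAsFactors = FALSE
)

# Split each string on ";" into a list of character vectors
parts <- strsplit(toy$pep, ";", fixed = TRUE)

# Repeat each `pro` value once per piece, then flatten the pieces
long <- data.frame(
  pro = rep(toy$pro, lengths(parts)),
  pep = unlist(parts),
  stringsAsFactors = FALSE
)
long
```

Each row of `toy` contributes as many rows to `long` as it had ";"-separated items, which is what the `by = pro` grouping does in the data.table version.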
I think tidyr's unnest() is what you're looking for.
df <- tibble::tibble(x = 1:2, y = list(c("a", "b", "c"), c("alpha", "beta")))
df
#> # A tibble: 2 x 2
#>       x y
#>   <int> <list>
#> 1     1 <chr [3]>
#> 2     2 <chr [2]>
tidyr::unnest(df, cols = y)
#> # A tibble: 5 x 2
#>       x y
#>   <int> <chr>
#> 1     1 a
#> 2     1 b
#> 3     1 c
#> 4     2 alpha
#> 5     2 beta
Created on 2019-08-10 by the reprex package (v0.3.0)
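Note that unnest() expects a genuine list column. If the column instead holds delimited strings, as in the "266;372;572" example elsewhere in this thread, tidyr's separate_rows() may be a closer fit. A minimal sketch, assuming a tidyr version that still ships separate_rows() (newer releases also offer separate_longer_delim()); the `toy` tibble and its values are made up for illustration:

```r
library(tidyr)

# Hypothetical toy data: one ";"-delimited string per row (values invented)
toy <- tibble::tibble(
  pep = c("266;372;572", "908;202"),
  pro = c(1L, 3L)
)

# One output row per delimited item; other columns are repeated as needed
long <- separate_rows(toy, pep, sep = ";")
long
```

By default the split values stay as character; passing convert = TRUE asks tidyr to guess a better type.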
You have already obtained a nice answer, but it may be useful to dig around in the R toolbox. Here's an example using a function from the splitstackshape package, concat.split.multiple. As the name suggests, it "allows the user to split multiple columns at once". Although there is only one concatenated column to split in the current example, the function is convenient because it allows us to reshape the data to a long format in the same call. Using the minimal data set provided by @SimonO101:
library(splitstackshape)
df2 <- concat.split.multiple(data = df, split.cols = "pep", seps = ";", direction = "long")
df2
#   pro time pep
# 1   1    1 236
# 2   3    1 465
# 3   2    1 641
# 4   1    2  16
# 5   3    2 721
# 6   2    2 323
# 7   1    3 912
# 8   3    3 459
# 9   2    3 283
An id variable ('time') is added to differentiate the multiple items ('pep') that are generated for each group ('pro'). If you wish to remove it, just run subset(df2, select = -time).
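As far as I know, more recent releases of splitstackshape steer users toward cSplit() rather than concat.split.multiple(), so here is a rough equivalent with that function. Treat it as a sketch: the `toy` data and its values are invented, and the exact row order and column types of the result may depend on the package version.

```r
library(splitstackshape)

# Hypothetical toy data (values invented for illustration)
toy <- data.frame(
  pep = c("266;372;572", "908;202"),
  pro = c(1L, 3L),
  stringsAsFactors = FALSE
)

# Split "pep" on ";" and stack the pieces into long format in one call
long <- cSplit(toy, splitCols = "pep", sep = ";", direction = "long")
long
```

The result is a data.table with one row per split item.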