Fast vectorized merge of list of data.frames by row

Try this:

bind.ith.rows <- function(i) do.call(rbind, lapply(sample.list, "[", i, TRUE))
nr <- nrow(sample.list[[1]])
lapply(1:nr, bind.ith.rows)
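To see what this produces, here is a minimal sketch on toy data. The names `toy.list`, `bind.ith.rows.toy` and `toy.result` are made up for illustration, and the function takes the list as an argument rather than reading `sample.list` from the global environment.

```r
# Hypothetical toy data: three data.frames with two rows each
toy.list <- list(
  data.frame(x = 1:2, y = c(10, 20)),
  data.frame(x = 3:4, y = c(30, 40)),
  data.frame(x = 5:6, y = c(50, 60))
)

# Same idea as bind.ith.rows, but the list is passed in explicitly
bind.ith.rows.toy <- function(i, lst) do.call(rbind, lapply(lst, "[", i, TRUE))

toy.nr <- nrow(toy.list[[1]])
toy.result <- lapply(seq_len(toy.nr), bind.ith.rows.toy, lst = toy.list)

toy.result[[1]]$x  # 1 3 5   (first row of every data.frame)
toy.result[[2]]$y  # 20 40 60 (second row of every data.frame)
```

The result is a list with one data.frame per row position, each holding that row from every element of the input list.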

A couple of solutions that will make this quicker using data.table

EDIT: now with a larger dataset, which shows off data.table's speed even more.

# sample data: a list of 10000 data.frames with 10 rows each
sample.list <- replicate(10000, data.frame(x = sample(1:100, 10), 
  y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)), simplify = FALSE)

Gabor's fast solution:

# Solution Gabor
bind.ith.rows <- function(i) do.call(rbind, lapply(sample.list, "[", i, TRUE))
nr <- nrow(sample.list[[1]])
system.time(rowbound <- lapply(1:nr, bind.ith.rows))

##    user  system elapsed 
##   25.87    0.01   25.92

The data.table function rbindlist makes this quicker still, even when working with plain data.frames:

library(data.table)
fastbind.ith.rows <- function(i) rbindlist(lapply(sample.list, "[", i, TRUE))
system.time(fastbound <- lapply(1:nr, fastbind.ith.rows))

##    user  system elapsed 
##   13.89    0.00   13.89
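A quick sanity check that the two binders agree can be sketched on a small list. This is only an illustration: `small.list`, `slow`, `fast` and `same.values` are hypothetical names, and the comparison is done column by column because the row names and classes of the two results differ.

```r
library(data.table)

set.seed(42)
# Hypothetical small list, same shape as sample.list but only 5 elements
small.list <- replicate(5, data.frame(x = sample(1:100, 3),
  y = sample(1:100, 3)), simplify = FALSE)

# Second row of every element, bound with base rbind and with rbindlist
slow <- do.call(rbind, lapply(small.list, "[", 2, TRUE))
fast <- rbindlist(lapply(small.list, "[", 2, TRUE))

# Compare the values column by column
same.values <- identical(fast$x, slow$x) && identical(fast$y, slow$y)
same.values
```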

A `data.table` solution

Here is a solution that uses data.tables. It is a split solution on steroids.

# data.table solution
system.time({
    # change each element of sample.list to a data.table (and data.frame);
    # this is done instantaneously, by reference
    invisible(lapply(sample.list, setattr, name = "class", 
               value = c("data.table",  "data.frame")))
    # combine into a big data set
    bigdata <- rbindlist(sample.list)
    # add a row index column (by reference)
    index <- as.character(seq_len(nr))
    bigdata[, `:=`(rowid, index)]
    # set the key for binary searches
    setkey(bigdata, rowid)
    # split on this -
    dt_list <- lapply(index, function(i, x) x[J(i)], x = bigdata)
    # if you want to drop the `row id` column
    invisible(lapply(dt_list, function(x) set(x, j = "rowid", value = NULL)))
    # if you really don't want them to be data.tables run this line
    # invisible(lapply(dt_list, setattr,name = 'class', value =
    # c('data.frame')))
})
################################
##    user  system elapsed    ##
##    0.08    0.00    0.08    ##
################################

How awesome is data.table!
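The same stack-then-split idea can be sketched more compactly, assuming a data.table version recent enough to provide the `idcol` argument of `rbindlist`, the `rowid()` helper, and the `by` argument of the data.table `split` method. This is an alternative sketch, not the code timed above, and `toy.list`, `big`, `src` and `by.row` are hypothetical names.

```r
library(data.table)

# Hypothetical toy data standing in for sample.list
toy.list <- list(
  data.frame(x = 1:2, y = c(10, 20)),
  data.frame(x = 3:4, y = c(30, 40)),
  data.frame(x = 5:6, y = c(50, 60))
)

# Stack everything; idcol records which list element each row came from
big <- rbindlist(toy.list, idcol = "src")
# rowid() numbers the rows within each source element: 1, 2, 1, 2, ...
big[, rowid := rowid(src)]
# The source id is no longer needed
big[, src := NULL]
# One data.table per row position; keep.by = FALSE drops the rowid column
by.row <- split(big, by = "rowid", keep.by = FALSE)

by.row[["1"]]$x  # 1 3 5
```

Unlike the keyed lookup, this version does not assume that every element has the same number of rows, because the row index is computed per element.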

Caveat user with `rbindlist`

rbindlist is fast because it does not perform the checks that do.call(rbind, ...) does. For example, at the time of writing it assumes that any factor columns have the same levels as in the first element of the list.
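A related difference that is easy to demonstrate is column matching. The sketch below assumes a data.table version in which `rbindlist` has the `use.names` argument; the inputs and result names are hypothetical. Base `rbind` on data.frames matches columns by name, while `rbindlist(use.names = FALSE)` binds purely by position.

```r
library(data.table)

# Hypothetical inputs: same columns, different order
l <- list(data.frame(a = 1, b = 2), data.frame(b = 3, a = 4))

by.name.base <- do.call(rbind, l)               # base rbind matches by name
by.position  <- rbindlist(l, use.names = FALSE) # binds column 1 to column 1
by.name.dt   <- rbindlist(l, use.names = TRUE)  # opt in to name matching

by.position$a  # 1 3: the 3 came from column `b` of the second element
by.name.dt$a   # 1 4
```

If the list elements might not share an identical column layout, it seems safer to pass `use.names = TRUE` explicitly than to rely on the default.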

Here's my attempt with plyr, but I like G. Grothendieck's approach:

library(plyr)
alply(do.call("cbind",sample.list), 1, .fun=matrix,
        ncol=ncol(sample.list[[1]]), byrow=TRUE,
        dimnames=list(1:length(sample.list),
        names(sample.list[[1]])
      ))


Tags: Performance, List, Merge, R, Dataframe
