Fast replacing values in dataframe in R
Try transforming your df to a matrix.
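df <- data.frame(a=rnorm(1000),b=rnorm(1000))
# convert to a matrix so that [<- works on a single atomic vector
m <- as.matrix(df)
# set all negative entries to 0 with one logical-index assignment
m[m<0] <- 0
# convert back to a data.frame
df <- as.data.frame(m)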
Both your original approach and the current answer create an object the same size as m (or df) when evaluating m<0. The matrix approach is quicker because there is less internal copying with [<- than with [<-.data.frame.
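You can see the extra allocation mentioned above directly: the comparison m < 0 builds a full logical matrix with the same dimensions as m before any value is replaced. This is a minimal sketch that reuses the example data; the seed and the name neg are added only for illustration.

```r
set.seed(42)  # illustrative seed so the example is reproducible
df <- data.frame(a = rnorm(1000), b = rnorm(1000))
m <- as.matrix(df)

# m < 0 allocates a logical matrix with one TRUE/FALSE per cell
neg <- m < 0
print(dim(neg))     # same dimensions as m (1000 rows, 2 columns)
print(typeof(neg))  # a logical matrix, not a list of row indices

# the replacement then uses that whole-table logical index
m[neg] <- 0
```

For a very large df, that temporary logical matrix is as long as the data itself, which is why the column-wise alternatives below can help.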
You can use lapply and replace instead; then you are only working with a vector of length nrow(df) at a time and not copying so much:
df <- as.data.frame(lapply(df, function(x){replace(x, x < 0, 0)}))
The code above should be quite efficient.
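As a quick sanity check, the column-wise lapply/replace version should give exactly the same values as the matrix version. This is a minimal sketch; the seed and the names df_matrix and df_lapply are illustrative only.

```r
set.seed(42)  # illustrative seed
df <- data.frame(a = rnorm(1000), b = rnorm(1000))

# matrix approach: one logical index over the whole table
m <- as.matrix(df)
m[m < 0] <- 0
df_matrix <- as.data.frame(m)

# column-wise approach: replace() only sees one column vector at a time
df_lapply <- as.data.frame(lapply(df, function(x) replace(x, x < 0, 0)))

# both approaches produce the same values and column names
print(all.equal(df_matrix, df_lapply))
```

Note that the matrix route only makes sense when every column is numeric; lapply/replace leaves each column's type untouched.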
If you use data.table, most of the memory and time inefficiency of the data.frame approach is removed. It would be ideal for a large-data situation like yours.
library(data.table)
# replace negatives column by column; each replace() call works on one vector
DT <- lapply(df, function(x){replace(x, x < 0, 0)})
# convert the resulting list to a data.table by reference
setDT(DT)
# or, in a single call:
# DT <- as.data.table(lapply(df, function(x){replace(x, x < 0, 0)}))
You could also set keys on all the columns and then replace by reference the key values that are less than 0.
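The keyed approach is not sketched here. A related by-reference option, which is a different technique from setting keys, is data.table::set() in a loop over the columns: it overwrites the negative entries of each column in place without building a logical index for the whole table. This is a minimal sketch, assuming the data.table package is installed; the seed and the example table are illustrative.

```r
library(data.table)

set.seed(42)  # illustrative seed
DT <- data.table(a = rnorm(1000), b = rnorm(1000))

# for each column, find the rows holding negatives and overwrite them;
# set() modifies DT by reference, so the table itself is not copied
for (j in names(DT)) {
  set(DT, i = which(DT[[j]] < 0), j = j, value = 0)
}
```

The temporary allocations here are per column (the logical vector from DT[[j]] < 0 and the row numbers from which()), rather than one logical object the size of the whole table.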