R Apply() function on specific dataframe columns
Using an example data.frame and example function (just +1 to all values)
A <- function(x) x + 1
wifi <- data.frame(replicate(9,1:4))
wifi
# X1 X2 X3 X4 X5 X6 X7 X8 X9
#1 1 1 1 1 1 1 1 1 1
#2 2 2 2 2 2 2 2 2 2
#3 3 3 3 3 3 3 3 3 3
#4 4 4 4 4 4 4 4 4 4
data.frame(wifi[1:3], apply(wifi[4:9],2, A) )
#or
cbind(wifi[1:3], apply(wifi[4:9],2, A) )
# X1 X2 X3 X4 X5 X6 X7 X8 X9
#1 1 1 1 2 2 2 2 2 2
#2 2 2 2 3 3 3 3 3 3
#3 3 3 3 4 4 4 4 4 4
#4 4 4 4 5 5 5 5 5 5
Or even:
data.frame(wifi[1:3], lapply(wifi[4:9], A) )
#or
cbind(wifi[1:3], lapply(wifi[4:9], A) )
# X1 X2 X3 X4 X5 X6 X7 X8 X9
#1 1 1 1 2 2 2 2 2 2
#2 2 2 2 3 3 3 3 3 3
#3 3 3 3 4 4 4 4 4 4
#4 4 4 4 5 5 5 5 5 5
As mentioned, you simply want the standard R apply
function applied to columns (MARGIN=2
):
wifi[,4:9] <- apply(wifi[,4:9], MARGIN=2, FUN=A)
Or, for short:
wifi[,4:9] <- apply(wifi[,4:9], 2, A)
This updates columns 4:9 in-place using the A()
function. Now, let's assume that na.rm
is an argument to A()
, which it probably should be. We can pass na.rm=T
to remove NA values from the computation like so:
wifi[,4:9] <- apply(wifi[,4:9], MARGIN=2, FUN=A, na.rm=T)
The same is true for any other arguments you want to pass to your custom function.
This task is easily achieved with the dplyr
package's across
functionality.
Borrowing the data structure suggested by thelatemail:
A <- function(x) x + 1
wifi <- data.frame(replicate(9,1:4))
We can indicate the columns we wish to apply the function to either by index like this:
library(dplyr)
wifi %>%
mutate(across(4:9, A))
# X1 X2 X3 X4 X5 X6 X7 X8 X9
#1 1 1 1 2 2 2 2 2 2
#2 2 2 2 3 3 3 3 3 3
#3 3 3 3 4 4 4 4 4 4
#4 4 4 4 5 5 5 5 5 5
Or by name:
wifi %>%
mutate(across(X4:X9, A))
# X1 X2 X3 X4 X5 X6 X7 X8 X9
#1 1 1 1 2 2 2 2 2 2
#2 2 2 2 3 3 3 3 3 3
#3 3 3 3 4 4 4 4 4 4
#4 4 4 4 5 5 5 5 5 5
lapply
is probably a better choice than apply
here, as apply first coerces your data.frame to an array which means all the columns must have the same type. Depending on your context, this could have unintended consequences.
The pattern is:
df[cols] <- lapply(df[cols], FUN)
The 'cols' vector can be variable names or indices. I prefer to use names whenever possible (it's robust to column reordering). So in your case this might be:
wifi[4:9] <- lapply(wifi[4:9], A)
An example of using column names:
wifi <- data.frame(A=1:4, B=runif(4), C=5:8)
wifi[c("B", "C")] <- lapply(wifi[c("B", "C")], function(x) -1 * x)