Using R's lm on a dataframe with a list of predictors
Using the formula notation y ~ . specifies that you want to regress y on all of the other variables in the dataset.
df = data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
# fits a model using x1 and x2
fit <- lm(y ~ ., data = df)
# drops the second column (x1), so y is regressed on x2 only
fit <- lm(y ~ ., data = df[, -2])
An alternative to Dason's answer, for when you want to specify the columns to exclude by name, is to use subset() and its select argument:
df = data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
fit = lm(y ~ ., data = subset(df, select=-x1))
Trying to use df[, -c("x1")] fails with "invalid argument to unary operator", because a character vector cannot be negated.
It can extend to excluding multiple columns: subset(df, select = -c(x1,x2))
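If you would rather stay with plain indexing than use subset(), one option is to filter the column names first instead of negating them. This is only a minimal sketch, assuming the same df as above; the variable names exclude and keep are illustrative and not part of the original answer.

```r
df <- data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))

# Columns to leave out, given by name (illustrative variable)
exclude <- c("x1")

# Keep every column whose name is not in `exclude`;
# single-bracket indexing by name always returns a data frame
keep <- setdiff(names(df), exclude)
fit <- lm(y ~ ., data = df[keep])

# Same idea with a logical mask over the column names
fit2 <- lm(y ~ ., data = df[!(names(df) %in% exclude)])
```

Both calls should regress y on x2 only, since x1 never reaches lm.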
And you can still refer to columns by numeric index:
df = data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
fit = lm(y ~ ., data = subset(df, select = -2))
(That is equivalent to subset(df, select = -x1) because x1 is the 2nd column.)
Naturally you can also use this to specify the columns to include.
df = data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
fit = lm(y ~ ., data = subset(df, select=c(y,x2)) )
(Yes, that is equivalent to lm(y ~ x2, df), but the distinction matters if you then go on to use step(), for instance.)
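To check the equivalence of the two fits for yourself, you could compare the coefficients directly. This is a quick sketch rather than anything from the original answer; the seed and the names fit_subset and fit_formula are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable
df <- data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))

# Model specified through the data: only y and x2 are passed in
fit_subset <- lm(y ~ ., data = subset(df, select = c(y, x2)))

# Model specified through the formula: x2 is named explicitly
fit_formula <- lm(y ~ x2, data = df)

# Compare the estimated coefficients of the two fits
all.equal(coef(fit_subset), coef(fit_formula))
```

The fitted coefficients agree; what differs is the call stored in each fit, which keeps y ~ . in one case and y ~ x2 in the other.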