Use of ~ (tilde) in R programming Language
The thing on the right of <-
is a formula
object. It is often used to denote a statistical model, where the thing on the left of the ~
is the response and the things on the right of the ~
are the explanatory variables. So in English you'd say something like "Species depends on Sepal Length, Sepal Width, Petal Length and Petal Width".
The myFormula <-
part of that line stores the formula in an object called myFormula
so you can use it in other parts of your R code.
Other common uses of formula objects in R
The lattice
package uses them to specify the variables to plot.
The ggplot2
package uses them to specify panels for plotting.
The dplyr
package uses them for non-standard evaulation.
R defines a ~
(tilde) operator for use in formulas. Formulas have all sorts of uses, but perhaps the most common is for regression:
library(datasets)
lm( myFormula, data=iris)
help("~")
or help("formula")
will teach you more.
@Spacedman has covered the basics. Let's discuss how it works.
First, being an operator, note that it is essentially a shortcut to a function (with two arguments):
> `~`(lhs,rhs)
lhs ~ rhs
> lhs ~ rhs
lhs ~ rhs
That can be helpful to know for use in e.g. apply
family commands.
Second, you can manipulate the formula as text:
oldform <- as.character(myFormula) # Get components
myFormula <- as.formula( paste( oldform[2], "Sepal.Length", sep="~" ) )
Third, you can manipulate it as a list:
myFormula[[2]]
myFormula[[3]]
Finally, there are some helpful tricks with formulae (see help("formula")
for more):
myFormula <- Species ~ .
For example, the version above is the same as the original version, since the dot means "all variables not yet used." This looks at the data.frame you use in your eventual model call, sees which variables exist in the data.frame but aren't explicitly mentioned in your formula, and replaces the dot with those missing variables.