Create categories by comparing a numeric column with a fixed value

To bring a possible canonical up to date: the dplyr package has the function mutate, which lets you create a new column in a data.frame in a vectorized fashion:

library(dplyr)

iris_new <- iris %>%
    mutate(Regulation = if_else(Sepal.Length >= 5, 'UP', 'DOWN'))

This makes a new column called Regulation which consists of either 'UP' or 'DOWN' based on applying the condition to the Sepal.Length column.
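If the column can contain missing values, it may help to see how if_else treats them. The following is a minimal sketch on a made-up vector (the name x and its values are assumptions for illustration, not part of the iris example). By default an NA condition yields NA, and the missing argument of if_else lets you supply a label instead:

```r
library(dplyr)

# Hypothetical toy vector with one missing value
x <- c(4.9, 5.0, NA)

# NA in the condition propagates to the result by default
default_na <- if_else(x >= 5, 'UP', 'DOWN')

# The `missing` argument supplies a value for NA conditions
labelled_na <- if_else(x >= 5, 'UP', 'DOWN', missing = 'UNKNOWN')
```

Here default_na ends with NA, while labelled_na ends with 'UNKNOWN'.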

The case_when function (also from dplyr) provides an easy-to-read way to chain together multiple conditions:

iris %>%
    mutate(Regulation = case_when(Sepal.Length >= 5 ~ 'High',
                                  Sepal.Length >= 4.5 ~ 'Mid',
                                  TRUE ~ 'Low'))

This works just like if_else, except that instead of one condition with return values for TRUE and FALSE, each line has a condition (left side of ~) and a value (right side of ~) that is returned if the condition is TRUE. If it is FALSE, case_when moves on to the next condition.

In this case, rows where Sepal.Length >= 5 will return 'High', rows where Sepal.Length < 5 (since the first condition had to fail) and Sepal.Length >= 4.5 will return 'Mid', and all other rows will return 'Low'. Since TRUE is always TRUE, the last line acts as a catch-all that provides a default value.
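Because the first matching condition wins, the order of the lines matters. Here is a small sketch on made-up data (the data frame df and its column x are assumptions for illustration) that shows the intended ordering next to a reversed one:

```r
library(dplyr)

# Hypothetical toy data, chosen to hit each branch
df <- data.frame(x = c(4.2, 4.5, 4.9, 5.0, 6.1))

df <- df %>%
    mutate(
        # Most restrictive condition first: all three labels are reachable
        level = case_when(x >= 5   ~ 'High',
                          x >= 4.5 ~ 'Mid',
                          TRUE     ~ 'Low'),
        # Reversed order: x >= 4.5 also catches every x >= 5,
        # so 'High' is never returned
        level_wrong = case_when(x >= 4.5 ~ 'Mid',
                                x >= 5   ~ 'High',
                                TRUE     ~ 'Low')
    )

df$level
# "Low" "Mid" "Mid" "High" "High"
df$level_wrong
# "Low" "Mid" "Mid" "Mid" "Mid"
```

Putting the most restrictive threshold first is what makes the implicit "and the earlier conditions failed" logic work.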


In base R, the same two-level column can be created with ifelse:

iris$Regulation <- ifelse(iris$Sepal.Length >= 5, "UP", "DOWN")
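For more than two categories in base R, one option (not part of the original answer) is cut, which bins a numeric vector by break points. A minimal sketch on a made-up vector follows; the break points mirror the 4.5 and 5 thresholds used above, and right = FALSE makes each interval closed on the left so that it behaves like the >= comparisons:

```r
# Hypothetical toy vector, chosen to hit each bin
x <- c(4.2, 4.5, 4.9, 5.0, 6.1)

# Intervals are [-Inf, 4.5), [4.5, 5), [5, Inf) because right = FALSE
level <- cut(x,
             breaks = c(-Inf, 4.5, 5, Inf),
             labels = c("Low", "Mid", "High"),
             right = FALSE)

as.character(level)
# "Low" "Mid" "Mid" "High" "High"
```

Note that cut returns a factor, so wrap it in as.character if a character column is needed.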


