Recoding variables with R

I found mapvalues from plyr package very handy. Package also contains function revalue which is similar to car:::recode.

The following example will "recode"

> mapvalues(letters, from = c("r", "o", "m", "a", "n"), to = c("R", "O", "M", "A", "N"))
 [1] "A" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "M" "N" "O" "p" "q" "R" "s" "t" "u" "v" "w" "x" "y" "z"

Recoding can mean a lot of things, and is fundamentally complicated.

Changing the levels of a factor can be done using the levels function:

> #change the levels of a factor
> levels(veteran$celltype) <- c("s","sc","a","l")

Transforming a continuous variable simply involves the application of a vectorized function:

> mtcars$mpg.log <- log(mtcars$mpg) 

For binning continuous data look at cut and cut2 (in the hmisc package). For example:

> #make 4 groups with equal sample sizes
> mtcars[['mpg.tr']] <- cut2(mtcars[['mpg']], g=4)
> #make 4 groups with equal bin width
> mtcars[['mpg.tr2']] <- cut(mtcars[['mpg']],4, include.lowest=TRUE)

For recoding continuous or factor variables into a categorical variable there is recode in the car package and recode.variables in the Deducer package

> mtcars[c("mpg.tr2")] <- recode.variables(mtcars[c("mpg")] , "Lo:14 -> 'low';14:24 -> 'mid';else -> 'high';")

If you are looking for a GUI, Deducer implements recoding with the Transform and Recode dialogs:

http://www.deducer.org/pmwiki/pmwiki.php?n=Main.TransformVariables

http://www.deducer.org/pmwiki/pmwiki.php?n=Main.RecodeVariables


I've found that it can sometimes be easier to convert non numeric factors to character before attempting to change them, for example.

df <- data.frame(example=letters[1:26]) 
example <- as.character(df$example)
example[example %in% letters[1:20]] <- "a"
example[example %in% letters[21:26]] <- "b"

Also, when importing data, it can be useful to ensure that numbers are actually numeric before attempting to convert:

df <- data.frame(example=1:100)
example <- as.numeric(df$example)
example[example < 20] <- 1
example[example >= 20 & example < 80] <- 2
example[example >= 80] <- 3

I find this very convenient when several values should be transformed (its like doing recodes in Stata):

# load package and gen some data
require(car)
x <- 1:10

# do the recoding
x
## [1]   1   2   3   4   5   6   7   8   9  10

recode(x,"10=1; 9=2; 1:4=-99")
## [1] -99 -99 -99 -99   5   6   7   8   2   1

Tags:

R