Recode a variable using data.table
I think this might be what you're looking for. On the left hand side of :=
we name the variables we want to update and on the right hand side we have the expressions we want to update the corresponding variables with.
DT[, c("V1","V2") := .(as.numeric(V1==2), sapply(V2, function(x) {if(x=="A") "T"
else if (x=="B") "K"
else if (x=="C") "D" }))]
# V1 V2 V4
#1: 0 T 1
#2: 0 K 2
#3: 1 D 3
#4: 0 T 4
#5: 0 K 5
#6: 1 D 6
#7: 0 T 7
#8: 0 K 8
#9: 1 D 9
#10: 0 T 10
#11: 0 K 11
#12: 1 D 12
Alternatively, just use recode
within data.table
:
library(dplyr)
DT[, c("V1","V2") := .(as.numeric(V1==2), recode(V2, "A" = "T", "B" = "K", "C" = "D"))]
With data.table
the recode can be solved with an update on join:
DT[.(V1 = 1:2, to = 0:1), on = "V1", V1 := i.to]
DT[.(V2 = LETTERS[1:3], to = c("T", "K", "D")), on = "V2", V2 := i.to]
which converts DT
to
V1 V2 V4
1: 0 T 1
2: 0 K 2
3: 1 D 3
4: 0 T 4
5: 0 K 5
6: 1 D 6
7: 0 T 7
8: 0 K 8
9: 1 D 9
10: 0 T 10
11: 0 K 11
12: 1 D 12
Edit: @Frank suggested to use i.to
to be on the safe side.
Explanation
The expressions .(V1 = 1:2, to = 0:1)
and .(V2 = LETTERS[1:3], to = c("T", "K", "D"))
, resp., create lookup tables on-the-fly.
Alternatively, the lookup tables can be set-up beforehand
lut1 <- data.table(V1 = 1:2, to = 0:1)
lut2 <- data.table(V2 = LETTERS[1:3], to = c("T", "K", "D"))
lut1
V1 to 1: 1 0 2: 2 1
lut2
V2 to 1: A T 2: B K 3: C D
Then, the update joins become
DT[lut1, on = "V1", V1 := i.to]
DT[lut2, on = "V2", V2 := i.to]
Edit 2: Answers to How can I use this code dynamically?
mat asked "How can I use this code dynamically?"
So, here is a modified version where the name of column to update is provided as a character variable my_var_name
but the lookup tables still are created on-the-fly:
my_var_name <- "V1"
DT[.(from = 1:2, to = 0:1), on = paste0(my_var_name, "==from"),
(my_var_name) := i.to]
my_var_name <- "V2"
DT[.(from = LETTERS[1:3], to = c("T", "K", "D")), on = paste0(my_var_name, "==from"),
(my_var_name) := i.to]
There are 3 points to note:
- Instead of naming the first column of the lookup table dynamically it gets a fixed name
from
. This requires a join between differently named columns (foreign key join). The names of the columns to join on have to be specified via theon
parameter. - The
on
parameter accepts character strings for foreign key joins of the form"V1==from"
. This string is created dynamically usingpaste0()
. - In the expression
(my_var_name) := i.to
, the parentheses around the variablemy_var_name
forces to use the contents ofmy_var_name
.
Dynamic code using pre-defined lookup tables
Now, while the column to recode is specified dynamically by a variable, the lookup tables to use are still hard-coded in the statement which means we have stopped halfways: We need also to select the appropriate lookup table dynamically.
This can be achieved by storing the lookup tables in a list where each list element is named according to the column of DT
it is supposed to recode:
lut_list <- list(
V1 = data.table(from = 1:2, to = 0:1),
V2 = data.table(from = LETTERS[1:3], to = c("T", "K", "D"))
)
lut_list
$V1 from to <int> <int> 1: 1 0 2: 2 1 $V2 from to <char> <char> 1: A T 2: B K 3: C D
Now, we can pick the appropriate lookup table from the list dynamically as well:
my_var_name <- "V1"
DT[lut_list[[my_var_name]], on = paste0(my_var_name, "==from"),
(my_var_name) := i.to]
Going one step further, we can recode all relevant columns of DT
in a loop:
for (v in intersect(names(lut_list), colnames(DT))) {
DT[lut_list[[v]], on = paste0(v, "==from"), (v) := i.to]
}
Note that DT
is updated by reference, i.e., only the affected elements are replaced in place without copying the whole object. So, the for
loop is applied iteratively on the same data object. This is a speciality of data.table and will not work with data.frames or tibbles.