R data.table column names not working within a function
Enumerate all possible pairs
u_name <- unique(DT$my_name)
all_pairs <- CJ(V1 = u_name, V2 = u_name)[V1 < V2]
Enumerate observed pairs
obs_pairs <- unique(
  DT[, {un <- unique(my_name); CJ(V1 = un, V2 = un)[V1 < V2]}, by = my_id][, !"my_id"]
)
Take the difference
all_pairs[!J(obs_pairs)]
CJ is like expand.grid except that it creates a data.table with all of its columns as its key. A data.table X must be keyed for a join X[J(Y)] or a not-join X[!J(Y)] (like the last line) to work, unless the join columns are given explicitly with on=. The J is optional, but makes it more obvious that we're doing a join.
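To make the steps above concrete, here is a minimal sketch on made-up data (the six-row DT is an assumption, not the asker's table). CJ() is given explicit V1/V2 names so the filter does not depend on how the installed data.table version auto-names the columns, and the not-join is written without J to show that it is optional.

```r
library(data.table)

# Hypothetical sample data: names that share an id were observed together
DT <- data.table(
  my_id   = c(1, 1, 2, 2, 3, 3),
  my_name = c("A", "B", "B", "C", "A", "D")
)

u_name <- unique(DT$my_name)

# All unordered pairs; CJ() sorts its inputs and keys the result on V1, V2
all_pairs <- CJ(V1 = u_name, V2 = u_name)[V1 < V2]

# Pairs that actually occur under the same id
obs_pairs <- unique(
  DT[, {un <- unique(my_name); CJ(V1 = un, V2 = un)[V1 < V2]}, by = my_id][, .(V1, V2)]
)

# Not-join: rows of all_pairs with no match in obs_pairs
never_seen <- all_pairs[!obs_pairs]
never_seen   # the pairs A-C, B-D and C-D
```

With this data there are six possible pairs and three observed ones, so three pairs are left over.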
Simplifications. @CathG pointed out that there is a cleaner way of constructing obs_pairs if you always have two sorted "names" for each "id" (as in the example data): use as.list(un) in place of the CJ call and its V1 < V2 filter.
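A rough sketch of that simplification, again on made-up data. It assumes every id has exactly two names, already in sorted order; if a group had a different number of names, the per-group results would not line up into the same columns.

```r
library(data.table)

# Hypothetical data: exactly two sorted names per id
DT <- data.table(
  my_id   = c(1, 1, 2, 2, 3, 3),
  my_name = c("A", "B", "B", "C", "A", "D")
)

# as.list() turns the two names of a group into one row with columns V1, V2
obs_pairs <- unique(
  DT[, as.list(unique(my_name)), by = my_id][, .(V1, V2)]
)
obs_pairs   # one row per observed pair: A-B, B-C, A-D
```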
The function debugonce() is extremely useful in these scenarios.
debugonce(mapply)
mapply(get_pairs, tid1, tid2, DT)
# Hit enter twice
# from within BROWSER
debugonce(FUN)
# Hit enter twice
# you'll be inside your function, and then type DT
DT
# [1] "A" "B" "C" "D" "E" "F"
Q # (to quit debugging mode)
which is wrong: inside the function, DT is a character vector rather than the data.table. Basically, mapply() takes the first element of each input argument and passes it to your function, then the second element of each, and so on. In this case you've provided a data.table, which is also a list. So, instead of passing the entire data.table, it's passing each element of the list (each column) in turn.
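You can check this without the debugger. The snippet below is only an illustration: the two-column DT is made up to resemble the question's table, and the anonymous function just reports the class of whatever it receives.

```r
library(data.table)

# Hypothetical data shaped like the question's table
DT <- data.table(
  my_name = c("A", "B", "C", "D", "E", "F"),
  my_id   = c(1, 1, 2, 2, 3, 3)
)

is.list(DT)   # TRUE: a data.table is a list of columns
length(DT)    # 2: one element per column

# mapply() walks over the columns, so each call sees a plain vector
seen <- mapply(function(i, d) class(d)[1], 1:2, DT)
seen          # "character" "numeric"
```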
So, you can get around this by doing:
mapply(get_pairs, tid1, tid2, list(DT))
But mapply() simplifies the result by default, and therefore you'd get a matrix back. You'll have to use SIMPLIFY = FALSE.
mapply(get_pairs, tid1, tid2, list(DT), SIMPLIFY = FALSE)
Or simply use Map(), which never simplifies its result:
Map(get_pairs, tid1, tid2, list(DT))
Use rbindlist() to bind the results.
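Putting it together, here is a self-contained sketch. The question's get_pairs() isn't reproduced in this answer, so the version below is a hypothetical stand-in that crosses the names of two ids with expand.grid(); DT, tid1 and tid2 are likewise assumed values.

```r
library(data.table)

# Hypothetical data and id vectors
DT <- data.table(
  my_name = c("A", "B", "C", "D", "E", "F"),
  my_id   = c(1, 1, 2, 2, 3, 3)
)
tid1 <- c(1, 1, 2)
tid2 <- c(2, 3, 3)

# Hypothetical stand-in for the asker's function
get_pairs <- function(id1, id2, tdt) {
  expand.grid(
    Var1 = tdt[my_id == id1, my_name],
    Var2 = tdt[my_id == id2, my_name],
    stringsAsFactors = FALSE
  )
}

# list(DT) has length one, so the whole table is recycled to every call
res <- Map(get_pairs, tid1, tid2, list(DT))

# Stack the per-call results into a single data.table
pairs <- rbindlist(res)
nrow(pairs)   # 12: three calls, four pairs each
```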
HTH
Why does this function fail only when used within an mapply? I think this has something to do with the scope of data.table names, but I'm not sure.
The reason the function is failing has nothing to do with scoping in this case. mapply vectorizes the function: it takes each element of each argument in turn and passes it to the function. In your case the elements of the data.table are its columns, so mapply is passing the column my_name instead of the complete data.table.
If you want to pass the complete data.table to mapply, you should use the MoreArgs parameter. Then your function will work:
res <- mapply(get_pairs, tid1, tid2, MoreArgs = list(tdt=DT), SIMPLIFY = FALSE)
do.call("rbind", res)
   Var1 Var2
1     A    C
2     B    C
3     A    D
4     B    D
5     A    E
6     B    E
7     A    F
8     B    F
9     C    E
10    D    E
11    C    F
12    D    F
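For a runnable comparison, the sketch below calls mapply() both ways: once with MoreArgs and once with the table wrapped in list(). As before, get_pairs(), DT, tid1 and tid2 are assumed stand-ins rather than the asker's actual code. Arguments in MoreArgs are passed unchanged to every call, while the list() wrapper relies on a length-one argument being recycled.

```r
library(data.table)

# Hypothetical data, id vectors and function
DT <- data.table(
  my_name = c("A", "B", "C", "D", "E", "F"),
  my_id   = c(1, 1, 2, 2, 3, 3)
)
tid1 <- c(1, 1, 2)
tid2 <- c(2, 3, 3)

get_pairs <- function(id1, id2, tdt) {
  expand.grid(
    Var1 = tdt[my_id == id1, my_name],
    Var2 = tdt[my_id == id2, my_name],
    stringsAsFactors = FALSE
  )
}

# MoreArgs: tdt is handed to every call as-is, never iterated over
by_more_args <- mapply(get_pairs, tid1, tid2,
                       MoreArgs = list(tdt = DT), SIMPLIFY = FALSE)

# list(DT): a length-one argument that mapply() recycles
by_list_wrap <- mapply(get_pairs, tid1, tid2, list(DT), SIMPLIFY = FALSE)

# Both routes give the same stacked result
all.equal(do.call("rbind", by_more_args), do.call("rbind", by_list_wrap))
```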