dplyr mutate/transmute: drop only the columns used in the formula
We need to specify the columns of interest in transmute, as it returns only the columns that are passed to it:
df %>%
  transmute(A, B, C, X = D*E)
If there are many columns, one way to avoid typing them one by one is to convert the column names to symbols and splice them into the call with the !!! operator:
df %>%
  transmute(!!! rlang::syms(names(.)[1:3]), X = D*E)
Or, if we don't know the index of the columns of interest but only the names of the columns to remove:
df %>%
  transmute(!!! rlang::syms(setdiff(names(.), c('D', 'E'))), X = D*E)
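As a quick sanity check, the three spellings above should give the same result. The sketch below rebuilds the example data from the "data" section and compares them. It uses names(df) in place of names(.) purely for readability, and the object names by_hand, by_index, and by_name are illustrative.

```r
library(dplyr)

# Same example data as in the "data" section
set.seed(24)
df <- as.data.frame(matrix(sample(1:9, 5*10, replace = TRUE),
                           ncol = 5, dimnames = list(NULL, LETTERS[1:5])))

# Columns typed out by hand
by_hand  <- df %>% transmute(A, B, C, X = D*E)
# Columns chosen by position, spliced in as symbols
by_index <- df %>% transmute(!!! rlang::syms(names(df)[1:3]), X = D*E)
# Columns chosen by excluding the inputs of the formula
by_name  <- df %>% transmute(!!! rlang::syms(setdiff(names(df), c("D", "E"))), X = D*E)

# Both comparisons are expected to report equality
all.equal(by_hand, by_index)
all.equal(by_hand, by_name)
```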
data
set.seed(24)
df <- as.data.frame(matrix(sample(1:9, 5*10, replace = TRUE),
                           ncol = 5, dimnames = list(NULL, LETTERS[1:5])))
If you're looking to combine the two operations, you can use NULL in mutate to specify which columns should be dropped:
df %>% mutate( X=D*E, D=NULL, E=NULL )
Unfortunately, you still have to mention each variable twice, so perhaps it's only marginally more concise.
UPDATE: I really like this question because it essentially asks for a mutator that has some features of both mutate and transmute. Such a mutator needs to parse the provided expression(s) to identify which symbols the computation uses, and then remove those symbols from the result.
To implement such a mutator, we will need some tools. First, let's define a function that retrieves an expression's abstract syntax tree (AST).
library( tidyverse )
## Recursively constructs the abstract syntax tree (AST) of the provided expression
getAST <- function( ee ) { as.list(ee) %>% map_if(is.call, getAST) }
Here's an example of getAST in action:
z <- quote( a*log10(x)+b ) ## Captures the expression a*log10(x)+b
getAST( z ) %>% str
# List of 3
# $ : symbol +
# $ :List of 3
# ..$ : symbol *
# ..$ : symbol a
# ..$ :List of 2
# .. ..$ : symbol log10
# .. ..$ : symbol x
# $ : symbol b
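The recursion in getAST relies on the fact that as.list() splits a call into the function being called followed by its arguments. A minimal base-R sketch of that one step (the names e and parts are illustrative):

```r
# A call behaves like a list whose first element is the function being called
e <- quote( log10(x) + b )
parts <- as.list(e)

length(parts)           # 3: the symbol `+`, the call log10(x), and the symbol b
is.symbol(parts[[1]])   # TRUE
is.call(parts[[2]])     # TRUE, so getAST would recurse into this element
is.call(parts[[3]])     # FALSE, a leaf of the tree
```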
Retrieving the list of symbols used by an expression requires nothing more than flattening and deparsing this tree.
## Retrieves all symbols (as strings) used in a given expression
getSyms <- function( ee ) { getAST(ee) %>% unlist %>% map_chr(deparse) }
getSyms(z)
# [1] "+" "*" "a" "log10" "x" "b"
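Note that getSyms also returns function names such as + and log10. That is harmless here, since the result is later intersected with the column names. If only the variables are wanted, base R's all.vars() is an alternative worth knowing; the sketch below is a side note, not the approach used in the rest of this answer.

```r
z <- quote( a*log10(x)+b )

# all.vars() returns variable names only; function names like `+` and log10 are skipped
vars <- all.vars(z)
print(vars)
# [1] "a" "x" "b"
```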
We are now ready to implement our new mutator that computes new columns (similar to mutate) and removes variables used in the computation (similar to transmute):
## A new mutator that removes all variables used by the computations
transmutate <- function( .data, ... )
{
  ## Capture the provided expressions and retrieve their symbols
  ## (get_expr is namespaced because library(tidyverse) does not attach rlang)
  vSyms <- enquos(...) %>% map( ~getSyms(rlang::get_expr(.x)) )
  ## Identify symbols that are in common with the provided dataset
  ## These columns are to be removed
  vToRemove <- intersect( colnames(.data), unlist(vSyms) )
  ## Pass on the expressions to mutate to do the work
  ## Remove the identified columns from the result
  ## (any_of supersedes one_of in current dplyr)
  mutate( .data, ... ) %>% select( -any_of(vToRemove) )
}
Let's take the new function out for a spin:
## Expected output should include new columns X, Y
## removed columns vs, drat, wt, mpg, and cyl
## and everything else the same
## (Note that in the classical tidyverse spirit, rownames are not preserved)
transmutate( mtcars, X = ifelse( vs, drat, wt ), Y = mpg*cyl )
# disp hp qsec am gear carb X Y
# 1 160.0 110 16.46 1 4 4 2.620 126.0
# 2 160.0 110 17.02 1 4 4 2.875 126.0
# 3 108.0 93 18.61 1 4 1 3.850 91.2
# 4 258.0 110 19.44 0 3 1 3.080 128.4
# ...
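For comparison, recent dplyr releases cover this case directly: as far as I know, mutate() gained a .keep argument in dplyr 1.0.0, and .keep = "unused" drops the columns that the new expressions consumed. A sketch assuming dplyr 1.0.0 or later is installed (res is an illustrative name):

```r
library(dplyr)

# .keep = "unused" keeps only the columns that were not used to build X and Y
res <- mutate( mtcars, X = ifelse( vs, drat, wt ), Y = mpg*cyl, .keep = "unused" )
names(res)
# [1] "disp" "hp"   "qsec" "am"   "gear" "carb" "X"    "Y"
```

This gives the same set of columns as transmutate above without any custom parsing, at the cost of requiring a newer dplyr.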