dplyr mutate/transmute: drop only the columns used in the formula

We need to specify the columns of interest in transmute, because it returns only the columns that are passed to it:

df %>% 
    transmute(A, B, C, X = D*E)

If there are many columns, one way to avoid typing them one by one is to convert the column names to symbols and splice them in with the unquote-splice operator (!!!):

df %>% 
  transmute(!!! rlang::syms(names(.)[1:3]), X = D*E)

Or, if we don't know the positions of the columns to keep but only the names of the columns to remove:

df %>% 
    transmute(!!! rlang::syms(setdiff(names(.), c('D', 'E'))), X = D*E)

data

set.seed(24)
df <- as.data.frame(matrix(sample(1:9, 5*10, replace = TRUE), 
          ncol = 5, dimnames = list(NULL, LETTERS[1:5])))

If you're looking to combine the two operations, you can use NULL in mutate to specify which columns should be dropped:

df %>% mutate( X=D*E, D=NULL, E=NULL )

Unfortunately, you still have to mention each variable twice, so perhaps it's only marginally more concise.
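If a recent dplyr is available, there may be a shorter route. The sketch below assumes dplyr 1.0.0 or later, where mutate accepts a .keep argument; with .keep = "unused" it keeps the new columns plus only those existing columns that were not used in the computation. It rebuilds the sample data from the data section so that it runs on its own.

```r
library(dplyr)

# Same sample data as in the "data" section
set.seed(24)
df <- as.data.frame(matrix(sample(1:9, 5 * 10, replace = TRUE),
                           ncol = 5, dimnames = list(NULL, LETTERS[1:5])))

# .keep = "unused" drops the columns referenced on the right-hand side (D and E)
# and keeps everything else alongside the new column X
res <- df %>% mutate(X = D * E, .keep = "unused")
names(res)
# [1] "A" "B" "C" "X"
```

On older dplyr versions the .keep argument does not exist, so the approaches in this answer still apply there.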

UPDATE: So, I really like this question because it essentially requests a mutator that has some features of both mutate and transmute. Such a mutator will need to parse the provided expression(s) to identify which symbols are being used by the computation and then remove those symbols from the result.

To implement such a mutator, we will need some tools. First, let's define a function that retrieves an expression's abstract syntax tree (AST).

library( tidyverse )

## Recursively constructs the abstract syntax tree (AST) of the provided expression
getAST <- function( ee ) { as.list(ee) %>% map_if(is.call, getAST) }

Here's an example of getAST in action:

z <- quote( a*log10(x)+b )   ## Captures the expression a*log10(x)+b
getAST( z ) %>% str
# List of 3
#  $ : symbol +
#  $ :List of 3
#   ..$ : symbol *
#   ..$ : symbol a
#   ..$ :List of 2
#   .. ..$ : symbol log10
#   .. ..$ : symbol x
#  $ : symbol b

Retrieving the list of symbols used by an expression requires nothing more than flattening and deparsing this tree.

## Retrieves all symbols (as strings) used in a given expression
getSyms <- function( ee ) { getAST(ee) %>% unlist %>% map_chr(deparse) }
getSyms(z)
# [1] "+"     "*"     "a"     "log10" "x"     "b"
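For comparison, base R already ships a helper in the same spirit. This is only a side-by-side sketch, not part of the approach above: all.vars() walks an expression and returns the variable names while skipping function names such as + and log10.

```r
# Same expression as above, captured without evaluation
z <- quote(a * log10(x) + b)

# all.vars() returns only the variables, in order of first appearance
vars <- all.vars(z)
vars
# [1] "a" "x" "b"
```

Because the result is later intersected with the column names of the data, the extra function names returned by getSyms are harmless unless a column happens to share a name with a function used in the expression.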

We are now ready to implement our new mutator that computes new columns (similar to mutate) and removes variables used in the computation (similar to transmute):

## A new mutator that removes all variables used by the computations
transmutate <- function( .data, ... )
{
    ## Capture the provided expressions and retrieve their symbols
    vSyms <- enquos(...) %>% map( ~getSyms(rlang::get_expr(.x)) )

    ## Identify symbols that are in common with the provided dataset
    ## These columns are to be removed
    vToRemove <- intersect( colnames(.data), unlist(vSyms) )

    ## Pass on the expressions to mutate to do the work
    ## Remove the identified columns from the result
    mutate( .data, ... ) %>% select( -one_of(vToRemove) )
}

Let's take the new function out for a spin:

## Expected output should include new columns X, Y
##    removed columns vs, drat, wt, mpg, and cyl
##    and everything else the same
## (Note that row names are dropped in the output below; dplyr 1.0.0 and later preserve them for data frames)
transmutate( mtcars, X = ifelse( vs, drat, wt ), Y = mpg*cyl )
#     disp  hp  qsec am gear carb     X     Y
# 1  160.0 110 16.46  1    4    4 2.620 126.0
# 2  160.0 110 17.02  1    4    4 2.875 126.0
# 3  108.0  93 18.61  1    4    1 3.850  91.2
# 4  258.0 110 19.44  0    3    1 3.080 128.4
# ...
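The same idea can be written more compactly. The following is a minimal sketch under a few assumptions: transmutate2 is a hypothetical name, it swaps the hand-rolled tree walk for base R's all.vars(), and it uses any_of(), which needs a reasonably recent dplyr/tidyselect. It is checked here against a small made-up data frame rather than mtcars.

```r
library(dplyr)

# Hypothetical variant of transmutate() that relies on base R's all.vars()
transmutate2 <- function(.data, ...) {
  # Capture the expressions without evaluating them
  quos <- rlang::enquos(...)

  # Collect the variable names referenced by each expression
  used <- unique(unlist(lapply(quos, function(q) all.vars(rlang::quo_get_expr(q)))))

  # Only names that are actual columns of the input are removed
  to_remove <- intersect(colnames(.data), used)

  # Let mutate do the work, then drop the columns that were used
  mutate(.data, ...) %>% select(-any_of(to_remove))
}

# Small illustrative data frame (values chosen arbitrarily)
df <- data.frame(A = 1:3, B = 4:6, D = 7:9, E = c(2, 2, 2))
res <- transmutate2(df, X = D * E)
names(res)
# [1] "A" "B" "X"
```

Like the original, this removes a column whenever its name appears as a variable in any of the expressions, so a local variable that shares a name with a column would cause that column to be dropped as well.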

Tags:

R

Dplyr
