Subset a table by columns and rows using a named vector in R

I am not sure whether this is what you want, but one option is to build the filter expression as a string and evaluate it:

u <- split(myVector,names(myVector))
eval(str2expression(sprintf("diamonds %%>%% filter(%s)",paste0(sapply(names(u),function(x) paste0(x," %in% u$",x)),collapse = " & "))))

which gives

> eval(str2expression(sprintf("diamonds %%>%% filter(%s)",paste0(sapply(names(u),function(x) paste0(x," %in% u$",x)),collapse = " & "))))
# A tibble: 6,039 x 10
   carat cut   color clarity depth table price     x     y     z
   <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  0.23 Ideal E     SI2      61.5    55   326  3.95  3.98  2.43
 2  0.23 Good  E     VS1      56.9    65   327  4.05  4.07  2.31
 3  0.31 Good  J     SI2      63.3    58   335  4.34  4.35  2.75
 4  0.3  Good  J     SI1      64      55   339  4.25  4.28  2.73
 5  0.23 Ideal J     VS1      62.8    56   340  3.93  3.9   2.46
 6  0.31 Ideal J     SI2      62.2    54   344  4.35  4.37  2.71
 7  0.3  Good  J     SI1      63.4    54   351  4.23  4.29  2.7
 8  0.3  Good  J     SI1      63.8    56   351  4.23  4.26  2.71
 9  0.23 Good  E     VS1      64.1    59   402  3.83  3.85  2.46
10  0.33 Ideal J     SI1      61.1    56   403  4.49  4.55  2.76
# ... with 6,029 more rows
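
To see what the sprintf/paste0 call builds, it can help to print the generated string before evaluating it. This is only a sketch: myVector comes from the question and is not shown here, so the vector below is a hypothetical stand-in with illustrative names and values.

```r
# Hypothetical named vector (an assumption, not the original myVector):
# names are column names, values are the levels to keep
myVector <- c(cut = "Ideal", cut = "Good", color = "E", color = "J")

# One list element per distinct name
u <- split(myVector, names(myVector))

# One "<column> %in% u$<column>" condition per name
conditions <- sapply(names(u), function(x) paste0(x, " %in% u$", x))

# Join the conditions with & and wrap them in a filter() call
expr_text <- sprintf("diamonds %%>%% filter(%s)",
                     paste0(conditions, collapse = " & "))
print(expr_text)
```

With this stand-in vector the string is "diamonds %>% filter(color %in% u$color & cut %in% u$cut)", which is what eval(str2expression(...)) then runs. The columns appear in alphabetical order because split() sorts the group names.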

Starting from ThomasIsCoding's split idea, slightly changed, here is a base R solution that uses Reduce/Map to create a logical index.

v <- split(unname(myVector), names(myVector))
i <- Reduce('&', Map(function(x, y){x %in% y}, diamonds[names(v)], v))
diamonds[i, ]
## A tibble: 6,039 x 10
#   carat cut   color clarity depth table price     x     y     z
#   <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
# 1  0.23 Ideal E     SI2      61.5    55   326  3.95  3.98  2.43
# 2  0.23 Good  E     VS1      56.9    65   327  4.05  4.07  2.31
# 3  0.31 Good  J     SI2      63.3    58   335  4.34  4.35  2.75
# 4  0.3  Good  J     SI1      64      55   339  4.25  4.28  2.73
# 5  0.23 Ideal J     VS1      62.8    56   340  3.93  3.9   2.46
# 6  0.31 Ideal J     SI2      62.2    54   344  4.35  4.37  2.71
# 7  0.3  Good  J     SI1      63.4    54   351  4.23  4.29  2.7 
# 8  0.3  Good  J     SI1      63.8    56   351  4.23  4.26  2.71
# 9  0.23 Good  E     VS1      64.1    59   402  3.83  3.85  2.46
#10  0.33 Ideal J     SI1      61.1    56   403  4.49  4.55  2.76
## ... with 6,029 more rows
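
For anyone who wants to follow the Map/Reduce steps one at a time, here is a small sketch on toy data. The data frame df and the vector myVector below are hypothetical stand-ins (the real myVector is defined in the question), chosen so the result is easy to check by eye.

```r
# Toy data standing in for diamonds (hypothetical values)
df <- data.frame(
  cut   = c("Ideal", "Good", "Premium", "Ideal", "Good"),
  color = c("E", "J", "E", "D", "E"),
  price = c(326, 327, 334, 340, 351),
  stringsAsFactors = FALSE
)

# Hypothetical named vector: names are column names, values are allowed levels
myVector <- c(cut = "Ideal", cut = "Good", color = "E", color = "J")

# Step 1: one list element per column, holding that column's allowed values
v <- split(unname(myVector), names(myVector))

# Step 2: one logical vector per column (TRUE where the value is allowed)
per_column <- Map(function(x, y) x %in% y, df[names(v)], v)

# Step 3: combine the per-column vectors with a row-wise AND
i <- Reduce(`&`, per_column)

result <- df[i, ]
print(result)
```

Rows 1, 2 and 5 are kept here: row 3 fails on cut ("Premium") and row 4 fails on color ("D").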

Package dplyr

The code above can be written as a function and used in dplyr::filter.

# Input:
# X - a data set to be filtered
# values - a named vector; names are column names of X
# Returns a logical vector with one element per row of X
values_in <- function(X, values){
  v <- split(unname(values), names(values))
  i <- Reduce('&', Map(function(x, y){x %in% y}, X[names(v)], v))
  i
}

diamonds %>% filter( values_in(., myVector) )

The output is the same as above and is therefore omitted.
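
One thing the function does not do is check its input, so a misspelled name in the vector fails with a less obvious subsetting error. A possible variant with explicit checks is sketched below; values_in_checked and the toy data frame are hypothetical names introduced only for illustration.

```r
# Hypothetical variant of values_in that validates its input first
values_in_checked <- function(X, values) {
  if (is.null(names(values))) {
    stop("`values` must be a named vector")
  }
  unknown <- setdiff(names(values), names(X))
  if (length(unknown) > 0) {
    stop("columns not found in X: ", paste(unknown, collapse = ", "))
  }
  # Same logic as values_in: split by name, test each column, AND the results
  v <- split(unname(values), names(values))
  Reduce(`&`, Map(function(x, y) x %in% y, X[names(v)], v))
}

# Toy data (hypothetical values)
df <- data.frame(
  cut   = c("Ideal", "Good", "Premium"),
  color = c("E", "J", "E"),
  stringsAsFactors = FALSE
)

keep <- values_in_checked(df, c(cut = "Ideal", cut = "Good", color = "E"))
print(df[keep, ])
```

Only the first row passes both conditions in this toy example. Because the function still returns a plain logical vector, it should be usable inside filter() in the same way as values_in.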

Combining the approaches proposed by @Roman (generating all combinations of the vector elements and joining) and @ThomasIsCoding (splitting the vector) seems to do the trick:

data.frame(split(myVector, names(myVector))) %>%
  expand.grid() %>%
  inner_join(diamonds[, unique(names(myVector))])
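
The same combine-and-join idea can be sketched in base R, with merge() standing in for inner_join(). This is a swapped-in technique on toy data, not the code above: df and myVector are hypothetical stand-ins, and passing the split list straight to expand.grid() is my own simplification.

```r
# Toy data standing in for diamonds (hypothetical values)
df <- data.frame(
  cut   = c("Ideal", "Good", "Premium", "Ideal", "Good"),
  color = c("E", "J", "E", "D", "E"),
  price = c(326, 327, 334, 340, 351),
  stringsAsFactors = FALSE
)

# Hypothetical named vector: names are column names, values are allowed levels
myVector <- c(cut = "Ideal", cut = "Good", color = "E", color = "J")

# All combinations of the allowed values, one column per distinct name
combos <- expand.grid(split(unname(myVector), names(myVector)),
                      stringsAsFactors = FALSE)

# Inner join on the shared columns (cut and color); merge() keeps all columns of df
result <- merge(df, combos)
print(result)
```

Here combos has four rows (two colors times two cuts) and the join keeps the three rows of df that match one of them. Note that merge() sorts the result by the join columns, so the row order differs from the logical-index approach.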
