How to preserve base data frame rownames upon filtering in dplyr chain
For gene counts, you often want to keep a gene if at least x samples have more than y counts, rather than requiring the threshold in every sample.
Not as pretty as filter_if, but I'm not sure how you'd implement the same rowSums condition using all_vars.
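x <- sample_threshold  # minimum number of samples that must pass
y <- count_threshold   # a sample passes when its count exceeds y
library(dplyr)
library(tibble)
df %>%
  tibble::rownames_to_column('gene') %>%
  # `>= x` keeps genes where at least x samples have more than y counts
  dplyr::filter(rowSums(dplyr::select(., -gene) > y) >= x) %>%
  tibble::column_to_rownames('gene')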
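To make the "at least x samples above y counts" condition concrete, here is a minimal base R sketch of the same row test on a toy data frame. The gene names, sample names, counts, and thresholds are all made up for illustration. Base subsetting with a logical vector keeps the rownames, so no column round trip is needed in this form.

```r
# Hypothetical toy counts: 4 genes (rows) by 3 samples (columns)
df <- data.frame(
  s1 = c(30, 2703, 0, 12),
  s2 = c(3380, 27, 5, 3),
  s3 = c(15, 9, 1, 2),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

x <- 2   # assumed sample threshold: at least 2 samples must pass
y <- 10  # assumed count threshold: a sample passes when its count is > y

# rowSums(df > y) counts, per gene, how many samples exceed y
keep <- rowSums(df > y) >= x

# drop = FALSE keeps a data frame even if only one column remains
filtered <- df[keep, , drop = FALSE]
rownames(filtered)
```

With these made-up values, geneA passes in three samples and geneB in two, so both are kept, while geneC and geneD pass in fewer than two samples and are dropped.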
Here is another base R method, using Reduce:
df[Reduce(`&`, lapply(df, `>=`, 8)),]
# BoneMarrow Pulmonary
#ATP1B1 30 3380
#PRR11 2703 27
You can convert the rownames to a column and convert back after filtering:
library(dplyr)
library(tibble) # for `rownames_to_column` and `column_to_rownames`
df %>%
rownames_to_column('gene') %>%
filter_if(is.numeric, all_vars(. >= 8)) %>%
column_to_rownames('gene')
# BoneMarrow Pulmonary
# ATP1B1 30 3380
# PRR11 2703 27
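In dplyr 1.0.0 and later the scoped verbs such as filter_if are superseded, and dplyr 1.0.4 added if_all for this kind of row test. A sketch of the same chain in that style follows; it assumes dplyr >= 1.0.4, and the third row (geneC) is a made-up addition so that the filter has something to remove.

```r
library(dplyr)
library(tibble)

# ATP1B1 and PRR11 follow the output shown above; geneC is hypothetical
df <- data.frame(
  BoneMarrow = c(30, 2703, 3),
  Pulmonary  = c(3380, 27, 900),
  row.names  = c("ATP1B1", "PRR11", "geneC")
)

result <- df %>%
  rownames_to_column("gene") %>%
  # keep rows where every numeric column is >= 8; "gene" is character, so it is skipped
  filter(if_all(where(is.numeric), ~ .x >= 8)) %>%
  column_to_rownames("gene")

rownames(result)
```

The rownames round trip is still needed here, because dplyr verbs do not promise to keep rownames.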
Another base R option is to index the rows with a Boolean vector:
df[rowSums(df >= 8) == ncol(df), ]
# BoneMarrow Pulmonary
# ATP1B1 30 3380
# PRR11 2703 27
EDIT1: Alternatively, df[!rowSums(df<8),] (as per @user20650) gives the same result.
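As a quick check of how the base R variants in this thread relate, the sketch below runs them on a toy data frame. The rows geneC and geneD are made up, with geneC sitting exactly on the threshold of 8, which is where an inclusive `>=` test and a strict `>` test differ.

```r
# ATP1B1 and PRR11 follow the output shown above; geneC and geneD are hypothetical
df <- data.frame(
  BoneMarrow = c(30, 2703, 8, 3),
  Pulmonary  = c(3380, 27, 100, 900),
  row.names  = c("ATP1B1", "PRR11", "geneC", "geneD")
)

# Three ways to say "every column is at least 8"
via_reduce   <- df[Reduce(`&`, lapply(df, `>=`, 8)), ]
via_rowsums  <- df[rowSums(df >= 8) == ncol(df), ]
via_negation <- df[!rowSums(df < 8), ]  # no value below 8

# Strict comparison: "every column is greater than 8"
strict <- df[rowSums(df > 8) == ncol(df), ]

rownames(via_reduce)
rownames(strict)
```

On this toy input the three inclusive forms keep ATP1B1, PRR11, and geneC, while the strict form drops geneC. All four keep the original rownames, since plain logical indexing does not touch them.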