How to remove data frame columns with a single value
You can do:
df1[c(TRUE, lapply(df1[-1], var, na.rm = TRUE) != 0)]
# Item_Name D_1 D_3
# 1 test1 1 11
# 2 test2 0 3
# 3 test3 1 1
where the lapply piece tells you which variables have non-zero variance:
lapply(df1[-1], var, na.rm = TRUE) != 0
# D_1 D_2 D_3
# TRUE FALSE TRUE
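The question's df1 is not shown here, so for anyone who wants to run the snippet, below is a minimal sketch with a data frame reconstructed from the printed output. The values of D_2 are an assumption (the output only tells us the column is constant), and has_variance is just an illustrative name. It uses vapply instead of lapply so the result is guaranteed to be a plain logical vector.

```r
# Sample data reconstructed from the output above.
# D_2's real values are not shown; a constant 0 is assumed here.
df1 <- data.frame(
  Item_Name = c("test1", "test2", "test3"),
  D_1 = c(1, 0, 1),
  D_2 = c(0, 0, 0),
  D_3 = c(11, 3, 1)
)

# TRUE for each numeric column whose variance is non-zero
has_variance <- vapply(
  df1[-1],
  function(x) var(x, na.rm = TRUE) != 0,
  logical(1)
)

# Always keep the first (identifier) column, then the columns that vary
result <- df1[c(TRUE, has_variance)]
print(result)
```

Note that var returns NA for a column with fewer than two non-missing values, so this sketch assumes every numeric column has at least two observations.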
In dplyr, we can use n_distinct to count unique values and select with where (or select_if in versions before 1.0.0) to keep only the columns that have more than one:
library(dplyr)
df1 %>% select(where(~n_distinct(.) > 1))
#For dplyr < 1.0.0
#df1 %>% select_if(~n_distinct(.) > 1)
# Item_Name D_1 D_3
#1 test1 1 11
#2 test2 0 3
#3 test3 1 1
We can use the same logic with purrr's keep and discard:
purrr::keep(df1, ~n_distinct(.) > 1)
purrr::discard(df1, ~n_distinct(.) == 1)
Apart from that, a data.table way of doing it could be:
library(data.table)
setDT(df1)
df1[, lapply(df1, uniqueN) > 1, with = FALSE]
Or, probably a smarter/better option:
df1[, .SD, .SDcols=lapply(df1, uniqueN) > 1]
In all the above approaches you could replace n_distinct/uniqueN with var or sd, after subsetting to only the numeric columns.
For example,
df1[-1] %>% select_if(~sd(.) != 0)
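The example above drops Item_Name along with the constant column, because df1[-1] removes it before the test. If you want to keep non-numeric columns, one possible sketch is to apply the sd test only to numeric columns. This version is written in base R rather than dplyr so it runs without extra packages; the sample data (in particular the constant D_2) and the name keep_col are assumptions for illustration.

```r
# Sample data; D_2's values are assumed (any constant would do)
df1 <- data.frame(
  Item_Name = c("test1", "test2", "test3"),
  D_1 = c(1, 0, 1),
  D_2 = c(0, 0, 0),
  D_3 = c(11, 3, 1)
)

# Keep a column if it is non-numeric, or numeric with non-zero sd.
# isTRUE() treats an NA sd (e.g. fewer than two non-missing values)
# as "no variation", so such columns are dropped.
keep_col <- vapply(
  df1,
  function(x) !is.numeric(x) || isTRUE(sd(x, na.rm = TRUE) != 0),
  logical(1)
)

result_sd <- df1[keep_col]
print(result_sd)
```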
Filter is a useful function here. I will keep only those columns that have more than one unique value, i.e.
Filter(function(x)(length(unique(x))>1), df1)
## Item_Name D_1 D_3
## 1 test1 1 11
## 2 test2 0 3
## 3 test3 1 1
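One thing to watch with counting unique values: unique treats NA as a value of its own, so a column holding one real value plus some NAs counts as two values and is kept. If that is not what you want, a possible variation is to drop NAs before counting. The data frame df2 and the helper count_non_na below are hypothetical, made up for illustration.

```r
# Hypothetical data: x has a single real value plus a missing value
df2 <- data.frame(
  id = c("a", "b", "c"),
  x  = c(5, NA, 5),
  y  = c(1, 2, 3)
)

# Plain unique(): NA counts as a distinct value, so x is kept
with_na <- Filter(function(x) length(unique(x)) > 1, df2)

# Count distinct values after removing NAs
count_non_na <- function(x) length(unique(x[!is.na(x)]))

# Now x has only one distinct value and is dropped
result_na <- Filter(function(x) count_non_na(x) > 1, df2)
print(names(with_na))
print(names(result_na))
```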