Delete columns/rows with more than x% missing
To remove columns with more than a given fraction of NA values, you can use colMeans(is.na(...)), which returns the fraction of NA in each column:
## Some sample data
set.seed(0)
dat <- matrix(1:100, 10, 10)
dat[sample(1:100, 50)] <- NA
dat <- data.frame(dat)
## Remove columns with more than 50% NA
## (<= 0.5 keeps columns that are exactly 50% NA; drop = FALSE keeps a data frame)
dat[, colMeans(is.na(dat)) <= 0.5, drop = FALSE]
## Remove rows with more than 50% NA
dat[rowMeans(is.na(dat)) <= 0.5, ]
## Remove columns and rows with more than 50% NA
dat[rowMeans(is.na(dat)) <= 0.5, colMeans(is.na(dat)) <= 0.5, drop = FALSE]
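Since the question asks about an arbitrary x%, the threshold can be made a parameter. The following is only a sketch: drop_sparse and its arguments (max_na, rows, cols) are hypothetical names introduced here for illustration, and the toy data frame is made up so the NA fractions are easy to check by eye.

```r
# Hypothetical helper: drop rows and/or columns whose fraction of NA
# values exceeds `max_na` (a fraction between 0 and 1).
drop_sparse <- function(df, max_na = 0.5, rows = TRUE, cols = TRUE) {
  # Both masks are computed on the original data, before anything is dropped
  keep_rows <- if (rows) rowMeans(is.na(df)) <= max_na else rep(TRUE, nrow(df))
  keep_cols <- if (cols) colMeans(is.na(df)) <= max_na else rep(TRUE, ncol(df))
  df[keep_rows, keep_cols, drop = FALSE]
}

# Toy data with a known NA pattern
toy <- data.frame(a = c(1, NA, NA, NA),  # 75% NA
                  b = c(1, 2, NA, 4),    # 25% NA
                  c = c(1, 2, 3, NA))    # 25% NA

cols_only <- drop_sparse(toy, max_na = 0.5, rows = FALSE)  # drops column a
rows_only <- drop_sparse(toy, max_na = 0.5, cols = FALSE)  # drops rows 3 and 4
```

Rows and columns are filtered in one step against the original data, so removing a sparse column does not change which rows count as sparse. If you want the second pass to see the result of the first, call the function twice instead.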
A tidyverse solution that removes columns with at least x% NAs (50% here):
library(magrittr)  # provides %>%
test_data <- data.frame(A = c(rep(NA, 12), 520, 233, 522),
                        B = c(rep(10, 12), 520, 233, 522))
# Remove all columns with %NA >= 50
# (use > 50 instead to keep columns that are exactly 50% NA)
test_data %>%
  purrr::discard(~ sum(is.na(.x)) / length(.x) * 100 >= 50)
Result:
     B
1   10
2   10
3   10
4   10
5   10
6   10
7   10
8   10
9   10
10  10
11  10
12  10
13 520
14 233
15 522
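For comparison, the same column filter can probably also be written with dplyr's select() and where(). This is a sketch rather than part of the original answer, and it assumes dplyr 1.0.0 or later is installed. The name kept is only for illustration.

```r
library(dplyr)

test_data <- data.frame(A = c(rep(NA, 12), 520, 233, 522),
                        B = c(rep(10, 12), 520, 233, 522))

# Keep only columns whose share of NA is below 50%
# (equivalent to discarding columns with %NA >= 50)
kept <- test_data %>%
  select(where(~ mean(is.na(.x)) < 0.5))

names(kept)
```

Here mean(is.na(.x)) is the fraction of NA in a column, so the predicate is stated in terms of what to keep rather than what to discard.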