How to delete columns that contain ONLY NAs?
One way of doing it:
df[, colSums(is.na(df)) != nrow(df)]
If the count of NAs in a column is equal to the number of rows, it must be entirely NA.
Or similarly
df[colSums(!is.na(df)) > 0]
Another option is the janitor
package:
df <- remove_empty_cols(df)
https://github.com/sfirke/janitor
Here is a dplyr solution:
df %>% select_if(~sum(!is.na(.)) > 0)
Update: The summarise_if()
function is superseded as of dplyr 1.0
. Here are two other solutions that use the where()
tidyselect function:
df %>%
select(
where(
~sum(!is.na(.x)) > 0
)
)
df %>%
select(
where(
~!all(is.na(.x))
)
)
It seeems like you want to remove ONLY columns with ALL NA
s, leaving columns with some rows that do have NA
s. I would do this (but I am sure there is an efficient vectorised soution:
#set seed for reproducibility
set.seed <- 103
df <- data.frame( id = 1:10 , nas = rep( NA , 10 ) , vals = sample( c( 1:3 , NA ) , 10 , repl = TRUE ) )
df
# id nas vals
# 1 1 NA NA
# 2 2 NA 2
# 3 3 NA 1
# 4 4 NA 2
# 5 5 NA 2
# 6 6 NA 3
# 7 7 NA 2
# 8 8 NA 3
# 9 9 NA 3
# 10 10 NA 2
#Use this command to remove columns that are entirely NA values, it will leave columns where only some values are NA
df[ , ! apply( df , 2 , function(x) all(is.na(x)) ) ]
# id vals
# 1 1 NA
# 2 2 2
# 3 3 1
# 4 4 2
# 5 5 2
# 6 6 3
# 7 7 2
# 8 8 3
# 9 9 3
# 10 10 2
If you find yourself in the situation where you want to remove columns that have any NA
values you can simply change the all
command above to any
.