R: Remove multiple empty columns of character variables
If you're talking about columns where all values are NA
, use remove_empty("cols")
from the janitor package.
If you have character vectors where every value is the empty string ""
, you can first convert those values to NA
throughout your data.frame with na_if
from the dplyr package:
dat <- data.frame(
x = c("a", "b", "c"),
y = c("", "", ""),
z = c(NA, NA, NA),
stringsAsFactors = FALSE
)
dat
#> x y z
#> 1 a NA
#> 2 b NA
#> 3 c NA
library(dplyr)
library(janitor)
dat %>%
mutate_all(funs(na_if(., ""))) %>%
remove_empty("cols")
#> x
#> 1 a
#> 2 b
#> 3 c
If your empty columns are really empty character columns, something like the following should work. It will need to be modified if your "empty" character columns include, say, spaces.
Sample data:
mydf <- data.frame(
A = c("a", "b"),
B = c("y", ""),
C = c("", ""),
D = c("", ""),
E = c("", "z")
)
mydf
# A B C D E
# 1 a y
# 2 b z
Identifying and removing the "empty" columns.
mydf[!sapply(mydf, function(x) all(x == ""))]
# A B E
# 1 a y
# 2 b z
Alternatively, as recommended by @Roland:
> mydf[, colSums(mydf != "") != 0]
A B E
1 a y
2 b z
You can do either of the following:
emptycols <- sapply(df, function (k) all(is.na(k)))
df <- df[!emptycols]
or:
emptycols <- colSums(is.na(df)) == nrow(df)
df <- df[!emptycols]
If by empty you mean they are ""
, the second approach can be adapted like so:
emptycols <- colSums(df == "") == nrow(df)
I have a similar situation -- I'm working with a large public records database but when I whittle it down to just the date range and category that I need, there are a ton of columns that aren't in use. Some are blank and some are NA.
The selected answer: https://stackoverflow.com/a/17672737/233467 didn't work for me, but this did:
df[!sapply(df, function (x) all(is.na(x) | x == ""))]