How to find the percentage of NAs in a data.frame?
If you are interested in the percentage of complete cases (rows without any NA values), you can use complete.cases.
Using the same example mentioned here:
x = data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))
Output:
   x  y
1  1 NA
2  2 NA
3 NA  4
4  3  5
Finding complete cases:
complete.cases(x)
Output:
[1] FALSE FALSE FALSE  TRUE
Proportion of complete cases (multiply by 100 for a percentage):
mean(complete.cases(x))
Output:
[1] 0.25
That means 25% of the rows in the data provided are complete, i.e. only the fourth row is complete; all the other rows contain NA values.
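As a minimal sketch of the same idea, the logical vector returned by complete.cases can be reused to get the count, the percentage, and the complete rows themselves. The variable names below (ok, n_complete, pct_complete, complete_rows) are only illustrative and do not come from the original answer.

```r
x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))

ok <- complete.cases(x)                  # TRUE for rows without any NA
n_complete <- sum(ok)                    # number of complete rows
pct_complete <- 100 * mean(ok)           # percentage of complete rows
pct_incomplete <- 100 - pct_complete     # percentage of rows with at least one NA

complete_rows <- x[ok, , drop = FALSE]   # keep only the complete rows
```

Since mean() of a logical vector is the share of TRUE values, multiplying by 100 turns the proportion into a percentage.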
Cheers!
For newer versions of dplyr, in which funs() is deprecated:
x %>% summarise_all(list(name = ~ sum(is.na(.)) / length(.)))
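If your dplyr is version 1.0.0 or later, the scoped verbs such as summarise_all are superseded by across(). The following is a sketch under that assumption, not the original answerer's code; na_prop is just an illustrative name.

```r
library(dplyr)

x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))

# Proportion of NA values in every column.
# across() and everything() assume dplyr >= 1.0.0.
na_prop <- x %>%
  summarise(across(everything(), ~ mean(is.na(.x))))
```

Here mean(is.na(.x)) is equivalent to sum(is.na(.x)) / length(.x), and the result keeps the original column names.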
x = data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))
For the whole dataframe:
sum(is.na(x))/prod(dim(x))
Or
mean(is.na(x))
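To see why the two expressions agree, here is a small sketch that spells out the intermediate steps. is.na applied to a data.frame returns a logical matrix, so sum counts the NA cells and mean divides that count by the number of cells. The variable names are illustrative only.

```r
x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))

na_mask <- is.na(x)           # 4 x 2 logical matrix, TRUE where a value is NA
n_missing <- sum(na_mask)     # each TRUE counts as 1
n_cells <- prod(dim(x))       # number of rows times number of columns

prop_a <- n_missing / n_cells # explicit ratio
prop_b <- mean(na_mask)       # same ratio in one call
```

For this example there are 3 NA values in 8 cells, so both give 0.375.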
For columns:
apply(x, 2, function(col) sum(is.na(col)) / length(col))
Or
colMeans(is.na(x))
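If you want the counts and the percentages side by side, one possible sketch wraps colSums and colMeans in a small helper. The function name na_summary and its output column names are hypothetical, chosen here for illustration.

```r
x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))

# na_summary is a hypothetical helper, not part of base R or any package
na_summary <- function(df) {
  na_mask <- is.na(df)  # logical matrix, TRUE where a value is NA
  data.frame(
    column = colnames(df),
    n_na   = unname(colSums(na_mask)),         # NA count per column
    pct_na = unname(100 * colMeans(na_mask))   # NA percentage per column
  )
}

res <- na_summary(x)
```

For the example data, res has one row per column of x, with 1 NA (25%) in column x and 2 NAs (50%) in column y.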
You could also use dplyr::summarize_all for the column-wise proportions (funs() is deprecated as of dplyr 0.8.0, so this form is for older versions):
library(dplyr)
x %>% summarize_all(funs(sum(is.na(.)) / length(.)))
Which will give:
     x   y
1 0.25 0.5