Count NAs between first and last occurred numbers

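Here is an idea using base R. f1 returns the positions from the first to the last non-NA value, and f2 returns the positions from the first non-NA value to the end of the vector: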

f1 <- function(x) {i1 <- which(!is.na(x)); head(i1, 1):tail(i1, 1) }
f2 <- function(x) {i1 <- which(!is.na(x)); head(i1, 1):length(x) }

merge(stack(sapply(df, function(i) sum(is.na(i[f1(i)])))), 
      stack(sapply(df, function(i) sum(is.na(i[f2(i)])))), by = 'ind')

#  ind values.x values.y
#1   x        0        2
#2   y        1        1
#3   z        2        2
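
To see what the two helpers select, here is a minimal sketch on a made-up data frame. The original input is not shown, so the `df` below is purely hypothetical; its values were chosen so that the counts agree with the output above.

```r
# Hypothetical data, NOT taken from the original question
df <- data.frame(
  x = c(NA, 1, 2, NA, NA),
  y = c(1, NA, 2, 3, 4),
  z = c(1, NA, NA, 2, 3)
)

# Positions from the first to the last non-NA value
f1 <- function(x) {i1 <- which(!is.na(x)); head(i1, 1):tail(i1, 1) }
# Positions from the first non-NA value to the end of the vector
f2 <- function(x) {i1 <- which(!is.na(x)); head(i1, 1):length(x) }

f1(df$x)  # 2 3
f2(df$x)  # 2 3 4 5

# Count the NAs inside each selected range, column by column
inner  <- sapply(df, function(i) sum(is.na(i[f1(i)])))
to_end <- sapply(df, function(i) sum(is.na(i[f2(i)])))

inner   # x = 0, y = 1, z = 2
to_end  # x = 2, y = 1, z = 2
```

Both helpers assume that each column has at least one non-NA value; for an all-NA column `which()` returns an empty vector and the `:` operator fails.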

na.trim from the zoo package trims NAs off both ends by default, or off only the left or right end if we specify sides = "left" or sides = "right", so:

library(dplyr)
library(tibble)
library(tidyr)
library(zoo)

df %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarize(na1 = sum(is.na(na.trim(value))), 
            na2 = sum(is.na(na.trim(value, sides = "left")))) %>%
  ungroup

giving:

# A tibble: 3 x 3
  name    na1   na2
  <chr> <int> <int>
1 x         0     2
2 y         1     1
3 z         2     2
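
As a small illustration of the trimming step on its own, the sketch below applies `na.trim` to a plain numeric vector. The vector `v` is hypothetical and only serves to show which NAs each `sides` setting keeps.

```r
library(zoo)

v <- c(NA, 1, NA, 2, NA, NA)  # hypothetical vector

na.trim(v)                   # 1 NA 2         (both ends trimmed)
na.trim(v, sides = "left")   # 1 NA 2 NA NA   (only leading NAs trimmed)
na.trim(v, sides = "right")  # NA 1 NA 2      (only trailing NAs trimmed)

# Whatever NAs remain after trimming are the ones being counted
n_inner  <- sum(is.na(na.trim(v)))                  # 1
n_to_end <- sum(is.na(na.trim(v, sides = "left")))  # 3
```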

Here is one possibility using two functions:

fun1 <- function(x) { # count NAs between the first and last non-NA value
  idx1 <- cumsum(!is.na(x)) > 0 # TRUE from the first non-NA onward (FALSE for leading NAs)
  idx2 <- rev(cumsum(!is.na(rev(x))) > 0) # TRUE up to the last non-NA (FALSE for trailing NAs)
  sum(is.na(x[idx1 & idx2]))
}

fun2 <- function(x) { # count NAs between the first non-NA value and the last element
  idx1 <- cumsum(!is.na(x)) > 0 # TRUE from the first non-NA onward (FALSE for leading NAs)
  sum(is.na(x[idx1]))
}
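
The cumsum masks are easier to follow when traced on a single vector. The sketch below uses a hypothetical vector `x` and prints each intermediate step; it is only an illustration of the idea, not part of the original answer.

```r
x <- c(NA, 1, NA, 2, NA)  # hypothetical vector

!is.na(x)          # FALSE TRUE FALSE TRUE FALSE
cumsum(!is.na(x))  # 0 1 1 2 2  (stays 0 until the first non-NA is seen)

# TRUE from the first non-NA value onward
idx1 <- cumsum(!is.na(x)) > 0            # FALSE TRUE TRUE TRUE TRUE
# Same trick on the reversed vector: TRUE up to the last non-NA value
idx2 <- rev(cumsum(!is.na(rev(x))) > 0)  # TRUE TRUE TRUE TRUE FALSE

between <- sum(is.na(x[idx1 & idx2]))  # 1 (only the NA between 1 and 2)
to_end  <- sum(is.na(x[idx1]))         # 2 (adds the trailing NA)

# For an all-NA vector both masks are all FALSE, so the count is 0
all_na <- c(NA, NA, NA)
sum(is.na(all_na[cumsum(!is.na(all_na)) > 0]))  # 0
```

Because the masks are logical vectors of the same length as the input, this approach does not need a special case for columns without any non-NA value.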

Afterwards, summarise the data frame with both functions and reshape the result. Note that the names_pattern used here assumes single-character column names:

df %>% summarise_all(list(m1 = ~fun1(.), m2 = ~fun2(.))) %>%
  pivot_longer(cols = everything(), names_pattern = "^(.)_(.*)$", names_to = c("vars", "a"),
               values_to = "x") %>%
  spread(a, x)

# A tibble: 3 x 3
  vars     m1    m2
  <chr> <int> <int>
1 x         0     2
2 y         1     1
3 z         2     2
