Determine length of sequences between columns or in string - and paste result

One dplyr and tidyr option could be:

df %>%
 rowid_to_column() %>%
 pivot_longer(-rowid) %>%
 group_by(rowid) %>%
 mutate(value = if_else(value != 0 | cumsum(value) == 0 | rev(cumsum(rev(value))) == 0,
                        NA_integer_,
                        with(rle(value), rep(lengths * (values == 0), lengths)))) %>%
 pivot_wider(names_from = "name",
             values_from = "value")

  rowid    V1    V2    V3    V4    V5    V6    V7    V8    V9   V10
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     1    NA     3     3     3    NA    NA     2     2    NA    NA
2     2    NA    NA    NA     4     4     4     4    NA     1    NA
3     3    NA    NA     2     2    NA     2     2    NA     1    NA
4     4    NA     6     6     6     6     6     6    NA    NA    NA
5     5    NA    NA    NA    NA     2     2    NA    NA    NA    NA
6     6    NA    NA    NA    NA     2     2    NA     1    NA    NA

Define a function recalc which acts on one row and then apply it to each row. recalc identifies the runs using rleid and then performs a calculation on each run passing the 1/0 value as the real part and the run number as the imaginary part of a complex vector to f. In f if the run contains 1 (real part) or it is the first run or is the last run (imaginary part) it is replaced with NA otherwise with the length. Finally recalc takes the real part.

library(data.table)

recalc <- function(x) {
  r <- rleid(x)
  f <- function(z) if (Re(z)[1] == 1 || Im(z) %in% range(r)) NA else length(z)
  Re(ave(x + r * 1i, r, FUN = f))
}
t(apply(DF, 1, recalc))

giving this matrix:

     V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
[1,] NA  3  3  3 NA NA  2  2 NA  NA
[2,] NA NA NA  4  4  4  4 NA  1  NA
[3,] NA NA  2  2 NA  2  2 NA  1  NA
[4,] NA  6  6  6  6  6  6 NA NA  NA
[5,] NA NA NA NA  2  2 NA NA NA  NA
[6,] NA NA NA NA  2  2 NA  1 NA  NA

It may not be the best method to do so, but it'll work.

Adding tidyverse package for dplyr and tidyr functions. Additionally data.table package helps in rleid() function.

library(data.table)
library(tidyverse)

df %>% mutate(rowid = row_number()) %>%
  pivot_longer(-rowid) %>% group_by(rowid) %>%
  mutate(dummy = rleid(value),
         value = case_when(value == 1 ~ 0,
                           dummy == max(dummy) ~ 0,
                           dummy == min(dummy) ~ 0,
                           TRUE ~ 1)) %>%
  group_by(rowid, dummy) %>%
  mutate(value = if_else(value == 0, NA_real_, sum(value))) %>%
  ungroup() %>% select(-dummy) %>% 
  pivot_wider(id_cols = rowid) %>%
  select(-rowid)

# A tibble: 6 x 10
     V1    V2    V3    V4    V5    V6    V7    V8    V9   V10
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1    NA     3     3     3    NA    NA     2     2    NA    NA
2    NA    NA    NA     4     4     4     4    NA     1    NA
3    NA    NA     2     2    NA     2     2    NA     1    NA
4    NA     6     6     6     6     6     6    NA    NA    NA
5    NA    NA    NA    NA     2     2    NA    NA    NA    NA
6    NA    NA    NA    NA     2     2    NA     1    NA    NA

Determine length of sequences between columns or in string - and paste result

Tags:

R

Related

Recent Posts