Determine length of sequences between columns or in string - and paste result
One dplyr
and tidyr
option could be:
df %>%
rowid_to_column() %>%
pivot_longer(-rowid) %>%
group_by(rowid) %>%
mutate(value = if_else(value != 0 | cumsum(value) == 0 | rev(cumsum(rev(value))) == 0,
NA_integer_,
with(rle(value), rep(lengths * (values == 0), lengths)))) %>%
pivot_wider(names_from = "name",
values_from = "value")
rowid V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 NA 3 3 3 NA NA 2 2 NA NA
2 2 NA NA NA 4 4 4 4 NA 1 NA
3 3 NA NA 2 2 NA 2 2 NA 1 NA
4 4 NA 6 6 6 6 6 6 NA NA NA
5 5 NA NA NA NA 2 2 NA NA NA NA
6 6 NA NA NA NA 2 2 NA 1 NA NA
Define a function recalc which acts on one row and then apply it to each row. recalc identifies the runs using rleid and then performs a calculation on each run passing the 1/0 value as the real part and the run number as the imaginary part of a complex vector to f. In f if the run contains 1 (real part) or it is the first run or is the last run (imaginary part) it is replaced with NA otherwise with the length. Finally recalc takes the real part.
library(data.table)
recalc <- function(x) {
r <- rleid(x)
f <- function(z) if (Re(z)[1] == 1 || Im(z) %in% range(r)) NA else length(z)
Re(ave(x + r * 1i, r, FUN = f))
}
t(apply(DF, 1, recalc))
giving this matrix:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
[1,] NA 3 3 3 NA NA 2 2 NA NA
[2,] NA NA NA 4 4 4 4 NA 1 NA
[3,] NA NA 2 2 NA 2 2 NA 1 NA
[4,] NA 6 6 6 6 6 6 NA NA NA
[5,] NA NA NA NA 2 2 NA NA NA NA
[6,] NA NA NA NA 2 2 NA 1 NA NA
It may not be the best method to do so, but it'll work.
Adding tidyverse
package for dplyr
and tidyr
functions. Additionally data.table
package helps in rleid()
function.
library(data.table)
library(tidyverse)
df %>% mutate(rowid = row_number()) %>%
pivot_longer(-rowid) %>% group_by(rowid) %>%
mutate(dummy = rleid(value),
value = case_when(value == 1 ~ 0,
dummy == max(dummy) ~ 0,
dummy == min(dummy) ~ 0,
TRUE ~ 1)) %>%
group_by(rowid, dummy) %>%
mutate(value = if_else(value == 0, NA_real_, sum(value))) %>%
ungroup() %>% select(-dummy) %>%
pivot_wider(id_cols = rowid) %>%
select(-rowid)
# A tibble: 6 x 10
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 NA 3 3 3 NA NA 2 2 NA NA
2 NA NA NA 4 4 4 4 NA 1 NA
3 NA NA 2 2 NA 2 2 NA 1 NA
4 NA 6 6 6 6 6 6 NA NA NA
5 NA NA NA NA 2 2 NA NA NA NA
6 NA NA NA NA 2 2 NA 1 NA NA