Unnest or unchop dataframe containing lists of different lengths

Here is an idea via dplyr that you can generalise to as many columns as you want,

library(tidyverse)

df_AB_2 %>% 
 pivot_longer(c(A, B)) %>% 
 mutate(value = lapply(value, `length<-`, max(lengths(value)))) %>% 
 pivot_wider(names_from = name, values_from = value) %>% 
 unnest() %>% 
 filter(rowSums(is.na(.[-1])) != 2)

which gives,

Click to copy

# A tibble: 10 x 3
      ID     A     B
   <int> <dbl> <dbl>
 1     1     9     1
 2     1     8     2
 3     1     5    NA
 4     2     7     4
 5     2     6     5
 6     2    NA     6
 7     3     6     7
 8     3     9     8
 9     3    NA     9
10     3    NA     0

Defining a helper function to update the lengths of the element and proceeding with dplyr:

Click to copy

foo <- function(x, len_vec) {
  lapply(
    seq_len(length(x)), 
    function(i) {
      length(x[[i]]) <- len_vec[i]
      x[[i]]
    } 
  )
}

df_AB_2 %>% 
  mutate(maxl = pmax(lengths(A), lengths(B))) %>% 
  mutate(A = foo(A, maxl), B = foo(B, maxl)) %>% 
  unchop(cols = c(A, B)) %>% 
  select(-maxl)

# A tibble: 10 x 3
      ID     A     B
   <int> <dbl> <dbl>
 1     1     9     1
 2     1     8     2
 3     1     5    NA
 4     2     7     4
 5     2     6     5
 6     2    NA     6
 7     3     6     7
 8     3     9     8
 9     3    NA     9
10     3    NA     0

Using data.table:

Click to copy

library(data.table)
setDT(df_AB_2)
df_AB_2[, maxl := pmax(lengths(A), lengths(B))]
df_AB_2[, .(unlist(A)[seq_len(maxl)], unlist(B)[seq_len(maxl)]), by = ID]

Unnest or unchop dataframe containing lists of different lengths

Tags:

R

Tidyr

Unnest

Related

Recent Posts