De-aggregate / reverse-summarise / expand a dataset in R
You can do this with list columns and a few dplyr
/tidyr
/purrr
verbs. It's not as compact as other base R solutions may be, but for me it's easier to understand how things fit together and it works in a larger tidyverse
pipeflow.
As a check first-off, we're expecting 2,201 rows in the final data frame:
library(dplyr)
library(tidyr)
library(purrr)
sum(df$Freq)
#> [1] 2201
Converting this to a tibble
makes it easier to see and work with the list column. I'm using purrr::map
to move along the Freq
column, creating a vector of a dummy marker the length of the Freq
value. In this case, that marker is just "1"; it could instead be TRUE
or anything else. The point it just that it will create a vector of the length Freq
.
df %>%
as_tibble() %>%
mutate(obs = map(Freq, ~rep_len(1, .x)))
#> # A tibble: 32 x 6
#> Class Sex Age Survived Freq obs
#> <fct> <fct> <fct> <fct> <dbl> <list>
#> 1 1st Male Child No 0 <dbl [0]>
#> 2 2nd Male Child No 0 <dbl [0]>
#> 3 3rd Male Child No 35 <dbl [35]>
#> 4 Crew Male Child No 0 <dbl [0]>
#> 5 1st Female Child No 0 <dbl [0]>
#> 6 2nd Female Child No 0 <dbl [0]>
#> 7 3rd Female Child No 17 <dbl [17]>
#> 8 Crew Female Child No 0 <dbl [0]>
#> 9 1st Male Adult No 118 <dbl [118]>
#> 10 2nd Male Adult No 154 <dbl [154]>
#> # … with 22 more rows
Then tidyr::unnest
creates a row for each element in that dummy vector. After that, I drop those last 2 columns to just have the important categories of class, sex, age, and survival.
df %>%
as_tibble() %>%
mutate(obs = map(Freq, ~rep_len(1, .x))) %>%
unnest() %>%
select(-Freq, -obs)
#> # A tibble: 2,201 x 4
#> Class Sex Age Survived
#> <fct> <fct> <fct> <fct>
#> 1 3rd Male Child No
#> 2 3rd Male Child No
#> 3 3rd Male Child No
#> 4 3rd Male Child No
#> 5 3rd Male Child No
#> 6 3rd Male Child No
#> 7 3rd Male Child No
#> 8 3rd Male Child No
#> 9 3rd Male Child No
#> 10 3rd Male Child No
#> # … with 2,191 more rows
In the end, it is in fact a data frame of 2,201 rows.
You can do this using the function untable
in reshape.
data("Titanic")
df <- as.data.frame(Titanic)
library(reshape)
newDf = untable(df[,1:4], num = df[,5])
Without packages we can repeat each row according to the frequencies given:
df2 <- df[rep(1:nrow(df), df[,5]),-5]