Proper idiom for adding zero count rows in tidyr/dplyr
Since dplyr 0.8
you can do it by setting the parameter .drop = FALSE
in group_by
:
X.tidy <- X.raw %>% group_by(x, y, .drop = FALSE) %>% summarise(count=sum(z))
X.tidy
# # A tibble: 4 x 3
# # Groups: x [2]
# x y count
# <fct> <fct> <int>
# 1 A i 1
# 2 A ii 5
# 3 B i 15
# 4 B ii 0
This will keep groups made of all the levels of factor columns so if you have character columns you might want to convert them (thanks to Pate for the note).
You can use tidyr's expand
to make all combinations of levels of factors, and then left_join
:
X.tidy %>% expand(x, y) %>% left_join(X.tidy)
# Joining by: c("x", "y")
# Source: local data frame [4 x 3]
#
# x y count
# 1 A i 1
# 2 A ii 5
# 3 B i 15
# 4 B ii NA
Then you may keep values as NAs or replace them with 0 or any other value.
That way isn't a complete solution of the problem too, but it's faster and more RAM-friendly than spread
& gather
.
The complete
function from tidyr is made for just this situation.
From the docs:
This is a wrapper around expand(), left_join() and replace_na that's useful for completing missing combinations of data.
You could use it in two ways. First, you could use it on the original dataset before summarizing, "completing" the dataset with all combinations of x
and y
, and filling z
with 0 (you could use the default NA
fill
and use na.rm = TRUE
in sum
).
X.raw %>%
complete(x, y, fill = list(z = 0)) %>%
group_by(x,y) %>%
summarise(count = sum(z))
Source: local data frame [4 x 3]
Groups: x [?]
x y count
<fctr> <fctr> <dbl>
1 A i 1
2 A ii 5
3 B i 15
4 B ii 0
You can also use complete
on your pre-summarized dataset. Note that complete
respects grouping. X.tidy
is grouped, so you can either ungroup
and complete the dataset by x
and y
or just list the variable you want completed within each group - in this case, y
.
# Complete after ungrouping
X.tidy %>%
ungroup %>%
complete(x, y, fill = list(count = 0))
# Complete within grouping
X.tidy %>%
complete(y, fill = list(count = 0))
The result is the same for each option:
Source: local data frame [4 x 3]
x y count
<fctr> <fctr> <dbl>
1 A i 1
2 A ii 5
3 B i 15
4 B ii 0