R group by show count of all factor levels even when zero dplyr
We can convert 'ID' to factor
with levels
specified and just use table
table(factor(dat$ID, levels = letters))
Or using the same with tidyverse
library(tidyverse)
dat %>%
mutate(ID=factor(ID, levels = letters)) %>%
complete(ID) %>%
group_by(ID) %>%
summarise(no_rows = n())
In the accepted answer by akrun, table()
works, but the tidyverse
answer gives inaccurate counts (see below). Instead use the .drop = FALSE
option:
library(tidyverse)
set.seed(1)
dat <- data.frame(ID = sample(letters,50,rep=TRUE))
dat %>%
mutate(ID = factor(ID, levels = letters)) %>%
count(ID, name = "no_rows", .drop = F) %>%
print.data.frame()
#> ID no_rows
#> 1 a 3
#> 2 b 2
#> 3 c 1
#> 4 d 1
#> 5 e 3
#> 6 f 3
#> 7 g 2
#> 8 h 1
#> 9 i 2
#> 10 j 5
#> 11 k 1
#> 12 l 3
#> 13 m 0
#> 14 n 3
#> 15 o 3
#> 16 p 0
#> 17 q 0
#> 18 r 1
#> 19 s 1
#> 20 t 3
#> 21 u 3
#> 22 v 1
#> 23 w 2
#> 24 x 0
#> 25 y 5
#> 26 z 1
Created on 2019-11-22 by the reprex package (v0.3.0)
Note that we expect nonzero counts for all letters but m, p, q, and x:
set.seed(1)
dat <- data.frame(ID = sample(letters,50,rep=TRUE))
levels(dat$ID)
#> [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "n" "o" "r" "s" "t"
#> [18] "u" "v" "w" "y" "z"
But if we use complete()
we get ones instead:
set.seed(1)
dat <- data.frame(ID = sample(letters,50,rep=TRUE))
dat %>%
mutate(ID=factor(ID, levels = letters)) %>%
complete(ID) %>%
group_by(ID) %>%
summarise(no_rows = n()) %>%
print.data.frame()
#> ID no_rows
# ...
#> 12 l 3
#> 13 m 1 # should be 0
#> 14 n 3
#> 15 o 3
#> 16 p 1 # should be 0
#> 17 q 1 # should be 0
#> 18 r 1
#> 19 s 1
#> 20 t 3
#> 21 u 3
#> 22 v 1
#> 23 w 2
#> 24 x 1 # should be 0
#> 25 y 5
#> 26 z 1
That's because complete()
actually adds a single m, p, q, and x to ID
so it contains at least one of each letter.