Fastest way to count occurrences of each unique element

This is a little slower than tabulate(), but it is more general (it works with characters, factors, and basically whatever you throw at it) and much easier to read, maintain, and extend.

library(data.table)

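f6 = function(x) {
  # .N is the number of rows in each group; keyby = x groups by x
  # and returns the counts sorted by x
  data.table(x)[, .N, keyby = x]
}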

x <- sample(1:1000, size=1e7, TRUE)
system.time(f6(x))
#   user  system elapsed 
#   0.80    0.07    0.86 

system.time(f8(x)) # tabulate + dickoa's conversion to data.frame
#   user  system elapsed 
#   0.56    0.04    0.60
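The benchmark calls f8, which isn't defined here; its comment only says it is tabulate plus dickoa's conversion to a data.frame. As a hypothetical stand-in (not dickoa's actual code), assuming x holds positive integers, f8 might look like this. It drops zero counts so the output has the same shape as f6: one row per value that actually occurs.

```r
# Hypothetical stand-in for f8 (an assumption, not dickoa's original code):
# count with tabulate(), then convert the bare count vector into a
# two-column data.frame of value/count pairs.
f8 <- function(x) {
  counts <- tabulate(x)   # counts[i] = number of times i occurs in x
  keep <- counts > 0L     # keep only values that actually occur, like f6
  data.frame(x = which(keep), N = counts[keep])
}

f8(c(3L, 1L, 3L, 3L, 5L))
#   x N
# 1 1 1
# 2 3 3
# 3 5 1
```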

UPDATE: As of data.table 1.9.3, the data.table approach is actually about 2x faster than tabulate plus the data.frame conversion.

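There's almost nothing that will beat tabulate(), provided your input meets its requirements: a vector of positive integers (or a factor). Values outside 1..nbins, including zero, negatives, and NA, are silently ignored.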
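tabulate() only counts the values 1..nbins, so zeros and negatives simply disappear from the result. One way around that, sketched here as a hypothetical helper (count_ints is not part of base R), is to shift the data so the smallest value lands in bin 1. It assumes an integer vector with no NAs, and memory grows with the range max(x) - min(x), not with the number of distinct values.

```r
# Hypothetical helper (not base R): count occurrences of arbitrary integers,
# including zero and negatives, by shifting them into tabulate()'s 1..nbins range.
# Assumes x is an integer vector without NAs.
count_ints <- function(x) {
  offset <- min(x) - 1L            # shift so the smallest value maps to bin 1
  counts <- tabulate(x - offset)   # counts[i] = occurrences of value i + offset
  keep <- counts > 0L              # keep only values that actually occur
  data.frame(x = which(keep) + offset, N = counts[keep])
}

tabulate(c(-1L, 0L, 2L, 2L))       # -1 and 0 are silently ignored
# [1] 0 2
count_ints(c(-1L, 0L, 2L, 2L))
#    x N
# 1 -1 1
# 2  0 1
# 3  2 2
```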

x <- sample(1:100, size=1e7, TRUE)
system.time(tabulate(x))
#  user  system elapsed 
# 0.071   0.000   0.072

@dickoa adds a few more notes in the comments on how to turn the result into the appropriate output, but as a workhorse function, tabulate() is the way to go.
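By contrast with the data.table approach, which accepts characters and factors directly, tabulate() needs integer codes, so a character vector has to be converted first. A minimal sketch, using a hypothetical helper count_chr and base R's factor() for the mapping. Note that factor() has to sort and match the unique values itself, which likely eats into tabulate()'s speed advantage on this kind of input, and the level order for strings depends on the locale.

```r
# Hypothetical helper (not base R): count a character or factor vector with
# tabulate() by first mapping each element to its factor's integer code.
count_chr <- function(x) {
  f <- factor(x)   # integer codes 1..nlevels(f); levels are sorted
  # nbins = nlevels(f) ensures every level gets a bin, even unused ones
  data.frame(x = levels(f), N = tabulate(f, nbins = nlevels(f)),
             stringsAsFactors = FALSE)
}

count_chr(c("b", "a", "b", "c", "b"))
#   x N
# 1 a 1
# 2 b 3
# 3 c 1
```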
