Counting occurrences of numbers in a CUDA array
You can implement a histogram by first sorting the numbers, and then doing a keyed reduction.
The most straightforward method is to use thrust::sort and then thrust::reduce_by_key. It's also often much faster than ad hoc binning based on atomics. Here's an example.