Efficiently counting the number of data points that fall within each square of a grid
Perhaps BinCounts
?
Example from the documentation with 1000 points:
Count random pairs in bins of width 0.25 in both dimensions:
BinCounts[RandomReal[{-1, 1}, {1000, 2}], {-1, 1, .25}, {-1, 1, .25}] (* {{8, 14, 21, 14, 15, 14, 18, 17}, {16, 15, 17, 19, 13, 12, 13, 12}, {19, 13, 33, 16, 9, 17, 14, 13}, {14, 18, 14, 8, 18, 11, 11, 9}, {24, 20, 17, 13, 21, 16, 16, 20}, {19, 23, 17, 14, 13, 14, 12, 21}, {17, 15, 14, 11, 14, 14, 18, 14}, {20, 15, 17, 10, 15, 19, 16, 16}} *)
10^6 points in a 100 x 100 grid:
BinCounts[RandomReal[{-1, 1}, {1000000, 2}], {-1, 1, .02}, {-1, 1, .02}] //
Dimensions // AbsoluteTiming
(*{0.09895, {100, 100}} *)
Not as short and quick as Michael E2's solution, but it collects the indices of the events for each grid cell seperately. That might get useful for later analysis.
n = 10;
xlist = Subdivide[0., 1., n];
ylist = Subdivide[0., 1., n];
centers =
Partition[
Tuples[{MovingAverage[xlist, 2], MovingAverage[ylist, 2]}], n];
pts = RandomReal[{0, 1}, {1000000, 2}];
data = Partition[Nearest[
pts -> Automatic, Flatten[centers, 1], {\[Infinity], 0.5/n},
DistanceFunction -> ChessboardDistance],
n
]; // AbsoluteTiming // First
0.210811
Now data[[i,j]]
contains the indices of all pts
that lie in the cell with center centers[[i,j]]
. Here a visualization:
i = 4;
j = 7;
Graphics[{
Darker@Green,
EdgeForm[{Darker@Darker@Green, Thickness[0.025]}],
Rectangle[{xlist[[i]], ylist[[j]]}, {xlist[[i]] + 1/n, ylist[[j]] + 1/n}],
Red,
PointSize[0.0001],
Point[pts[[data[[i, j]]]]]
},
PlotRange -> {{0, 1}, {0, 1}},
Frame -> True,
GridLines -> {xlist, ylist}
]
You can obtain the diagram you actually asked for by
MatrixPlot[Map[Length, data, {2}]]
Remark
Alternatively to using Nearest
, you could also use
data1 = BinLists[pts, {0, 1, 1/n}, {0, 1, 1/n}];
which is almost exactly as fast (and probably implemented precisely like the method above). Its disadvantage is that you don't get the indices but the points themselves, i.e., we have the relation
Sort[data1[[i, j]]] == Sort[pts[[data[[i, j]]]]]
True
Not getting the indices might make it needlessly complicated to gather metadata associated to the events contained in a cell.
If you only want counts on a regular grid, you can do a little better than BinCounts
. Using @Michael's example:
SeedRandom[1]
data = RandomReal[{-1, 1}, {10^6, 2}];
r1 = BinCounts[data, {-1, 1, .02}, {-1, 1, .02}]; // RepeatedTiming
r2 = Partition[
Values @ KeySort @ Counts[Floor[50 data] . {100, 1}],
100
]; //RepeatedTiming
r1 === r2
{0.048, Null}
{0.032, Null}
True