Algorithm for fitting points to a grid
The best I could come up with is a brute-force solution that calculates the grid dimensions that minimize the error in the square of the Euclidean distance between the point and its nearest grid intersection.
This assumes that the number of points p is exactly equal to the number of columns times the number of rows, and that each grid intersection has exactly one point on it. It also assumes that the minimum x/y value for any point is zero. If the minimum is greater than zero, just subtract the minimum x value from each point's x coordinate and the minimum y value from each point's y coordinate.
The idea is to create all of the possible grid dimensions given the number of points. In the example above with 16 points, we would make grids with dimensions 1x16, 2x8, 4x4, 8x2 and 16x1. For each of these grids we calculate where the grid intersections would lie by dividing the maximum width of the points by the number of columns minus 1, and the maximum height of the points by the number of rows minus 1. Then we fit each point to its closest grid intersection and find the error (square of the distance) between the point and the intersection. (Note that this only works if each point is closer to its intended grid intersection than to any other intersection.)
After summing the errors for each grid configuration individually (e.g. getting one error value for the 1x16 configuration, another for the 2x8 configuration and so on), we select the configuration with the lowest error.
Initialization:
P is the set of points such that P[i][0] is the x-coordinate and
P[i][1] is the y-coordinate
Let p = |P| or the number of points in P
Let max_x = the maximum x-coordinate in P
Let max_y = the maximum y-coordinate in P
(minimum values are assumed to be zero)
Initialize min_error_dist = +infinity
Initialize min_error_cols = -1
Algorithm:
for (col_count = 1; col_count <= n; col_count++) {
// only compute for integer # of rows and cols
if ((p % col_count) == 0) {
row_count = n/col_count;
// Compute the width of the columns and height of the rows
// If the number of columns is 1, let the column width be max_x
// (and similarly for rows)
if (col_count > 1) col_width = max_x/(col_count-1);
else col_width=max_x;
if (row_count > 1) row_height = max_y/(row_count-1);
else row_height=max_y;
// reset the error for the new configuration
error_dist = 0.0;
for (i = 0; i < n; i++) {
// For the current point, normalize the x- and y-coordinates
// so that it's in the range 0..(col_count-1)
// and 0..(row_count-1)
normalized_x = P[i][0]/col_width;
normalized_y = P[i][1]/row_height;
// Error is the sum of the squares of the distances between
// the current point and the nearest grid point
// (in both the x and y direction)
error_dist += (normalized_x - round(normalized_x))^2 +
(normalized_y - round(normalized_y))^2;
}
if (error_dist < min_error_dist) {
min_error_dist = error_dist;
min_error_cols = col_count;
}
}
}
return min_error_cols;
Once you've got the number of columns (and thus the number of rows) you can recompute the normalized values for each point and round them to get the grid intersection they belong to.
A little bit of an image processing approach: If you think of what you have as a binary image where the X is 1 and the rest is 0, you can sum up rows and columns, and use a peak finding algorithm to identify peaks which would correspond to x and y lines of the grid:
Your points as a binary image:
Sums of row/columns
Now apply some smoothing technique to the signal (e.g. lowess):
I'm sure you get the idea :-)
Good luck
In the end I used this algorithm, inspired by beaker's:
- Calculate all the possible dimensions of the grid, given the total number of points
- For each possible dimension, fit the points to that dimension and calculate the variance in alignment:
- Order the points by x-value
- Group the points into columns: the first r points form the first column, where r is the number of rows
- Within each column, order the points by y-value to determine which row they're in
- For each row/column, calcuate the range of y-values/x-values
- The variance in alignment is the maximum range found
- Choose the dimension with the least variance in alignment