PostGIS: Interpolate/Cluster geographic points to average points
A solution using PostGIS alone.
The Steps are as follows:
- Generate clusters from the points
- Get the centre of each point cluster
- Generate a line that connects each cluster centre point (will also approximate the centreline of the route)
- Create sampling points at 20m intervals along the line (allowing for 10m radius buffer from each sampling point)
- Get an average value for 10m buffer around each sampling point.
--1. Create the point clusters
DROP TABLE IF EXISTS points_clustered;
CREATE TABLE points_clustered AS
SELECT geom, st_clusterkmeans(geom, 10) over () AS cluster
FROM points;
--2. Get the centre of each point cluster
DROP TABLE IF EXISTS centers;
CREATE TABLE centers AS
SELECT cluster, ST_Centroid(ST_collect(geom)) AS geom
FROM points_clustered
GROUP BY cluster;
--3. Get centreline of connecting points
DROP TABLE IF EXISTS clustersline;
CREATE TABLE clustersline AS
SELECT st_makeline(geom) geom FROM
(SELECT geom AS geom FROM centers GROUP BY geom ORDER BY st_closestpoint(geom, geom)) AS f;
--4. Create a point on the line every 20 metres **(CHANGE THE EPSG)**
DROP TABLE IF EXISTS clustersline_20m;
CREATE TABLE clustersline_20m AS
WITH line AS
(SELECT
(ST_Dump(geom)).geom AS geom
FROM clustersline),
linemeasure AS
(SELECT
ST_AddMeasure(line.geom, 0, ST_Length(line.geom)) AS linem,
generate_series(0, ST_Length(line.geom)::int, 20) AS i
FROM line),
geometries AS (
SELECT
i,
0.0 avg,
(ST_Dump(ST_GeometryN(ST_LocateAlong(linem, i), 1))).geom AS geom
FROM linemeasure)
SELECT
i,
avg,
ST_SetSRID(ST_MakePoint(ST_X(geom), ST_Y(geom)),28355) AS geom
FROM geometries;
--5. Get an average value for 10m buffer of each sampling point.
UPDATE clustersline_20m
SET avg = subq.avg
FROM
(
SELECT id, AVG(p.value) avg
FROM points p, clustersline_20m a
WHERE ST_INTERSECTS(ST_BUFFER(a.geom,10), p.geom)
GROUP BY id
) AS subq
WHERE id = subq.id;
Credit to these questions which helped build the answer:
http://blog.cleverelephant.ca/2018/06/polygon-splitting.html
PostGIS or QGIS: Convert unsorted points to a single line by connecting each 2 closest points
How can I transform polylines into points every n metres in PostGIS?
The solution presented here consists basicaly of the following steps:
- Project the points to the line layer
- Create a new point layer with points at a regular interval along the line
- For each point of step 2, find the closest point from step 1: this one is the "cluster center": the point that remains and that gets the mean value of the other points in a certain distance (those that are deleted afterwards)
- For each cluster center (from step 3) create a buffer and get the mean value for a certain attribute (like speed) of all projected points (from step 1) that fall within the buffer.
So this is how you can do it using QGIS:
- Project the points to the line, using an expression like the following on the point layer - us
Menu Processing / Toolbox / Geometry by expression
to create te points as actual geometries - attributes will be kept:
closest_point (
collect_geometries (
overlay_nearest (
'strecken_polyline',
$geometry
)
),
$geometry
)
Menu Processing / Toolbox / Points along geometry
: create regular points along the line - select a distance that fits your date and needs. On the next screenshot, the blue dots are the points created here:
- For each blue dot, find the closest white dot: this will be used as cluster center (thus the points that will be kept and that get the mean value off all other points). This is the expression to use with
Menu Processing / Toolbox / Geometry by expression
, whereprojected_points
is the name of the point layer created in step 1:
collect_geometries (
overlay_nearest(
'projected_points',
$geometry,
limit:=1
)
)
In the next screenshot, the cluster center is marked by red arows:
- Now create a buffer around each cluster center (result from step 3). For the size of the buffer, for demonstration purpose I selected a value of 50 [meters] - use whatever distance fits your data and change the value on line 7 of the following expression. Than, for each point that falls inside this buffer, calculate the mean value for an attribute (here: the field named
speed
). You can do this by opening the attribute table of the layer with the cluster center (result from step 3) and use the field calculator. Introduce this expression to create a new field with the mean value:
aggregate(
'projected_points',
'mean',
speed,
intersects (
$geometry,
buffer (geometry (@parent), 50)
)
)
See the result on the next screenshot: the original points (red dots) are reduced to the white squares. Compare the values of the labels that represent the speed field (I used random values from 80 to 220, just to see the effect): the small values nearby the red dots is the speed values in the original data, the bold values is the calculated mean for the points that fall inside the blue buffer (visualized here for better understanding, but it is not necessary to actually create these buffers):
Remarks: as you can see, depending on how far away the nearest projected point (step 1) is from the regular points (step 2), the buffers overlap or have gaps in between - so in some cases not all points are "catched" or some points are "catched" twice. I guess in your data, you have much more points and they are closer together, so this problem should not be too big.
However, there is a possibility to make sure that a) all points are taken into consideration and b) every point is taken into consideration just once. If you would like to do it that way, it's a bit more complicated, but not too much - just leave a comment so that I can add this to the solution.
This here is the first variant of the solution - I keep it here as it still might be helpful:
If you have a line-layer (railway, from OpenStreetMap), you can do the following steps. In principle, it works also without an additional line layer, just with the points (simply skip step 1), but than you only have a mean value for each point, based on all point in a certain distance (that you set in the step numbered with 2 below). To create a line from these points would be another question that should be asked separately.
Create a buffer around each point with a distance that fits your data.
Create a new field on the buffer layer that counts the mean of all values of a particular attribute field of those points that fall inside the buffer. Use this expression (adapted to to filed/layernames you use - I used
projected_points
for the layer from step 1, andvalue
for an attribute in the original points layer with random values - your field is the one containing the speed information):
aggregate(
'projected_points',
'mean',
value,
intersects (
$geometry,
geometry (@parent)
)
)
See screenshot, where the expression is used for labeling the buffer:
- If you want, using a table join, you can join the value calculated in the last step to your original point layer.