Postgis - Zonal statistics - Parallel computation?

i would add a intersects join clause on this query like this

DROP TABLE IF EXISTS schema.zonal_stats;
CREATE TABLE schema.zonal_stats AS 
SELECT id,geom,
(ST_SummaryStats(ST_Union(ST_Clip(raster,geom)))).sum 
FROM schema.raster join schema.ply on st_intersects(raster,geom)
GROUP BY id,geom;

this will run more efficiently because it will use an index and only clip the raster sections that intersect the vector.

In terms of the parrallel processing I am not well versed in what happens under the hood but it seems like you followed what the docs and other tutorials have said about setting the parralel workers etc..

Based on your comments that you pre-clipped the raster to the vector I would not even use the geometry in the query

I would try

SELECT id,rast,
    (ST_SummaryStats(rast)).sum 
    FROM schema.raster

or if you are hell bent on the st_union

SELECT id,rast,
    (ST_SummaryStats(st_union(rast))).sum 
    FROM schema.raster
group by id,rast

This doesn't answer the question 100% but does answer my problem, which is to speed up zonal statistics.

I am using the following exactextract command line tool for fast zonal statistics. This also takes into account partial overlaps, by weighting the zonal statistic for each raster value by its coverage fraction. Many zonal statistic tools do not account for this.

If someone can produce an example of multithreaded zonal statistics in postgis, I will substitute this as the marked answer with another.

Postgis - Zonal statistics - Parallel computation?

Tags:

Statistics

Postgis

Parallel

Zonal Statistics

Related

Recent Posts