Query to obtain weighted percentile

Postgres has Ordered-Set Aggregate Functions for your purpose.

The special difficulty: you want rows "weighted" by cnt. If that's supposed to mean that each row represents cnt identical rows, you can multiply input rows by joining to generate_series(1, cnt):

Click to copy

SELECT site, dateval
     , percentile_cont('{0,.25,.5,.75,1}'::float8[]) WITHIN GROUP (ORDER BY raw)
FROM   source s, generate_series(1, s.cnt)
GROUP  BY 1, 2;

db<>fiddle here

But results differ from your expected output (except for the 0 and 100 percentile). So you are "weighting" differently ...

Aside, your original query can be simplified to this equivalent:

Click to copy

SELECT site, dateval, raw, sum(cnt) AS sumcnt
     , sum(sum(cnt)) OVER w AS cumsumcnt
     , sum(sum(cnt)) OVER w / sum(sum(cnt)) OVER (PARTITION BY site, dateval) AS percentile 
FROM   source
GROUP  BY site, dateval, raw
WINDOW w AS (PARTITION BY site, dateval ORDER BY raw);

You can run a window function over the result of an aggregate function in the same SELECT (but not vice versa). See:

Need separate columns that calculate percentage

I added a demo to the fiddle above.

But neither explains the odd numbers in your "expected results". Those strike me as incorrect, no matter how you interpolate. Example: 22.07 in the first line for p25 does not seem to make sense - the value 23 occupies all rows up to the 27.7879 percentile after factoring in cnt according to your own query ...

Query to obtain weighted percentile

Tags:

Postgresql

Aggregate

Postgresql 10

Related

Recent Posts