Query to obtain weighted percentile
Postgres has Ordered-Set Aggregate Functions for your purpose.
The special difficulty: you want rows "weighted" by cnt
. If that's supposed to mean that each row represents cnt
identical rows, you can multiply input rows by joining to generate_series(1, cnt)
:
SELECT site, dateval
, percentile_cont('{0,.25,.5,.75,1}'::float8[]) WITHIN GROUP (ORDER BY raw)
FROM source s, generate_series(1, s.cnt)
GROUP BY 1, 2;
db<>fiddle here
But results differ from your expected output (except for the 0 and 100 percentile). So you are "weighting" differently ...
Aside, your original query can be simplified to this equivalent:
SELECT site, dateval, raw, sum(cnt) AS sumcnt
, sum(sum(cnt)) OVER w AS cumsumcnt
, sum(sum(cnt)) OVER w / sum(sum(cnt)) OVER (PARTITION BY site, dateval) AS percentile
FROM source
GROUP BY site, dateval, raw
WINDOW w AS (PARTITION BY site, dateval ORDER BY raw);
You can run a window function over the result of an aggregate function in the same SELECT
(but not vice versa). See:
- Need separate columns that calculate percentage
I added a demo to the fiddle above.
But neither explains the odd numbers in your "expected results". Those strike me as incorrect, no matter how you interpolate. Example: 22.07
in the first line for p25
does not seem to make sense - the value 23
occupies all rows up to the 27.7879
percentile after factoring in cnt
according to your own query ...