PostgreSQL: Generate a series of dates for each group in a table
1. CROSS JOIN
, LEFT JOIN LATERAL
to subquery
SELECT a.user_id, COALESCE(b.balance, 0) AS balance, d.as_of_date
FROM (
SELECT d::date AS as_of_date -- cast to date right away
FROM generate_series(timestamp '2016-01-01', '2016-01-03', interval '1 day') d
) d
JOIN accounts a ON a.create_date <= d.as_of_date
LEFT JOIN LATERAL (
SELECT balance
FROM balances
WHERE user_id = a.user_id
AND as_of_date <= d.as_of_date
ORDER BY as_of_date DESC
LIMIT 1
) b ON true
ORDER BY a.user_id, d.as_of_date;
Returns your desired result - except that as_of_date
is an actual date
, not a timestamp
like in your example. That should be more appropriate.
Users that are created already, but don't have any transactions, yet, are listed with a balance of 0. You did not define how to deal with the corner case.
Rather use timestamp
input for generate_series()
:
- Generating time series between two dates in PostgreSQL
It's crucial for performance that you back this up with a multicolumn index:
CREATE INDEX balances_multi_idx ON balances (user_id, as_of_date DESC, balance);
We have had a very similar case on SO just this week:
- Aggregating the most recent joined records per week
Find more explanation there.
2. CROSS JOIN
, LEFT JOIN
, window functions
SELECT user_id
, COALESCE(max(balance) OVER (PARTITION BY user_id, grp
ORDER BY as_of_date), 0) AS balance
, as_of_date
FROM (
SELECT a.user_id, b.balance, d.as_of_date
, count(b.user_id) OVER (PARTITION BY user_id ORDER BY as_of_date) AS grp
FROM (
SELECT d::date AS as_of_date -- cast to date right away
FROM generate_series(timestamp '2016-01-01', '2016-01-03', interval '1 day') d
) d
JOIN accounts a ON a.create_date <= d.as_of_date
LEFT JOIN balances b USING (user_id, as_of_date)
) sub
ORDER BY user_id, as_of_date;
Same result. If you have the multicolumn index mentioned above and can get index-only scans out of it, the first solution is most probably faster.
The main feature is the running count of values to form groups. since count() does not count NULL values, all dates without balance fall into the same group (grp
) as the most recent balance. Then use a simple max()
over the same window frame extended by grp
to copy the last balance for dangling gaps.
Related:
- Select longest continuous sequence
- How to get max sequence of rows?