Grouping or Window
1. Window functions plus subqueries
Count the steps to form groups, similar to Evan's idea, with modifications and fixes:
SELECT id_type
     , min(date) AS begin
     , max(date) AS end
     , count(*)  AS row_ct          -- optional addition
FROM  (
   SELECT date, id_type, count(step OR NULL) OVER (ORDER BY date) AS grp
   FROM  (
      SELECT date, id_type
           , lag(id_type, 1, id_type) OVER (ORDER BY date) <> id_type AS step
      FROM   tmp
      ) sub1
   ) sub2
GROUP  BY id_type, grp
ORDER  BY min(date);
This assumes the involved columns are NOT NULL. Else you need to do more.
It also assumes date to be defined UNIQUE, else you need to add a tiebreaker to the ORDER BY clauses to get deterministic results. Like: ORDER BY date, id.
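For illustration only (the question doesn't show such a column), assuming tmp had a unique column id to serve as tiebreaker, the window clauses would look like:

-- hypothetical: "id" is a unique column of tmp, used only to break ties
SELECT date, id_type
     , lag(id_type, 1, id_type) OVER (ORDER BY date, id) <> id_type AS step
FROM   tmp;
-- ... and in the next level accordingly:
-- count(step OR NULL) OVER (ORDER BY date, id) AS grp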
Detailed explanation (answer to very similar question):
- Select longest continuous sequence
Note in particular:
- In related cases, lag() with 3 parameters can be essential to cover the corner case of the first (or last) row elegantly. (The 3rd parameter is used as default if there is no previous (next) row.)
  lag(id_type, 1, id_type) OVER ()
  Since we are only interested in an actual change of id_type (TRUE), it does not matter in this particular case. NULL and FALSE both don't count as step.
- count(step OR NULL) OVER (ORDER BY date) is the shortest syntax that also works in Postgres 9.3 or older. count() only counts non-null values. In modern Postgres, the cleaner, equivalent syntax would be:
  count(step) FILTER (WHERE step) OVER (ORDER BY date)
  A small demonstration of both idioms follows after this list. Details:
  - For absolute performance, is SUM faster or COUNT?
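A tiny, self-contained demonstration (toy values, not the tmp table) that both counting idioms produce the same running count:

SELECT step
     , count(step OR NULL) OVER (ORDER BY ord)             AS ct_or_null
     , count(step) FILTER (WHERE step) OVER (ORDER BY ord) AS ct_filter
FROM  (VALUES (1, false), (2, true), (3, false), (4, true), (5, true)) t(ord, step);
-- both count columns yield: 0, 1, 1, 2, 3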
2. Subtract two window functions, one subquery
Similar to Erik's idea with modifications:
SELECT min(date) AS begin
     , max(date) AS end
     , id_type
FROM  (
   SELECT date, id_type
        , row_number() OVER (ORDER BY date)
        - row_number() OVER (PARTITION BY id_type ORDER BY date) AS grp
   FROM   tmp
   ) sub
GROUP  BY id_type, grp
ORDER  BY min(date);
If date is defined UNIQUE, like I mention above (you never clarified), dense_rank() would be pointless, since the result is the same as for row_number() and the latter is substantially cheaper.
If date is not defined UNIQUE (and we don't know that the only duplicates are on (date, id_type)), all of these queries are pointless, since the result is arbitrary.
Also, a subquery is typically cheaper than a CTE in Postgres. Only use CTEs when you need them.
Related answers with more explanation:
- GROUP BY and aggregate sequential numeric values
- Group by repeating attribute
- GROUP BY uninterrupted sequence of logs for same location
In related cases where we already have a running number in the table, we can make do with a single window function:
- Rank based on sequence of dates
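For example (a hypothetical variation, not part of the question): if tmp already carried a gapless running number rn in date order, a single window function would do:

SELECT id_type, min(date) AS begin, max(date) AS end
FROM  (
   SELECT date, id_type
        , rn - row_number() OVER (PARTITION BY id_type ORDER BY rn) AS grp  -- rn is assumed
   FROM   tmp
   ) sub
GROUP  BY id_type, grp
ORDER  BY min(date);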
3. Top performance with plpgsql function
Since this question has become unexpectedly popular, I'll add another solution to demonstrate top performance.
SQL has many sophisticated tools to create solutions with short and elegant syntax. But a declarative language has its limits for more complex requirements that involve procedural elements.
A server-side procedural function is faster for this than anything posted so far because it only needs a single sequential scan over the table and a single sort operation. If a fitting index is available, even just a single index-only scan.
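For instance, assuming tmp consists of just the two columns date and id_type (the table definition isn't reproduced here), a multicolumn index like this can enable that index-only scan:

-- covers the ORDER BY column plus the only other column the function reads
CREATE INDEX tmp_date_id_type_idx ON tmp (date, id_type);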
CREATE OR REPLACE FUNCTION f_tmp_groups()
  RETURNS TABLE (id_type int, grp_begin timestamp, grp_end timestamp) AS
$func$
DECLARE
   _row tmp;                        -- use table type for row variable
BEGIN
   FOR _row IN
      TABLE tmp ORDER BY date       -- add more columns to make order deterministic
   LOOP
      CASE _row.id_type = id_type
      WHEN TRUE THEN                -- same group continues
         grp_end := _row.date;      -- remember last date so far
      WHEN FALSE THEN               -- next group starts
         RETURN NEXT;               -- return result for last group
         id_type   := _row.id_type;
         grp_begin := _row.date;
         grp_end   := _row.date;
      ELSE                          -- NULL for 1st row
         id_type   := _row.id_type; -- remember row data for starters
         grp_begin := _row.date;
         grp_end   := _row.date;
      END CASE;
   END LOOP;

   RETURN NEXT;                     -- return last result row
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM f_tmp_groups();
Test with:
EXPLAIN (ANALYZE, TIMING OFF) -- to focus on total performance
SELECT * FROM f_tmp_groups();
You could make the function generic with polymorphic types and pass table type and column names. Details:
- Refactor a PL/pgSQL function to return the output of various SELECT queries
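A rough sketch of that idea, not a drop-in implementation: the simplified version below reuses the SQL approach from section 1 instead of the loop, assumes the date column is a timestamp, and derives the table name from the row type passed in. All names here are made up for the example.

CREATE OR REPLACE FUNCTION f_groups(_tbl_type anyelement, _date_col text, _type_col text)
  RETURNS TABLE (grp_type text, grp_begin timestamp, grp_end timestamp) AS
$func$
BEGIN
   RETURN QUERY EXECUTE format(
      $q$
      SELECT %2$I::text, min(%1$I), max(%1$I)
      FROM  (
         SELECT %1$I, %2$I, count(step OR NULL) OVER (ORDER BY %1$I) AS grp
         FROM  (
            SELECT %1$I, %2$I
                 , lag(%2$I, 1, %2$I) OVER (ORDER BY %1$I) <> %2$I AS step
            FROM   %3$s
            ) sub1
         ) sub2
      GROUP  BY %2$I, grp
      ORDER  BY min(%1$I)
      $q$
    , _date_col, _type_col, pg_typeof(_tbl_type));  -- row type provides the table name
END
$func$ LANGUAGE plpgsql;

-- Call with the table's row type so the function knows which table to read:
SELECT * FROM f_groups(NULL::tmp, 'date', 'id_type');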
If you don't want to or cannot persist a function for this, it would even pay to create a temporary function on the fly. Costs a few ms.
- How to create a temporary function in PostgreSQL?
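The gist of that technique: create the function in the pg_temp schema, which makes it session-local and auto-dropped at session end. A minimal, generic demonstration (not the grouping function itself):

-- temporary function: lives in pg_temp, gone when the session ends
CREATE FUNCTION pg_temp.f_demo(int)
  RETURNS int AS
'SELECT $1 * 2'
LANGUAGE sql;

SELECT pg_temp.f_demo(21);  -- temp functions must be called schema-qualified

The same works for f_tmp_groups() above: create it as pg_temp.f_tmp_groups() and call it schema-qualified.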
dbfiddle for Postgres 9.6, comparing performance for all three. (Building on Jack's test case, modified.)
dbfiddle for Postgres 8.4, where performance differences are even bigger.
You can do this as a simple subtraction of ROW_NUMBER() operations (or, if your dates are not unique, though still unique per id_type, you can use DENSE_RANK() instead, though it will be a more expensive query):
WITH IdTypes AS (
SELECT
date,
id_type,
Row_Number() OVER (ORDER BY date)
- Row_Number() OVER (PARTITION BY id_type ORDER BY date)
AS Seq
FROM
tmp
)
SELECT
Min(date) AS begin,
Max(date) AS end,
id_type
FROM IdTypes
GROUP BY id_type, Seq
ORDER BY begin
;
See this work at DB Fiddle (or see the DENSE_RANK version)
Result:
begin                 end                   id_type
--------------------- --------------------- -------
2017-01-10 07:19:21   2017-01-10 07:19:25   3
2017-01-10 07:19:26   2017-01-10 07:19:26   5
2017-01-10 07:19:27.1 2017-01-10 07:19:27.1 3
2017-01-10 07:19:28   2017-01-10 07:19:29   5
2017-01-10 07:19:30.1 2017-01-10 07:19:30.1 3
2017-01-10 07:19:31   2017-01-10 07:19:31   5
2017-01-10 07:19:32   2017-01-10 07:19:32   3
2017-01-10 07:19:33.1 2017-01-10 07:19:37.1 5
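For reference, the DENSE_RANK version mentioned at the top only swaps the ranking function; a sketch, assuming duplicate dates can occur while (date, id_type) stays unique:

WITH IdTypes AS (
   SELECT
      date,
      id_type,
      Dense_Rank() OVER (ORDER BY date)
         - Dense_Rank() OVER (PARTITION BY id_type ORDER BY date)
         AS Seq
   FROM tmp
)
SELECT
   Min(date) AS begin,
   Max(date) AS end,
   id_type
FROM IdTypes
GROUP BY id_type, Seq
ORDER BY begin
;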
Logically, you can think of this as a simple DENSE_RANK() with a PREORDER BY: that is, you want the DENSE_RANK of all the items that are ranked together, and you want them ordered by the dates. You just have to deal with the pesky problem that at each change in the date, DENSE_RANK will increment. You do that by using the expression I showed you above. Imagine if you had this syntax: DENSE_RANK() OVER (PREORDER BY date, ORDER BY id_type), where the PREORDER is excluded from the ranking calculation and only the ORDER BY is counted.
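To see the mechanics for yourself, a small diagnostic query (my addition, against the same tmp table) exposes both row numbers and their difference; rows sharing the same (id_type, Seq) pair form one island:

SELECT
   date,
   id_type,
   Row_Number() OVER (ORDER BY date)                           AS rn_overall,
   Row_Number() OVER (PARTITION BY id_type ORDER BY date)      AS rn_per_type,
   Row_Number() OVER (ORDER BY date)
      - Row_Number() OVER (PARTITION BY id_type ORDER BY date) AS Seq
FROM tmp
ORDER BY date;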
Note that it's important to GROUP BY both the generated Seq column and the id_type column. Seq is NOT unique by itself; there can be overlaps, so you must also group by id_type.
For further reading on this topic:
- Detect changes between row values—read the See It For Yourself section.
- Or this simpler explanation
That first link gives you some code you can use if you wanted the begin or end date to be the same as the previous or next period's end/begin date (so there are no gaps). Plus other versions that could assist you in your query. Though they have to be translated from SQL Server syntax...
On Postgres 8.4 or later you can use a recursive CTE (WITH RECURSIVE).
How it works
The recursive query walks the dates one by one in descending order and increments a level (lv) whenever id_type changes:
date                  | id_type | lv
--------------------------------------
2017-01-10 07:19:21.0      3      8
2017-01-10 07:19:22.0      3      8
2017-01-10 07:19:23.1      3      8
2017-01-10 07:19:24.1      3      8
2017-01-10 07:19:25.0      3      8
2017-01-10 07:19:26.0      5      7
2017-01-10 07:19:27.1      3      6
2017-01-10 07:19:28.0      5      5
2017-01-10 07:19:29.0      5      5
2017-01-10 07:19:30.1      3      4
2017-01-10 07:19:31.0      5      3
2017-01-10 07:19:32.0      3      2
2017-01-10 07:19:33.1      5      1
2017-01-10 07:19:35.0      5      1
2017-01-10 07:19:36.1      5      1
2017-01-10 07:19:37.1      5      1
Then use MIN(date) and MAX(date), grouping by lv and id_type, to get the desired result:
with RECURSIVE rdates as
(
   (select date, id_type, 1 lv
    from yourTable
    order by date desc
    limit 1
   )
   union
   (select d.date, d.id_type,
           case when r.id_type = d.id_type
                then r.lv
                else r.lv + 1
           end lv
    from yourTable d
    inner join rdates r
       on d.date < r.date
    order by date desc
    limit 1
   )
)
select min(date) StartDate,
       max(date) EndDate,
       id_type
from rdates
group by lv, id_type
;
+---------------------+---------------------+---------+
| startdate | enddate | id_type |
+---------------------+---------------------+---------+
| 10.01.2017 07:19:21 | 10.01.2017 07:19:25 | 3 |
| 10.01.2017 07:19:26 | 10.01.2017 07:19:26 | 5 |
| 10.01.2017 07:19:27 | 10.01.2017 07:19:27 | 3 |
| 10.01.2017 07:19:28 | 10.01.2017 07:19:29 | 5 |
| 10.01.2017 07:19:30 | 10.01.2017 07:19:30 | 3 |
| 10.01.2017 07:19:31 | 10.01.2017 07:19:31 | 5 |
| 10.01.2017 07:19:32 | 10.01.2017 07:19:32 | 3 |
| 10.01.2017 07:19:33 | 10.01.2017 07:19:37 | 5 |
+---------------------+---------------------+---------+
Check it: http://rextester.com/WCOYFP6623