Select longest continuous sequence
Your description results in a table definition like this:
CREATE TABLE tbl (
lap_id serial PRIMARY KEY
, lap_no int NOT NULL
, car_type enum NOT NULL
, race_id int NOT NULL -- REFERENCES ...
, UNIQUE(race_id, car_type, lap_no)
);
General solution for this class of problems
To get the longest sequence (1 result, longest of all, arbitrary pick if there are ties):
SELECT race_id, car_type, count(*) AS seq_len
FROM (
SELECT *, count(*) FILTER (WHERE step)
OVER (ORDER BY race_id, car_type, lap_no) AS grp
FROM (
SELECT *, (lag(lap_no) OVER (PARTITION BY race_id, car_type ORDER BY lap_no) + 1)
IS DISTINCT FROM lap_no AS step
FROM tbl
) x
) y
GROUP BY race_id, car_type, grp
ORDER BY seq_len DESC
LIMIT 1;
count(*) FILTER (WHERE step)
only counts TRUE
(= step to next group), which results in a new number for every new group.
Related question on SO, one answer featuring a procedural solution with plpgsql:
- GROUP BY and aggregate sequential numeric values
If the top requirement is performance, the plpgsql function is typically faster in this particular case because it can calculate the result in a single scan.
Faster for consecutive numbers
We can capitalize on the fact that consecutive lap_no
define a sequence, for a much simpler and faster version:
SELECT race_id, car_type, count(*) AS seq_len
FROM (
SELECT race_id, car_type
, row_number() OVER (PARTITION BY race_id, car_type ORDER BY lap_no) - lap_no AS grp
FROM tbl
) x
GROUP BY race_id, car_type, grp
ORDER BY seq_len DESC
LIMIT 1;
Consecutive laps end up in the same grp
. Every missing lap results in a lower grp
per partition.
This relies on (race_id, car_type, lap_no)
being UNIQUE NOT NULL
. NULL values or duplicates could break the logic.
Discussion of Jack's simpler alternative
@Jack's version effectively counts all laps (rows) where the previous lap_no
in this race_id
had the same car_type
. That's simpler and faster and correct - as long as each car_type
can only have one sequence per race_id
.
But for a task that simple the query could be simpler, yet. It would follow logically that all lap_no
per (car_type, race_id)
must be in sequence, and we could just count the laps:
SELECT race_id, car_type, count(*) AS seq_len
FROM tbl
GROUP BY race_id, car_type
ORDER BY seq_len DESC
LIMIT 1;
If, on the other hand, one car_type
can have multiple separate sequences per race_id (and the question does not specify otherwise), Jack's version will fail.
Faster for a given race / car type
In reply to the comment / clarifications in the question: restricting the query to one given (race_id, car_type)
will make it much faster, of course:
SELECT count(*) AS seq_len
FROM (
SELECT row_number() OVER (ORDER BY lap_no) - lap_no AS grp
FROM tbl
WHERE race_id = 1
AND car_type = 'red'
) x
GROUP BY grp
ORDER BY seq_len DESC
LIMIT 1;
db<>fiddle here
Old SQL Fiddle
Index
Key to top performance is a fitting index (except for the mentioned procedural solution working with a single sequential scan). A multicolumn index like this serves best:
CREATE INDEX tbl_mult_idx ON tbl (race_id, car_type, lap_no);
If your table has the UNIQUE
constraint I assumed at the top, that is implemented with just this (unique) index internally, and you do not need to create another index.
create table tbl (lap_no int, car_type text, race_id int);
insert into tbl values (1,'red',1),(2,'red',1),(3,'red',1),(4,'red',1), (1,'blue',1),(5,'red',1),(2,'blue',1),(1,'green',1);
select car_type, race_id, sum(case when lap_no=(prev+1) then 1 else 0 end)+1 seq_len from ( select *, lag(lap_no) over (partition by car_type, race_id order by lap_no) prev from tbl ) z group by car_type, race_id order by seq_len desc limit 1;
/* |car_type|race_id|seq_len| |:-------|------:|------:| |red | 1| 5| */