Better database for "always keep the 5 latest entries per ID and delete older ones"?
One option is to delete the oldest row as soon as you insert a new one. That requires only a very quick lookup limited to at most 6 rows, rather than going through all rows at once.
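For comparison, the periodic bulk cleanup this avoids would have to rank every row in the table on each run. A sketch of it, against the history table defined below:

-- rank all rows per fooid, newest first, and delete everything past the 5th
delete from history
where id in (select id
             from (select id,
                          row_number() over (partition by fooid order by updated_at desc) as rn
                   from history) ranked
             where rn > 5);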
To make that per-insert cleanup efficient, you need a unique key on the table:
create table history
(
  id         serial primary key, -- to make a lookup on a single row efficient
  fooid      text not null,
  target     text not null,
  updated_at timestamp not null default now()
);
-- to make finding the oldest row for one fooid efficient
create index on history(fooid, updated_at);
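The same index also makes it cheap to read the current entries for one ID; '1' here is just a placeholder fooid value:

-- the composite index turns this into a small range scan
select *
from history
where fooid = '1'
order by updated_at desc
limit 5;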
Then create a trigger that keeps only the 5 most recent rows for each fooid:
create or replace function remove_last()
  returns trigger
as
$$
begin
  -- rank the existing rows for this fooid, newest first,
  -- excluding the row that was just inserted
  with ranked as (
    select id,
           row_number() over (partition by fooid order by updated_at desc) as rn
    from history
    where id <> new.id
      and fooid = new.fooid
  )
  -- keep the 4 newest old rows; together with the new row that makes 5
  delete from history
  where id in (select id
               from ranked
               where rn >= 5);
  return new;
end;
$$
language plpgsql;
create trigger remove_last_trigger
after insert on history
for each row execute procedure remove_last();
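A quick way to verify the trigger ('test' is just an example fooid, and this assumes each insert runs as its own transaction so now() yields a distinct timestamp per row):

insert into history (fooid, target) values ('test', 'entry 1');
insert into history (fooid, target) values ('test', 'entry 2');
insert into history (fooid, target) values ('test', 'entry 3');
insert into history (fooid, target) values ('test', 'entry 4');
insert into history (fooid, target) values ('test', 'entry 5');
insert into history (fooid, target) values ('test', 'entry 6');
insert into history (fooid, target) values ('test', 'entry 7');

-- only the five most recent rows should remain:
select target, updated_at
from history
where fooid = 'test'
order by updated_at desc;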
On my laptop, a simple test setup with 5 million rows (1 million different fooid values) showed a trigger overhead of less than a millisecond:
explain (analyze, verbose, buffers)
insert into history (fooid, target) values ('1', 'new stuff');
QUERY PLAN
----------------------------------------------------------------------------------------------------
Insert on stuff.history  (cost=0.00..0.01 rows=1 width=76) (actual time=0.062..0.062 rows=0 loops=1)
  Buffers: shared hit=8
  ->  Result  (cost=0.00..0.01 rows=1 width=76) (actual time=0.017..0.017 rows=1 loops=1)
        Output: nextval('history_id_seq'::regclass), '1'::text, 'new stuff'::text, now()
        Buffers: shared hit=1
Planning time: 0.024 ms
Trigger remove_last_trigger: time=0.438 calls=1
Execution time: 0.524 ms
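For reference, a comparable test setup can be generated with generate_series; the exact data distribution below is my assumption, not the original benchmark. Load the data before creating the trigger, so it doesn't fire five million times:

insert into history (fooid, target, updated_at)
select (g % 1000000)::text,                  -- 1 million distinct fooid values
       'payload ' || g,
       now() - (g || ' seconds')::interval   -- spread the timestamps out
from generate_series(1, 5000000) as g;       -- 5 million rows, 5 per fooid

analyze history;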
How much overhead the trigger adds in your system, and whether that is acceptable for you, I can't say, but "a thousand changes per day" doesn't sound like a very busy system.