Slow nested loop left join with index scan 130k times in loop
The execution plan looks pretty good in my opinion.
What you could try is to see if performance improves when a nested loop join is avoided.
To test this, run
SET enable_nestloop = off;
before your query and see if that improves the speed.
Other than that, I cannot think of any improvements.
Your query and explain output don't look so bad. Still, a couple of observations:
An index-only scan instead of an index scan on
matches_loosing_champion_ids_index
would be faster. The reason you don't see that: the uselesscount(match_id)
.5 bitmap index scans + BitmapOR step are pretty fast but a single bitmap index scan would be faster.
The most expensive part in this query plan is the
Nested Loop Left Join
. Might be different for other players.
With your schema
Query 1: LEFT JOIN LATERAL
This way, we aggregate before we join and don't need another GROUP BY
. Also fewer join operations. And count(*)
should unblock index-only scans:
SELECT player1, player2, player3, player4, player5
, ((win * 1.0) / (win + loss))::numeric(5,5) AS winrate
, win + loss AS matches
FROM (
SELECT winning_champion_one_id AS player1
, winning_champion_two_id AS player2
, winning_champion_three_id AS player3
, winning_champion_four_id AS player4
, winning_champion_five_id AS player5
, COUNT(*) AS win -- see below
FROM matches
WHERE 157 IN (winning_champion_one_id
, winning_champion_two_id
, winning_champion_three_id
, winning_champion_four_id
, winning_champion_five_id)
GROUP BY 1,2,3,4,5
) w
LEFT JOIN LATERAL (
SELECT COUNT(*) AS loss -- see below
FROM matches
WHERE loosing_champion_one_id = w.player1
AND loosing_champion_two_id = w.player2
AND loosing_champion_three_id = w.player3
AND loosing_champion_four_id = w.player4
AND loosing_champion_five_id = w.player5
GROUP BY loosing_champion_one_id
, loosing_champion_two_id
, loosing_champion_three_id
, loosing_champion_four_id
, loosing_champion_five_id
) l ON true
WHERE win + loss > 19
ORDER BY winrate DESC, matches DESC
LIMIT 1;
count(*)
:
is slightly shorter and faster in Postgres, doing the same as count(match_id)
here, because match_id
is never NULL.
Removing the only reference to match_id
allows an index-only scan on matches_loosing_champion_ids_index
! Some other preconditions must be met ...
Query 2: UNION ALL
Another way around the expensive Nested Loop Left Join
, and a single GROUP BY
. But we add 5 more bitmap index scans. May or may not be faster:
SELECT player1, player2, player3, player4, player5
,(count(win) * 1.0) / count(*) AS winrate -- I would round ...
, count(*) AS matches
FROM (
SELECT winning_champion_one_id AS player1
, winning_champion_two_id AS player2
, winning_champion_three_id AS player3
, winning_champion_four_id AS player4
, winning_champion_five_id AS player5
, TRUE AS win
FROM matches
WHERE 157 IN (winning_champion_one_id
, winning_champion_two_id
, winning_champion_three_id
, winning_champion_four_id
, winning_champion_five_id)
UNION ALL
SELECT loosing_champion_one_id
, loosing_champion_two_id
, loosing_champion_three_id
, loosing_champion_four_id
, loosing_champion_five_id
, NULL AS win -- following "count(win)" ignores NULL values
FROM matches
WHERE 157 IN (loosing_champion_one_id
, loosing_champion_two_id
, loosing_champion_three_id
, loosing_champion_four_id
, loosing_champion_five_id)
) m
GROUP BY 1,2,3,4,5
HAVING count(*) > 19 -- min 20 matches
-- AND count(win) > 0 -- min 1 win -- see below!
ORDER BY winrate DESC, matches DESC
LIMIT 1;
AND count(win) > 0
is commented out, because it's redundant, while you pick the single best winrate
anyways.
Different schema
I really would use a different schema to begin with:
CREATE TABLE team (
team_id serial PRIMARY KEY -- or bigserial if you expect > 2^31 distinct teams
-- more attributes?
);
CREATE TABLE player (
player_id smallserial PRIMARY KEY
-- more attributes?
);
CREATE TABLE team_member (
team_id integer REFERENCES team
, player_id smallint REFERENCES player
, team_pos smallint NOT NULL CHECK (team_pos BETWEEN 1 AND 5) -- only if position matters
, PRIMARY KEY (team_id, player_id)
, UNIQUE (team_id, team_pos)
);
CREATE INDEX team_member_player_team_idx on team_member (player_id, team_id);
-- Enforce 5 players per team. Various options, different question.
CREATE TABLE match (
match_id bigserial PRIMARY KEY
, winner integer NOT NULL REFERENCES team
, loser integer NOT NULL REFERENCES team
, CHECK (winner <> loser) -- wouldn't make sense
);
CREATE INDEX match_winner_loser_idx ON match (winner, loser);
CREATE INDEX match_loser_winner_idx ON match (loser, winner);
Subsidiary tables add to the disk footprint, but the main table is a bit smaller. And most importantly, you need fewer indexes, which should be substantially smaller overall for your cardinalities.
Query
We don't need any other indexes for this equivalent query. Much simpler and presumably faster now:
SELECT winner
, win * 1.0/ (win + loss) AS winrate
, win + loss AS matches
FROM (
SELECT w.winner, count(*) AS win
FROM team_member tm
JOIN match w ON w.winner = tm.team_id
WHERE tm.player_id = 157
GROUP BY w.winner
) w
LEFT JOIN LATERAL (
SELECT count(*) AS loss
FROM match l
WHERE l.loser = w.winner
GROUP BY l.loser
) l ON true
WHERE win + loss > 19
ORDER BY winrate DESC, matches DESC
LIMIT 1;
Join the result to team_member
to get individual players.
You can also try the corresponding UNION ALL
technique from above:
WITH t AS (
SELECT team_id
FROM team_member
WHERE player_id = 157 -- provide player here
)
SELECT team_id
,(count(win) * 1.0) / count(*) AS winrate
, count(*) AS matches
FROM (
SELECT t.team_id, TRUE AS win
FROM t JOIN match ON winner = t.team_id
UNION ALL
SELECT t.team_id, NULL AS win
FROM t JOIN match ON loser = t.team_id
) m
GROUP BY 1
HAVING count(*) > 19 -- min 20 matches
ORDER BY winrate DESC, matches DESC
LIMIT 1;
bloom
index
I briefly considered a bloom index for your predicate:
WHERE 157 IN (loosing_champion_one_id
, loosing_champion_two_id
, loosing_champion_three_id
, loosing_champion_four_id
, loosing_champion_five_id)
Didn't test, probably won't pay for just 5 smallint
columns.