More than one path to JOIN the same table in Postgres
What my colleagues mean to say is that the way you want to do it is not feasible; however, there are myriad ways to achieve the same thing.
What you could do to avoid two joins is build a combined derived table from both SpecialEvents and SpecialPlaces, containing all the information you want, and then JOIN to that.
E.g. something like this:
SELECT event_registrations.id, array_agg(teams.name), event_registrations.number_of_players, event_registrations.state, event_registrations.created_at, array_agg(players.email), array_agg(special_events.name), array_agg(special_places.id)
FROM event_registrations
LEFT JOIN teams ON event_registrations.team_id = teams.id
LEFT JOIN team_memberships ON teams.id = team_memberships.team_id
LEFT JOIN players ON team_memberships.player_id = players.id
LEFT JOIN special_places ON event_registrations.scoreable_id = special_places.id AND event_registrations.scoreable_type = 'SpecialPlace'
LEFT JOIN (
SELECT special_events.id AS special_event_id, special_places.id AS special_place_id, special_events.name
FROM special_places
LEFT JOIN special_events ON special_places.special_event_id = special_events.id
UNION
SELECT special_events.id AS special_event_id, null AS special_place_id, special_events.name
FROM special_events
) el1
ON (event_registrations.scoreable_id = el1.special_place_id AND event_registrations.scoreable_type = 'SpecialPlace') OR (event_registrations.scoreable_id = el1.special_event_id AND event_registrations.scoreable_type = 'SpecialEvent')
GROUP BY event_registrations.id, event_registrations.number_of_players, event_registrations.state, event_registrations.created_at
Assuming that id is the PRIMARY KEY column in each of the given tables, and based on some educated guesses:
SELECT er.id
, t.name AS team_name -- can only be 1, no array_agg
, er.number_of_players
, er.state
, er.created_at
, tp.player_emails -- pre-aggregated!
, se.name AS special_event_name -- can only be 1, no array_agg
, sp.id AS special_place_id -- can only be 1, no array_agg
FROM event_registrations er
LEFT JOIN teams t ON t.id = er.team_id
LEFT JOIN (
SELECT tm.team_id, array_agg(p.email) AS player_emails
FROM team_memberships tm
JOIN players p ON p.id = tm.player_id
GROUP BY 1
) tp USING (team_id)
LEFT JOIN special_places sp ON sp.id = er.scoreable_id AND er.scoreable_type = 'SpecialPlace'
LEFT JOIN special_events se ON se.id = er.scoreable_id AND er.scoreable_type = 'SpecialEvent'
OR se.id = sp.special_event_id AND er.scoreable_type = 'SpecialPlace'
Much simpler and faster.
Major points
If you indeed need to join to the same table twice, you have to use table aliases like:
FROM event_registrations er
which is short for:
FROM event_registrations AS er
Turns out, you do not need to join to the same table twice. Still use table aliases to cut the noise. Related:
- How to join two tables with one of them not having a primary key and not the same character length
- Query to ORDER BY the number of rows returned from another SELECT
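For illustration, here is a minimal sketch of what joining to the same table twice would look like with table aliases. The sample rows and the alias names se_direct and se_via_place are made up for this example; the CTEs only stand in for the real tables so the statement can be run on its own.

```sql
-- Stand-in data (hypothetical); the real tables have more columns.
WITH special_events(id, name) AS (
   VALUES (1, 'Spring Cup'), (2, 'Autumn Open')
   )
, special_places(id, special_event_id) AS (
   VALUES (10, 1)
   )
, event_registrations(id, scoreable_id, scoreable_type) AS (
   VALUES (100,  2, 'SpecialEvent')
        , (101, 10, 'SpecialPlace')
   )
SELECT er.id
     , COALESCE(se_direct.name, se_via_place.name) AS special_event_name
FROM   event_registrations er
LEFT   JOIN special_places sp
       ON  sp.id = er.scoreable_id
       AND er.scoreable_type = 'SpecialPlace'
-- first join to special_events: registration points at the event directly
LEFT   JOIN special_events AS se_direct
       ON  se_direct.id = er.scoreable_id
       AND er.scoreable_type = 'SpecialEvent'
-- second join to the same table: reached through the special place
LEFT   JOIN special_events AS se_via_place
       ON  se_via_place.id = sp.special_event_id
ORDER  BY er.id;
```

Without the two distinct aliases Postgres could not tell the two instances of special_events apart.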
The only identifiable reason for the global GROUP BY in the outer SELECT was the join to team_memberships that could potentially multiply rows. I moved the aggregation of player_emails to a much cheaper subquery, removed the outer GROUP BY and simplified the rest. Should also be substantially faster. Related:
- Why does the following join increase the query time significantly?
- Aggregate a single column in query with many columns
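A small self-contained sketch of the pre-aggregation idea, using made-up sample rows in CTEs in place of the real tables. The ORDER BY inside array_agg() is an addition of this example to make the array order deterministic; it is not part of the query above.

```sql
-- Stand-in data (hypothetical): one team with two players.
WITH teams(id, name) AS (
   VALUES (1, 'Red')
   )
, players(id, email) AS (
   VALUES (1, 'a@example.com'), (2, 'b@example.com')
   )
, team_memberships(team_id, player_id) AS (
   VALUES (1, 1), (1, 2)
   )
, event_registrations(id, team_id) AS (
   VALUES (100, 1)
   )
SELECT er.id
     , t.name AS team_name
     , tp.player_emails
FROM   event_registrations er
LEFT   JOIN teams t ON t.id = er.team_id
LEFT   JOIN (
   -- aggregate emails per team first, so the join adds at most one row per team
   SELECT tm.team_id, array_agg(p.email ORDER BY p.email) AS player_emails
   FROM   team_memberships tm
   JOIN   players p ON p.id = tm.player_id
   GROUP  BY tm.team_id
   ) tp ON tp.team_id = er.team_id;
```

Because the subquery returns one row per team, the registration stays a single row with both emails in one array, and no outer GROUP BY is needed.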
If you need GROUP BY in the outer query - and event_registrations.id is indeed the PRIMARY KEY - then this:
GROUP BY er.id, er.number_of_players, er.state, er.created_at
... is just another noisy way of saying:
GROUP BY er.id
Since Postgres 9.1, the PK covers all columns of a table in the GROUP BY clause. See:
- PostgreSQL - GROUP BY clause
But you don't need that at all.
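Should an outer GROUP BY be needed after all, the shortcut can be tried out with a sketch like the following. The temporary tables er_demo and er_player_demo and their rows are hypothetical; a real table with a declared PRIMARY KEY is required, since Postgres only recognizes the dependency from the constraint.

```sql
-- Hypothetical demo tables; names and data are made up for this sketch.
CREATE TEMP TABLE er_demo (
   id                int PRIMARY KEY
 , number_of_players int
 , state             text
);
CREATE TEMP TABLE er_player_demo (
   registration_id int
 , email           text
);

INSERT INTO er_demo VALUES (100, 2, 'confirmed'), (101, 0, 'pending');
INSERT INTO er_player_demo VALUES (100, 'a@example.com'), (100, 'b@example.com');

-- Grouping by the PK alone is enough: number_of_players and state are
-- functionally dependent on er.id and may appear in the SELECT list.
SELECT er.id, er.number_of_players, er.state, count(p.email) AS player_count
FROM   er_demo er
LEFT   JOIN er_player_demo p ON p.registration_id = er.id
GROUP  BY er.id
ORDER  BY er.id;
```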
Finally, the core question is solved by joining to special_places conditionally first and then, conditionally again, joining to special_events. Missing columns are filled in with NULL values:
LEFT JOIN special_places sp ON sp.id = er.scoreable_id AND er.scoreable_type = 'SpecialPlace'
LEFT JOIN special_events se ON se.id = er.scoreable_id AND er.scoreable_type = 'SpecialEvent'
OR se.id = sp.special_event_id AND er.scoreable_type = 'SpecialPlace'
The final AND er.scoreable_type = 'SpecialPlace' is redundant, strictly speaking, as there can be no sp.special_event_id otherwise anyway. I kept it for clarity.
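To see the two conditional joins at work, here is a runnable sketch with made-up sample rows in CTEs standing in for the real tables. The parentheses are added only to spell out the operator precedence (AND binds tighter than OR); the logic is meant to match the join shown above.

```sql
-- Stand-in data (hypothetical): one registration per scoreable_type.
WITH special_events(id, name) AS (
   VALUES (1, 'Spring Cup'), (2, 'Autumn Open')
   )
, special_places(id, special_event_id) AS (
   VALUES (10, 1)
   )
, event_registrations(id, scoreable_id, scoreable_type) AS (
   VALUES (100,  2, 'SpecialEvent')
        , (101, 10, 'SpecialPlace')
   )
SELECT er.id
     , se.name AS special_event_name
     , sp.id   AS special_place_id
FROM   event_registrations er
LEFT   JOIN special_places sp
       ON  sp.id = er.scoreable_id
       AND er.scoreable_type = 'SpecialPlace'
LEFT   JOIN special_events se
       ON (se.id = er.scoreable_id      AND er.scoreable_type = 'SpecialEvent')
       OR (se.id = sp.special_event_id  AND er.scoreable_type = 'SpecialPlace')
ORDER  BY er.id;
```

With this data, registration 100 should get the event name directly and a NULL special_place_id, while registration 101 should get special place 10 and the event name resolved through it.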