More than one path to JOIN the same table in Postgres

What my colleagues mean is that the way you want to do it is not feasible. However, there are many other ways to get the same result.

To avoid two joins, you could build a combined table of both SpecialEvents and SpecialPlaces that contains all the information you want, and then JOIN to that.

E.g., something like this:

SELECT event_registrations.id,
       array_agg(teams.name),
       event_registrations.number_of_players,
       event_registrations.state,
       event_registrations.created_at,
       array_agg(players.email),
       array_agg(el1.name),  -- special_events is only visible inside the subquery, so reference el1
       array_agg(special_places.id)
FROM event_registrations
LEFT JOIN teams ON event_registrations.team_id = teams.id
LEFT JOIN team_memberships ON teams.id = team_memberships.team_id
LEFT JOIN players ON team_memberships.player_id = players.id
LEFT JOIN special_places ON event_registrations.scoreable_id = special_places.id AND event_registrations.scoreable_type = 'SpecialPlace'
LEFT JOIN (
SELECT special_events.id AS special_event_id, special_places.id AS special_place_id, special_events.name
FROM special_places
LEFT JOIN special_events ON special_places.special_event_id = special_events.id
UNION
SELECT special_events.id AS special_event_id, null AS special_place_id, special_events.name
FROM special_events
) el1
ON (event_registrations.scoreable_id = el1.special_place_id AND event_registrations.scoreable_type = 'SpecialPlace') OR (event_registrations.scoreable_id = el1.special_event_id AND event_registrations.scoreable_type = 'SpecialEvent')
GROUP BY event_registrations.id, event_registrations.number_of_players, event_registrations.state, event_registrations.created_at

Assuming that id is the PRIMARY KEY column in each of the given tables, and based on some educated guesses:

SELECT er.id
     , t.name  AS team_name            -- can only be 1, no array_agg
     , er.number_of_players
     , er.state
     , er.created_at
     , tp.player_emails                -- pre-aggregated!
     , se.name AS special_event_name   -- can only be 1, no array_agg
     , sp.id   AS special_place_id     -- can only be 1, no array_agg
FROM   event_registrations   er
LEFT   JOIN teams t ON t.id = er.team_id
LEFT   JOIN (
   SELECT tm.team_id, array_agg(p.email) AS player_emails
   FROM   team_memberships tm
   JOIN   players          p  ON p.id = tm.player_id
   GROUP  BY 1
   ) tp USING (team_id)
LEFT   JOIN special_places sp ON sp.id = er.scoreable_id AND er.scoreable_type = 'SpecialPlace'
LEFT   JOIN special_events se ON se.id = er.scoreable_id AND er.scoreable_type = 'SpecialEvent'
                              OR se.id = sp.special_event_id AND er.scoreable_type = 'SpecialPlace'

Much simpler and faster.
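
For anyone who wants to try the query, here is a minimal sketch of the kind of schema it assumes. Only the table names and the columns used in the query come from the question. The column types, the NOT NULL and UNIQUE constraints, and the foreign keys are all guesses.

```sql
-- Hypothetical schema: types and constraints are assumptions, not from the question.
CREATE TABLE teams (
   id   serial PRIMARY KEY
 , name text NOT NULL
);

CREATE TABLE players (
   id    serial PRIMARY KEY
 , email text NOT NULL
);

CREATE TABLE team_memberships (
   id        serial PRIMARY KEY
 , team_id   int NOT NULL REFERENCES teams
 , player_id int NOT NULL REFERENCES players
 , UNIQUE (team_id, player_id)          -- assumed: one membership per player and team
);

CREATE TABLE special_events (
   id   serial PRIMARY KEY
 , name text NOT NULL
);

CREATE TABLE special_places (
   id               serial PRIMARY KEY
 , special_event_id int REFERENCES special_events
);

CREATE TABLE event_registrations (
   id                serial PRIMARY KEY
 , team_id           int REFERENCES teams
 , number_of_players int
 , state             text
 , created_at        timestamptz NOT NULL DEFAULT now()
 , scoreable_id      int                -- polymorphic reference, so no FK constraint
 , scoreable_type    text               -- 'SpecialPlace' or 'SpecialEvent'
);
```

Note that USING (team_id) in the query only works while event_registrations is the only table left of that join with a column named team_id, which holds for this sketch.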

Major points

  • If you indeed need to join to the same table twice, you have to use table aliases like:

    FROM   event_registrations er

    which is short for:

    FROM   event_registrations AS er

    As it turns out, you do not need to join to the same table twice here. Still, use table aliases to cut the noise. Related:

    • How to join two tables with one of them not having a primary key and not the same character length
    • Query to ORDER BY the number of rows returned from another SELECT
  • The only identifiable reason for the global GROUP BY in the outer SELECT was the join to team_memberships that could potentially multiply rows. I moved the aggregation of player_emails to a much cheaper subquery, removed the outer GROUP BY and simplified the rest. Should also be substantially faster. Related:

    • Why does the following join increase the query time significantly?
    • Aggregate a single column in query with many columns
  • If you need GROUP BY in the outer query - and event_registrations.id is indeed the PRIMARY KEY - then this:

    GROUP  BY er.id, er.number_of_players, er.state, er.created_at
    

    ... is just another noisy way of saying:

    GROUP  BY er.id
    

    Since Postgres 9.1, the PK covers all columns of a table in the GROUP BY clause. See:

    • PostgreSQL - GROUP BY clause

    But you don't need that at all.

  • Finally, the core question is solved by joining to special_places conditionally first and then, conditionally again, joining to special_events. Missing columns are filled in with NULL values:

    LEFT   JOIN special_places sp ON sp.id = er.scoreable_id AND er.scoreable_type = 'SpecialPlace'
    LEFT   JOIN special_events se ON se.id = er.scoreable_id AND er.scoreable_type = 'SpecialEvent'
                                  OR se.id = sp.special_event_id AND er.scoreable_type = 'SpecialPlace'
    

    The final AND er.scoreable_type = 'SpecialPlace' is redundant, strictly speaking, as there can be no sp.special_event_id otherwise anyway. I kept it for clarity.
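
A stripped-down sketch of just the conditional joins from the last point, using the same tables and aliases. The parentheses are optional, since AND binds more tightly than OR, but they make the grouping visible. The redundant final check on scoreable_type is dropped here to show that the result does not depend on it: for a 'SpecialEvent' registration, sp is all NULL, so the second branch can never be true.

```sql
-- Reduced to the polymorphic part; column aliases are illustrative only.
SELECT er.id
     , sp.id   AS special_place_id
     , se.name AS special_event_name
FROM   event_registrations er
LEFT   JOIN special_places sp ON  sp.id = er.scoreable_id
                              AND er.scoreable_type = 'SpecialPlace'
LEFT   JOIN special_events se ON (se.id = er.scoreable_id
                              AND er.scoreable_type = 'SpecialEvent')
                              OR  se.id = sp.special_event_id;  -- NULL (not true) unless sp matched
```

Whether to keep the extra type check is a matter of taste. It documents intent at no measurable cost.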