Which columns should be indexed when all may be used in different search queries?
Composite Primary Key
I would define the primary key as a composite key of (CinemaID, MovieID, Showtime)
.
These 3 columns uniquely identify each row, and so having a separate ID
column is not necessary.
Composite (Secondary) Index
With this PK, the only additional index you will need for your queries is (CinemaID, Showtime)
.
Why these indexes?
A good way to think about indexes are used is to think of them as the order of columns in a spreadsheet.
Imagine a spreadsheed with (CinemaID, MovieID, Showtime)
that is sorted by each column successively.
All your queries have CinemaID
present, which means you can quickly find the "section" of the spreadsheet for that CinemaID
. Then, for your query that searches by MovieID
, you can easily find the "subsection" in the 2nd column, where MovieID
matches the searched for value.
As the 3rd column of Showtime
is also sorted, you can imagine how quickly and easily it is to find all those show times for that movie, in that cinema. The DBMS does things in a similar way and can retrieve those results extremely quickly.
As for your other queries, they all start with CinemaID
and then use Showtime
in some manner. They also need the MovieID
in their results.
So, the (CinemaID, Showtime)
index has you covered there. Again, the CinemaID
easily finds the "section" of the spreadsheet (in the analogy), and all possible showtimes (and there are going to be duplicates, assuming there is more than one screen) will be listed in order and are easily searched and/or sorted by those values.
Even better, since your primary key includes MovieID
, that column is included for all secondary indexes after the defined columns (at least for MySQL InnoDB - others engines too, but not necessarily all.)
Think of that is a 3rd column in the "spreadsheet" of our secondary index. The reason the column exists is to have all parts of the primary key available to do a lookup to the main table (aka the clustered index, in InnoDB), if necessary. In this simple case, no lookup is needed, so it is even more efficient as it doesn't require that double lookup.
Using just this primary key and a single secondary index, you should obtain excellent performance on any of the queries you listed.
Afterthought
My assumption that this uniquely identifies each row may NOT be correct if you have a movie showing on multiple screens at the same time. If you want to be able to separately identify those screens, then my solution is not best (I can provide another solution for that situation, please just let me know.)
WHERE CinemaID = ? ORDER BY Showtime -- and
WHERE CinemaID = ? AND (Showtime BETWEEN [date 12:00 AM] AND [date 12:00 PM]) -- need:
INDEX(CinemaID, Showtime)
WHERE CinemaID = ? AND MovieID = ? ORDER BY Showtime -- needs:
INDEX(CinemaID, MovieId, Showtime) -- or
INDEX(MovieId, CinemaID, Showtime)
Assuming that the the triple (MovieId, CinemaID, Showtime) is unique, I suggest getting rid of id
and having
PRIMARY KEY(CinemaID, MovieId, Showtime)
INDEX(CinemaID, Showtime)
Are there cases where the main part of the WHERE
is MovieID=...
?
Cinemas - A list of each city and its cinema (ID and name):
SELECT Cinema, CinemaID FROM Cimemas; -- (no index needed)
Movies - A list of movies that has been/will be shown at the cinema.
SELECT DISTINCT MovieID FROM ShowTimes WHERE CinemaID=...
INDEX(CinemaID, MovieID) -- already handled by my proposed PK
Showtimes - A list of all showtimes for all movies in all cities. -- This is a huge output; rethink the requirement for it. That is, think about what the Client will do with it.
Most of these indexes could be deduced from studying indexing cookbook and composite indexing
Note how I did not suggest indexes (single-column vs composite, plus specifics) until I saw the SELECTs
.
"columns to the index by order of selectivity, making the most selective (I'm guessing that means the most unique/with the most cardinality?)" -- No. Selectivity is not the key for designing a composite index. Start with all the =
columns, in any order. (My Cookbook dwells on this topic.)
"index can only be used by the table if the first column is included in the search query" -- Mostly true. Notice how I recommended 2 indexes (remember: the PK is an index). There are cases where an index can be used for GROUP BY
or ORDER BY
, ignoring the WHERE
; but those are rare.
"first in the composite index ... Showtime" -- It is usually counterproductive to put a DATETIME
first in a composite index. In particular, your 3rd query can use both columns of INDEX(CinemaID, Showtime)
, but not both columns of INDEX(Showtime, CinemaID)
. It is easy to see this: Think of writing out two lists of cinemas and showtimes. Have one list sorted first on cinema (a la INDEX(cinema, time)
; have the other sorted on time. Think about which one would have all the rows for a particular cinema over a time range clumped ("clustered") together.
If there is also a ScreenID
due to two screens showing the same movie at the same time, tack it onto the PK. However, all the indexes will need rethinking.
(Sorry, Willem, I wrote my Answer before reading yours -- We say almost the same stuff.)