Movie Database, storing multiple genres
I would suggest you should follow the following structure:
tablename: movies
movieid, title, plot, rating, director
> sample data:
>
> 1 titanic Bollywood 10 James Cameron
tablename: genres
genreid, genre
> sample data:
> 1 Horror
> 2 Thriller
> 3 Action
> 4 Love
tablename: moviegenres
moviegenresid, movieid, genreid
> sample data:
> 1 1 2
> 2 1 4
And the query is:
select m.*,group_concat(g.genre)
from movies m inner join moviegenres mg
on m.movieid=mg.movieid
inner join genres g
on g.genreid=mg.genreid
group by m.movieid
;
See the fiddle
What you are looking to model here is called a "many-to-many" relationship, and is very common when modelling "real world" categorisations.
There are many descriptions out there of how to work with such relationships including:
- Praveen's answer here which is specific to your question.
- http://en.wikipedia.org/wiki/Junction_table - the extra table linking two populations in many/may relationships is usually called an intersection table or junction table.
- http://www.tomjewett.com/dbdesign/dbdesign.php?page=manymany.php which helpfully shows an example with the table and key/constraints design, a handy data representation diagram in case that isn't clear, and how the relationship is modelled and used in the application.
- Any good database design book/tutorial will cover this somewhere.
Do not be tempted to skip the extra intersection table by storing multiple genres in one field for each film (a comma separated list for instance). This is a very common "anti pattern" that will cause you problems, maybe not today, maybe not tomorrow, but eventually. I recommend anyone working with database design give Bill Karwin's "SQL Antipatterns" (http://pragprog.com/book/bksqla/sql-antipatterns) a read. It is written in a way that should be accessible to a relative beginner, but contains much that those of us who should know better need reminding of from time to time (many-to-many relations, the list-in-a-field solution/problem, and what you should do instead, are one of the first things the book covers).
This answer is an elaboration of my comment on @Praveen Prasannan's answer above.
I would eliminate the arbitrary surrogate keys movieID
and genreID
as a way to remove necessary overhead for the relational database. Since title
and genre
are natural unique keys, we should use them and not ask the database to maintain the uniqueness of extra, meaningless keys and tables (the genres
table in the referenced answer). This should improve the speed and performance for large relational databases and is good practice.
tablename: movies
primary key: title
title, plot, rating, director
> sample data:
> Titanic Bollywood 10 James Cameron
tablename: moviegenres
primary key: title, genre
title, genre
> sample data:
> Titanic Thriller
> Titanic Romance
This also makes queries a lot easier for both the user and the machine since you don't have to join an extra table to decode the genres by the arbitrary UID.