IMDB to MySQL: Insert IMDB data into MySQL database
There is a nice Python script that helped me. Just set up the connection and run it. It took me about an hour to work through everything.
EDIT: Use this readme file when setting up the script.
Changes to IMDbPY and the IMDb data files format mean that the existing answers no longer work (as of January 2018).
I am using Ubuntu 17.10 and MariaDB 10.1 (not MySQL, but the following will also work with MySQL).
Changes to IMDbPY
The latest version of IMDbPY is 6.2. It is implemented in Python 3, and the dependencies on gcc and SQLObject have been removed. Also, the Python package MySQL-python is not available for Python 3, so we install mysqlclient instead; see below. (The API of mysqlclient is compatible with MySQL-python.)
Changes to the IMDb data files format
Changes to the format of the IMDb data files were introduced in December 2017, and IMDbPY 6.2 (the current version) does not yet work with the new file format. (See this GitHub issue.)
Until this is fixed, use the most recent version of the IMDb data published in the old format, which is available at ftp://ftp.fu-berlin.de/pub/misc/movies/database/frozendata/. Download all *.list.gz files (excluding files from subdirectories).
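If you would rather script the download than fetch the files by hand, here is a minimal sketch using Python's standard ftplib. It assumes the server still accepts anonymous logins and that the directory listing returns plain file names; the function names and the destination directory are my own, not part of IMDbPY.

```python
from ftplib import FTP
from pathlib import Path

FTP_HOST = "ftp.fu-berlin.de"
FTP_DIR = "/pub/misc/movies/database/frozendata/"


def select_list_files(names):
    """Keep only top-level *.list.gz entries (nothing from subdirectories)."""
    return sorted(n for n in names if n.endswith(".list.gz") and "/" not in n)


def download_lists(dest_dir):
    """Download every *.list.gz file from the frozen data directory.

    Assumption: anonymous FTP access is still available on this host.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with FTP(FTP_HOST) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(FTP_DIR)
        for name in select_list_files(ftp.nlst()):
            with open(dest / name, "wb") as fh:
                ftp.retrbinary(f"RETR {name}", fh.write)
```

Calling download_lists("imdb_dataset") would then place the files in a local imdb_dataset directory (a hypothetical name), which you can pass to the import script later.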
New steps to follow
Install Python 3 and required packages:
sudo apt install python3
pip3 install mysqlclient
In MariaDB, create a database imdb, and grant all privileges to 'user' with password 'password'.
CREATE DATABASE imdb;
GRANT ALL PRIVILEGES ON imdb.* TO 'user'@'localhost' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;
Get IMDbPY 6.2:
wget https://github.com/alberanid/imdbpy/archive/6.2.zip
unzip 6.2.zip
cd imdbpy-6.2
python3 setup.py install
Load IMDb data into MariaDB:
cd bin
python3 imdbpy2sql.py -d [imdb_dataset_directory] -u 'mysql://user:password@localhost/imdb'
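If your real password contains characters such as @ or /, the database URI can be awkward to type by hand. The sketch below assembles the same command in Python. The helper names and the dataset path are hypothetical, and percent-encoding the credentials is an assumption about how the URI is parsed, so check it against your setup before relying on it.

```python
import shlex
from urllib.parse import quote


def build_db_uri(user, password, host, database):
    """Build a mysql:// URI, percent-encoding the credentials.

    Assumption: the URI parser used by imdbpy2sql.py decodes percent-encoding.
    """
    return (
        f"mysql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}/{database}"
    )


def build_import_command(dataset_dir, uri):
    """Return the argument list for the import step shown above."""
    return ["python3", "imdbpy2sql.py", "-d", dataset_dir, "-u", uri]


uri = build_db_uri("user", "password", "localhost", "imdb")
# "/path/to/imdb_dataset" is a placeholder for your own download directory.
cmd = build_import_command("/path/to/imdb_dataset", uri)
print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```

Run from the bin directory, the printed line is the same command as above with the placeholders filled in.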
Edit: Version 6.2 of IMDbPY does not create foreign keys. See this GitHub issue. You will need to use an older version of IMDbPY if you need foreign keys to be created, but there are reported issues with the generation of foreign keys in old versions too (see linked GitHub issue).
Update: It took 4.5 hours to import, and I had no problems using InnoDB tables.
Edit: If you wish to use version 6.2 of IMDbPY and require foreign keys, you will need to add them manually to the database after it is generated. A small amount of data cleanup is required before the foreign keys can be added. This cleanup and the foreign keys that need to be added are described in this GitHub issue.
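To give a rough idea of what adding a foreign key by hand involves, here is a small sketch that generates the SQL for one relationship: first a query that counts rows that would violate the constraint, then the ALTER TABLE statement. The cast_info.movie_id to title.id pair is only an illustrative assumption about the generated schema; the linked GitHub issue is the authoritative list of keys and cleanup steps.

```python
def orphan_check_sql(table, column, ref_table, ref_column="id"):
    """SQL that counts rows whose reference points at a missing parent row."""
    return (
        f"SELECT COUNT(*) FROM `{table}` t "
        f"LEFT JOIN `{ref_table}` r ON t.`{column}` = r.`{ref_column}` "
        f"WHERE t.`{column}` IS NOT NULL AND r.`{ref_column}` IS NULL;"
    )


def add_fk_sql(table, column, ref_table, ref_column="id"):
    """SQL that adds a named foreign key constraint."""
    return (
        f"ALTER TABLE `{table}` ADD CONSTRAINT `fk_{table}_{column}` "
        f"FOREIGN KEY (`{column}`) REFERENCES `{ref_table}` (`{ref_column}`);"
    )


# Illustrative relationship only (assumed, not taken from the issue).
print(orphan_check_sql("cast_info", "movie_id", "title"))
print(add_fk_sql("cast_info", "movie_id", "title"))
```

If the first query returns a non-zero count, those rows need to be cleaned up before the ALTER TABLE will succeed on InnoDB tables.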