IMDB to MySQL: Insert IMDB data into MySQL database

There is a nice Python script that helped me. Just set up the database connection and run it. Allow roughly an hour to get everything working.

EDIT: Use this README file when setting up the script.


Changes to IMDbPY and the IMDb data files format mean that the existing answers no longer work (as of January 2018).

I am using Ubuntu 17.10 and MariaDB 10.1 (not MySQL, but the following will also work with MySQL).

Changes to IMDbPY

The latest version of IMDbPY is 6.2. It is implemented in Python 3, and the dependencies on gcc and SQLObject have been removed. The Python package MySQL-python is not available for Python 3, so we install mysqlclient instead; see below. (The API of mysqlclient is compatible with MySQL-python.)

Changes to the IMDb data files format

Changes to the format of the IMDb data files were introduced in December 2017, and IMDbPY 6.2 (the current version) does not yet work with the new file format. (See this GitHub issue.)

Until this is fixed, use the most recent version of the IMDb data published in the old format, which is available at ftp://ftp.fu-berlin.de/pub/misc/movies/database/frozendata/. Download all *.list.gz files (excluding files from subdirectories).
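
The download can also be scripted. The following is a minimal sketch using Python's standard ftplib, assuming the server accepts anonymous logins and still hosts the directory named above. The function names and the imdb_data target directory are my own, not part of IMDbPY.

```python
from ftplib import FTP
from pathlib import Path

# Host and directory taken from the URL above.
FTP_HOST = "ftp.fu-berlin.de"
FTP_DIR = "/pub/misc/movies/database/frozendata"


def select_list_files(names):
    """Keep only top-level *.list.gz entries (nothing inside subdirectories)."""
    return sorted(n for n in names if n.endswith(".list.gz") and "/" not in n)


def download_list_files(dest_dir, host=FTP_HOST, remote_dir=FTP_DIR):
    """Download every *.list.gz file found in remote_dir into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with FTP(host) as ftp:
        ftp.login()  # anonymous login (assumption: the server allows it)
        ftp.cwd(remote_dir)
        for name in select_list_files(ftp.nlst()):
            with open(dest / name, "wb") as fh:
                ftp.retrbinary(f"RETR {name}", fh.write)
            print("downloaded", name)
```

Calling download_list_files("imdb_data") would then place the files in a local imdb_data directory, which can be passed to imdbpy2sql.py with the -d option.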

New steps to follow

  1. Install Python 3 and required packages:

    sudo apt install python3 python3-pip
    pip3 install mysqlclient
    
  2. In MariaDB, create a database named imdb, and grant all privileges on it to a user named user with the password password.

    CREATE DATABASE imdb;
    GRANT ALL PRIVILEGES ON imdb.* TO 'user'@'localhost' IDENTIFIED BY 'password';
    FLUSH PRIVILEGES;
    
  3. Get IMDbPY 6.2:

    wget https://github.com/alberanid/imdbpy/archive/6.2.zip
    unzip 6.2.zip
    cd imdbpy-6.2
    python3 setup.py install
    
  4. Load IMDb data into MariaDB:

    cd bin
    python3 imdbpy2sql.py -d [imdb_dataset_directory] -u 'mysql://user:password@localhost/imdb'
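
If the real password contains characters such as @ or /, the URI above becomes ambiguous. Here is a small sketch that percent-encodes the credentials before building the URI. It assumes imdbpy2sql.py decodes percent-escapes in the URI, which I have not verified; build_db_uri is a hypothetical helper, not part of IMDbPY.

```python
from urllib.parse import quote_plus


def build_db_uri(user, password, host="localhost", database="imdb", scheme="mysql"):
    """Return scheme://user:password@host/database with percent-encoded credentials."""
    # quote_plus escapes characters such as '@', '/' and ':' that would
    # otherwise be read as URI delimiters.
    return f"{scheme}://{quote_plus(user)}:{quote_plus(password)}@{host}/{database}"


print(build_db_uri("user", "password"))
print(build_db_uri("user", "p@ss/word"))
```

For the plain credentials used in this answer the result is identical to the URI shown in step 4, so the helper only matters for passwords with special characters.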
    

Edit: Version 6.2 of IMDbPY does not create foreign keys. See this GitHub issue. If you need foreign keys to be created, you will have to use an older version of IMDbPY, but foreign key generation in older versions also has reported problems (see the linked GitHub issue).

Update: It took 4.5 hours to import, and I had no problems using InnoDB tables.
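
After a long import it is worth checking that the tables were actually populated. The sketch below counts rows through any DB-API 2.0 connection. The demo uses an in-memory SQLite database as a stand-in so that it runs anywhere; the table name title is an assumption about the schema that imdbpy2sql.py generates.

```python
import re
import sqlite3  # used only for the self-contained demo below


def count_rows(conn, tables):
    """Return {table: row count} using any DB-API 2.0 connection."""
    counts = {}
    cur = conn.cursor()
    for table in tables:
        # Table names cannot be bound as query parameters, so validate them.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
            raise ValueError(f"unexpected table name: {table!r}")
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        counts[table] = cur.fetchone()[0]
    cur.close()
    return counts


# Demo against an in-memory SQLite database standing in for the real one.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE title (id INTEGER PRIMARY KEY, title TEXT)")
demo.executemany("INSERT INTO title (title) VALUES (?)", [("A",), ("B",)])
print(count_rows(demo, ["title"]))  # {'title': 2}
```

Against the real database, the connection would come from mysqlclient instead, for example MySQLdb.connect(host="localhost", user="user", passwd="password", db="imdb").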

Edit: If you wish to use version 6.2 of IMDbPY and require foreign keys, you will need to add them manually to the database after it is generated. A very small amount of data cleanup is required before the foreign keys can be added. This cleanup and the foreign keys that need to be added are described in this GitHub issue.
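
As a rough illustration of what adding a key manually involves, this sketch generates two SQL statements for one relationship: a query that counts orphaned rows (the kind of data that has to be cleaned up first) and the ALTER TABLE statement itself. The helper names and the cast_info.movie_id to title.id pair are illustrative assumptions; the authoritative list of keys and cleanup steps is the one in the linked GitHub issue.

```python
def orphan_check_sql(table, column, ref_table, ref_column="id"):
    """SQL counting rows in `table` whose `column` has no match in `ref_table`."""
    return (
        f"SELECT COUNT(*) FROM {table} t "
        f"LEFT JOIN {ref_table} r ON t.{column} = r.{ref_column} "
        f"WHERE t.{column} IS NOT NULL AND r.{ref_column} IS NULL"
    )


def add_foreign_key_sql(table, column, ref_table, ref_column="id"):
    """SQL adding a foreign key from table.column to ref_table.ref_column."""
    return (
        f"ALTER TABLE {table} ADD CONSTRAINT fk_{table}_{column} "
        f"FOREIGN KEY ({column}) REFERENCES {ref_table} ({ref_column})"
    )


# Hypothetical example pair; check the real schema before running the output.
print(orphan_check_sql("cast_info", "movie_id", "title"))
print(add_foreign_key_sql("cast_info", "movie_id", "title"))
```

If the first query returns a non-zero count, the ALTER TABLE statement will fail on InnoDB until those rows are fixed or removed.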