How to speed up queries on a large 220 million rows table (9 gig data)?
Thoughts on the issue, thrown in random order:
The obvious index for this query is (rated_user_id, rating). A query that gets data for only one of the million users and needs 17 seconds is doing something wrong: it reads from the (rated_user_id, rater_user_id) index and then has to fetch from the table the (hundreds to thousands of) values for the rating column, because rating is not in any index. So the query has to read many rows of the table, which are located in many different places on disk.
Before you start adding numerous indexes to the tables, try to analyze the performance of the whole database and the whole set of slow queries, and re-examine the choice of datatypes, the engine you use and the configuration settings.
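
With that caveat in mind, the single index suggested above could be sketched as follows. The table name ratings and the index name are placeholders (the real names are not given here), and the SELECT is only a guess at the shape of the slow query, so adjust both before trying this:

```sql
-- Hypothetical table and index names; the column names are the ones discussed above.
ALTER TABLE ratings
  ADD INDEX rated_user_rating_idx (rated_user_id, rating);

-- Assumed query shape: per-user counts of each rating value.
-- Run EXPLAIN before and after adding the index and compare the plans.
EXPLAIN
SELECT rating, COUNT(*) AS cnt
FROM ratings
WHERE rated_user_id = 12345   -- illustrative user id
GROUP BY rating;
```

If the new index covers the query, the Extra column of the EXPLAIN output should show "Using index", meaning the rows are answered from the index alone without the scattered table reads described above. Building the index on a table this size will itself take a while.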
Consider moving to a newer version of MySQL: 5.1, 5.5 or even 5.6 (also the Percona and MariaDB versions). There are several benefits: bugs have been fixed, the optimizer has improved, and you can set the threshold for slow queries to less than 1 second (for example, 10 milliseconds). This will give you far better information about slow queries.
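
As a minimal sketch of that last point, assuming MySQL 5.1 or later and an account with the SUPER privilege, the slow query log can be switched on with a sub-second threshold at runtime. The 10 millisecond value is just the example figure from above, not a recommendation:

```sql
-- Enable the slow query log and lower the threshold (value is in seconds).
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.01;  -- 10 milliseconds

-- Confirm the settings. The new global threshold applies to connections
-- opened after the change, not to sessions that are already connected.
SHOW GLOBAL VARIABLES LIKE 'slow_query_log%';
SHOW GLOBAL VARIABLES LIKE 'long_query_time';
```

To keep the settings across restarts, the same two variables would also need to go into the server's configuration file.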
The choice of datatype for rating is weird. VARCHAR(1)? Why not CHAR(1)? Why not TINYINT? Either would save you some space, both in the table and in the indexes that (will) include that column. A VARCHAR(1) column needs one more byte than CHAR(1), and if they are utf8, the (var)char columns may need 3 (or 4) bytes, instead of 1 (TINYINT).
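
A possible conversion is sketched below. It assumes the table is called ratings (a placeholder) and that every stored rating is a single digit. Neither is confirmed here, so check the actual values first:

```sql
-- Hypothetical table name. List the distinct values currently stored,
-- to confirm they are all digits before converting (this is a full scan).
SELECT rating, COUNT(*) AS cnt
FROM ratings
GROUP BY rating;

-- Convert the column. This rebuilds the whole table, which can take a long
-- time on 220 million rows. Add NOT NULL if the existing column has it.
ALTER TABLE ratings
  MODIFY rating TINYINT UNSIGNED;
```

If the ratings turn out to be letters rather than digits, CHAR(1) with a single-byte character set (for example latin1 or ascii) would be the closer equivalent.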