Handling very large data with MySQL
Can MySQL reasonably perform queries on billions of rows? -- MySQL can 'handle' billions of rows. "Reasonably" depends on the queries; let's see them.
Is InnoDB (MySQL 5.5.8) the right choice for multi-billion rows? -- 5.7 has some improvements, but 5.5 is pretty good, in spite of being nearly 6 years old and on the verge of no longer being supported.
Best data store for billions of rows -- If you mean 'Engine', then InnoDB.
How big can a MySQL database get before the performance starts to degrade -- Again, that depends on the queries. I can show you a 1K-row table that will melt down; I have worked with billion-row tables that hum along.
Why MySQL could be slow with large tables? -- range scans lead to I/O, which is the slow part.
Can Mysql handle tables which will hold about 300 million records? -- again, yes. The limit is somewhere around a trillion rows.
(for InnoDB tables which is my case) increasing the innodb_buffer_pool_size (e.g., up to 80% of RAM). Also, I found some other MySQL performance tuning settings here in Percona blog -- yes
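As a sanity check (the 50G figure below is purely illustrative for a dedicated 64GB box, not a recommendation for your hardware):

```sql
-- What is the buffer pool now?  On 5.5 it can only be changed in my.cnf
-- (e.g. innodb_buffer_pool_size = 50G on a dedicated 64GB box -- illustrative only)
-- followed by a restart; 5.7+ can resize it dynamically.
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';

-- Rough "is it big enough?" check: _reads (disk) should stay tiny relative to
-- _read_requests (logical reads).
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
```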
having proper indexes on the table (using EXPLAIN on queries) -- well, let's see them. There are a lot of mistakes that can be made in this critical area.
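For illustration only (the table and columns here are invented, not from your schema), this is the kind of thing I would want to see:

```sql
-- A composite index that lets one index handle both the filter and the sort,
-- so the optimizer does a range scan instead of a full scan plus filesort.
ALTER TABLE readings
    ADD INDEX ix_sensor_taken (sensor_id, taken_at);

EXPLAIN
SELECT taken_at, reading_value
  FROM readings
 WHERE sensor_id = 1234
   AND taken_at >= '2016-01-01'
 ORDER BY taken_at;
-- Hope to see: type = range (or ref), key = ix_sensor_taken, and a modest "rows" estimate.
```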
partitioning the table -- "Partitioning is not a panacea!" I harp on that in my blog
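The main legitimate win is purging old data cheaply with DROP PARTITION, not "making queries faster". A sketch, with made-up names:

```sql
-- Range partitioning by day; useful mainly when old data must be dropped in bulk.
CREATE TABLE readings (
    taken_at   DATETIME NOT NULL,
    sensor_id  INT UNSIGNED NOT NULL,
    reading_value FLOAT NOT NULL
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(taken_at)) (
    PARTITION p2015 VALUES LESS THAN (TO_DAYS('2016-01-01')),
    PARTITION p2016 VALUES LESS THAN (TO_DAYS('2017-01-01')),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Later: drop a year's worth of data almost instantly, instead of a huge DELETE.
ALTER TABLE readings DROP PARTITION p2015;
```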
MySQL Sharding -- Currently this is DIY
MySQL clustering -- Currently the best answer is some Galera-based option (PXC, MariaDB 10, DIY w/Oracle). Oracle's "Group Replication" is a viable contender.
Partitioning does not support FOREIGN KEY or "global" UNIQUE.
UUIDs, at the scale you are talking about, will not just slow down the system, but actually kill it. Type-1 UUIDs may be a workaround.
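To make Type-1 work, rearrange the UUID so the timestamp leads and store it as BINARY(16); that keeps inserts clustered in a "recent" hot spot instead of scattering them across the whole index. A sketch (MySQL 8.0 later added UUID_TO_BIN(u, 1) to do exactly this):

```sql
-- Rearrange a Type-1 UUID: time_hi | time_mid | time_low | clock_seq | node,
-- then UNHEX it for storage in a BINARY(16) column.
SET @u = UUID();                      -- MySQL's UUID() is Type 1 (time-based)
SELECT UNHEX(CONCAT(
         SUBSTR(@u, 15, 4),           -- time_hi_and_version
         SUBSTR(@u, 10, 4),           -- time_mid
         SUBSTR(@u,  1, 8),           -- time_low
         SUBSTR(@u, 20, 4),           -- clock_seq
         SUBSTR(@u, 25)               -- node
       )) AS ordered_uuid_bin;        -- store this in a BINARY(16) column
```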
Insert and index-build speed -- There are too many variations to give a single answer. Let's see your tentative CREATE TABLE and how you intend to feed the data in.
Lots of joins -- "Normalize, but don't over-normalize." In particular, do not normalize datetimes or floats or other "continuous" values.
Do build summary tables.
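A sketch of what I mean by a summary table (names are made up):

```sql
-- Daily rollup of a hypothetical raw transaction table.
CREATE TABLE daily_sales_summary (
    sale_date  DATE NOT NULL,
    store_id   INT UNSIGNED NOT NULL,
    txn_count  INT UNSIGNED NOT NULL,
    total_amt  DECIMAL(12,2) NOT NULL,
    PRIMARY KEY (sale_date, store_id)
) ENGINE=InnoDB;

-- Refresh incrementally (here: just "yesterday") from the detail table.
INSERT INTO daily_sales_summary (sale_date, store_id, txn_count, total_amt)
SELECT DATE(created_at), store_id, COUNT(*), SUM(amount)
  FROM transactions
 WHERE created_at >= CURDATE() - INTERVAL 1 DAY
   AND created_at <  CURDATE()
 GROUP BY DATE(created_at), store_id
ON DUPLICATE KEY UPDATE
    txn_count = VALUES(txn_count),
    total_amt = VALUES(total_amt);
```

Reports then hit the summary table, not the billion-row detail table.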
2,3 million transactions per day -- If that is 2.3M inserts (30/sec), then there is not much of a performance problem. If more complex, then RAID, SSD, batching, etc, may be necessary.
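If you do end up needing more throughput, batching is the first cheap win. Illustrative only (table and columns are hypothetical):

```sql
-- One multi-row INSERT of 100-1000 rows is roughly 10x faster than the same
-- number of single-row INSERTs, mostly from saved round trips and commits.
INSERT INTO transactions (created_at, store_id, amount)
VALUES ('2016-05-01 10:00:01',  7,  19.95),
       ('2016-05-01 10:00:01',  7,   4.50),
       ('2016-05-01 10:00:02', 12, 130.00);
-- For bulk loads from files, LOAD DATA INFILE is faster still.
```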
deal with such volume of data -- If most activity is with the "recent" rows, then the buffer_pool will nicely 'cache' the activity, thereby avoiding I/O. If the activity is "random", then MySQL (or anyone else) will have I/O issues.
Shrinking the datatypes helps in a table like yours. I doubt if you need 4 bytes to specify fuel_type. There are multiple 1-byte approaches.
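For example (table name and values are hypothetical):

```sql
-- Two common 1-byte alternatives to a 4-byte INT for something like fuel_type.
ALTER TABLE vehicles
    MODIFY fuel_type TINYINT UNSIGNED NOT NULL;   -- 0..255, decoded via a tiny lookup table
-- or
ALTER TABLE vehicles
    MODIFY fuel_type ENUM('gasoline','diesel','lpg','cng','electric') NOT NULL;  -- 1 byte, self-describing
```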
When collecting billions of rows, it is better (when possible) to consolidate, process, summarize, whatever, the data before storing. Keep the raw data in a file if you think you need to get back to it.
Doing that will eliminate most of your questions and concerns, plus speed up the processing.
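One common pattern, sketched with made-up names: stage the raw file, roll it up into the summary table, then truncate the staging table; the raw file on disk remains your fallback if you ever need to reprocess.

```sql
-- Stage the raw feed; only the rollup is kept in MySQL long-term.
CREATE TABLE staging LIKE transactions;           -- same columns as the raw feed

LOAD DATA INFILE '/data/feed_2016_05_01.csv'
    INTO TABLE staging
    FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

INSERT INTO daily_sales_summary (sale_date, store_id, txn_count, total_amt)
SELECT DATE(created_at), store_id, COUNT(*), SUM(amount)
  FROM staging
 GROUP BY DATE(created_at), store_id
ON DUPLICATE KEY UPDATE
    txn_count = txn_count + VALUES(txn_count),
    total_amt = total_amt + VALUES(total_amt);

TRUNCATE TABLE staging;                           -- raw data lives on in the file, not the database
```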