Do SSDs reduce the usefulness of databases?
There are some settings in a database that should be tweaked when you use SSDs. Speaking for PostgreSQL, for instance, you can adjust effective_io_concurrency and random_page_cost (a quick sketch of doing so follows the list below). However, faster reads and faster random access aren't what a database is for. It ensures:
- ACID (Atomicity, Consistency, Isolation, Durability)
- Some form of concurrency control, such as MVCC (multiversion concurrency control)
- Standardized access for client libraries (SQL, XQuery)
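Here's that quick sketch of adjusting those two settings for SSD storage; the exact values are illustrative assumptions, not tuned recommendations:
ALTER SYSTEM SET random_page_cost = 1.1;          -- the default of 4.0 assumes spinning disks, where random reads are expensive
ALTER SYSTEM SET effective_io_concurrency = 200;  -- SSDs can service many concurrent read requests
SELECT pg_reload_conf();                          -- apply the changes without restarting the server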
He's just wrong about indexes. Even if the whole table can be read into RAM, an index is still useful. Don't believe me? Let's do a thought experiment:
Imagine you have a table with one indexed column.
CREATE TABLE foobar ( id text PRIMARY KEY );
Imagine that there are 500 million rows in that table.
Imagine all 500 million rows are concatenated together into a file.
What's faster:
grep 'keyword' file
SELECT * FROM foobar WHERE id = 'keyword'
Even with the whole file cached in RAM, grep has to scan all 500 million rows; the B-tree behind the primary key lets PostgreSQL jump almost straight to the matching row.
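If you'd rather not take my word for it, a minimal way to check (assuming the foobar table above exists and has been populated) is to ask PostgreSQL for its plan:
EXPLAIN ANALYZE SELECT * FROM foobar WHERE id = 'keyword';
-- Expect an Index Scan (or Index Only Scan) using foobar_pkey rather than a Seq Scan over 500 million rows.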
It's not just about where the data is, it's about how you order it and what operations you can do on it. PostgreSQL supports B-tree, Hash, GiST, SP-GiST, GIN and BRIN indexes (and Bloom through an extension). You'd be foolish to think that all of that math and functionality goes away because you have faster random access.
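As a small illustration (the events table and its columns are made up for the example), different index types support different kinds of lookups:
CREATE TABLE events ( id bigint PRIMARY KEY, recorded_at timestamptz, payload jsonb );
CREATE INDEX ON events (recorded_at);             -- B-tree: equality, range scans, ORDER BY
CREATE INDEX ON events USING gin (payload);       -- GIN: containment queries on jsonb, e.g. payload @> '{"type": "click"}'
CREATE INDEX ON events USING brin (recorded_at);  -- BRIN: tiny block-range summaries for naturally ordered, append-only data
CREATE INDEX ON events USING hash (id);           -- Hash: equality-only lookups (redundant here, shown just for the syntax)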
Based on your post, the message seems to be that RDBMS lookup-time optimizations are being replaced by hardware that makes IO time negligible.
This is absolutely true. SSDs on database servers, combined with plenty of (actual) RAM, make IO waits significantly shorter. However, RDBMS indexing and caching are still of value, because even systems with this huge IO boon can and will hit IO bottlenecks from poorly performing queries caused by bad indexing. This typically only shows up in high-workload or poorly written applications.
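A classic example of the kind of badly indexed query I mean (the users table and its columns are hypothetical):
CREATE TABLE users ( id bigint PRIMARY KEY, email text );
CREATE INDEX ON users (email);
SELECT * FROM users WHERE lower(email) = 'alice@example.com';  -- wrapping the indexed column in a function defeats the B-tree index and forces a full scan
CREATE INDEX ON users (lower(email));                          -- one fix: an expression index that matches the query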
The key value of RDBMS systems in general is data consistency, data availability, and data aggregation. Keeping a "data base" in an Excel spreadsheet, a CSV file, or some other ad-hoc store yields no such guarantees.
An SSD doesn't protect you from your primary server becoming unavailable for any reason (network, OS corruption, power loss). An SSD doesn't protect you from a bad data modification. And an SSD doesn't make computing analytics on the fly any faster than "just having" them already aggregated.
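For instance, the protection against a bad modification comes from transactions, not from the storage hardware; a minimal sketch with a hypothetical accounts table:
CREATE TABLE accounts ( id bigint PRIMARY KEY, balance numeric NOT NULL );
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- Wrong account? Nothing is visible to other sessions or made permanent until COMMIT.
ROLLBACK;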
Uncle Bob was probably talking about in-memory databases such as Redis or GemFire. In these databases, everything really is contained in RAM. The database could start out empty and be filled with short-lived data (being used as a cache), or it could start by loading everything in from disk and periodically checkpoint changes back to disk.
This is becoming more and more popular because RAM is getting cheap, and it is now feasible to keep a terabyte of data in an in-memory clustered database. There are a lot of use cases where instant access is valuable enough to put the data in RAM rather than even on a fast disk like an SSD. You can even continue using SQL for some of these if it makes sense.
Why should this worry Oracle? Data is growing, and it's unlikely that RDBMSes will go away. However, a lot of Oracle's engineering time over the years has gone into making data retrieval from spinning disks really fast. Oracle will need to adapt to a completely different storage tier. They are adapting, with Oracle Database In-Memory, but they're exposed to different competition than in the past. Think of how much time has gone into making sure the query optimizer chooses the right strategies based on the layout of things on disk...