Redis as a database
As Redis is an in-memory storage, you cannot store large data that won't fit you machine's memory size. Redis usually work very bad when the data it stores is larger than 1/3 of the RAM size. So, this is the fatal limitation of using Redis as a database.
Certainly, you can distribute you big data into several Redis instances, but you have to do it all on your own manually. The operation usually be done like this(assuming you have only 1 instance from start):
- Use its master-slave mechanism to replicate data to the second machine, Now you have 2 copies of the same data.
- Cut off the connection between master and slave.
- Delete the first half(split by hashing, etc) of data on the first machine, and delete the second half of data on the second machine.
- Tell all clients(PHP, C, etc...) to operate on the first machine if the specified keys are on that machine, otherwise operate on the second machine.
This is the way how Redis scales! You also have to stop your service to prevent any writes during the migration.
To the expierence we encounter, we have this conclusion to Redis: Redis is not the right choice to store more than 30G data, Redis is not scalable, Redis is quite suitable for prototype development.
We later find an alternative to Redis, that is SSDB(https://github.com/ideawu/ssdb), a leveldb server that supports nearly all the APIs of Redis, it is suitable for storing more than 1TB of data, that only depends on the size of you harddisk.
I would like to share a few things that we have learned by using Redis as a primary Database in our service. We choose Redis since we had data that could not be partitioned. We wanted to get the best performance we could get out of one box
Pros:
- Redis was unbeatable in raw performance. We got 10K transactions per second out of the box (Note that one transaction involved multiple Redis commands). We were able to hit a rate of 25K+ transactions per second after a few optimizations, along with LUA scripts. So when it comes to performance per box, Redis is unmatched.
- Redis is very simple to setup and has a very small learning curve as opposed to other SQL and NoSQL datastores.
Cons:
- Redis supports only few primitive Data Structures like Hashes, Sets, Lists etc. and operations on these Data Structures. These are more than sufficient when you are using Redis as a cache, but if you want to use Redis as a full fledged primary data store, you will feel constrained. We had a tough time modelling our data requirements using these simple types.
- The biggest problem we have seen with Redis was the lack of flexibility. Once you have solutioned the structure of your data, any modifications to storage requirements or access patterns virtually requires re-thinking of the entire solution. Not sure if this is the case with all NoSQL data stores though (I have heard MongoDB is more flexible, but haven't used it myself)
- Since Redis is single threaded, CPU utilization is very low. You can't put multiple Redis instances on the same machine to improve CPU utilization as they will compete for the same disk, making disk as the bottleneck.
- Lack of horizontal scalability is a problem as mentioned by other answers.
You can use Redis as an authoritative store in a number of different ways:
Turn on AOF (Append-only File store) see AOF docs. This will keep a log of all Redis commands made against your dataset in real-time.
Run Redis using Master-Slave replication see replication docs. This will allow you to provide high-availability if one of your instances fails.
If you're running on something like EC2 you can EBS back your Redis partition to provide another layer of protection against instance failure.
On the horizon is Redis Cluster - this is specifically designed as a way to run Redis in a way that should help with HA and scalability. However, this won't appear for at least another six months or so.
Redis is an in-memory store which can also write the data back to disc. You can specify how many times to do a fsync to make redis safer(but also slower => trade-off) .
But still I am not certain if redis is in state yet to really store (mission) critical data in it (yet?). If for example it is not a huge problem when 1 more tweets(twitter.com) or something similiar get losts then I would certainly use redis. There is also a lot of information available about persistence at redis's own website.
You should also be aware of some persistence problems which could occur by reading antirez(redis maintainers) blog article. You should read his blog because he has some interesting articles.