alternative to memcached that can persist to disk
EhCache has a "disk persistent" mode which dumps the cache contents to disk on shutdown and reinstates the data when the cache is started back up again. As for your other requirements: when running in distributed mode it replicates the data across all nodes rather than storing it on just one. Other than that, it should fit your needs nicely. It's also still under active development, which many other Java caching frameworks are not.
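For illustration, a minimal sketch of using the disk-persistent mode through the classic net.sf.ehcache API; it assumes an ehcache.xml on the classpath that defines a cache named "persistentCache" with overflowToDisk="true" and diskPersistent="true" (the cache name and value below are placeholders):

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class DiskPersistentCacheDemo {
    public static void main(String[] args) {
        // Assumes ehcache.xml defines "persistentCache" with
        // overflowToDisk="true" and diskPersistent="true",
        // plus a <diskStore> path the data file can be written to.
        CacheManager manager = CacheManager.create();
        Cache cache = manager.getCache("persistentCache");

        cache.put(new Element("user:42", "some expensive-to-compute value"));

        Element hit = cache.get("user:42");
        if (hit != null) {
            System.out.println("cached value: " + hit.getObjectValue());
        }

        // A clean shutdown flushes disk-persistent caches to disk; the contents
        // are reinstated the next time the CacheManager starts up.
        manager.shutdown();
    }
}
```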
Try go-memcached, a memcache server written in Go. It persists cached data to disk out of the box and is compatible with existing memcache clients (see the client sketch after the benchmark table below). It has the following features missing in the original memcached:
- Cached data survives server crashes and/or restarts.
- Cache size may exceed available RAM size by several orders of magnitude.
- There is no 250-byte limit on key size.
- There is no 1 MB limit on value size; value size is limited to 2 GB instead.
- It is faster than the original memcached and uses less CPU when serving incoming requests.
Here are performance numbers obtained via go-memcached-bench:
----------------------------------------------------------------
|            | go-memcached v1    | original memcached v1.4.13 |
| workerMode | Kqps  | cpu time   | Kqps     | cpu time        |
|------------|-------|------------|----------|-----------------|
| GetMiss    |  648  |    17      |  468     |    33           |
| GetHit     |  195  |    16      |  180     |    17           |
| Set        |  204  |    14      |  182     |    25           |
| GetSetRand |  164  |    16      |  157     |    20           |
----------------------------------------------------------------
Statically linked binaries for go-memcached and go-memcached-bench are available on the downloads page.
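Because go-memcached speaks the standard memcache protocol, an existing client should work unchanged. A minimal sketch in Java using the spymemcached client; the host and port are assumptions:

```java
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class GoMemcachedClientDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical address; point it at wherever go-memcached is listening.
        MemcachedClient client = new MemcachedClient(new InetSocketAddress("localhost", 11211));

        // set(key, expiration in seconds, value); .get() blocks until the write completes
        client.set("session:abc", 3600, "serialized session data").get();

        Object value = client.get("session:abc");
        System.out.println("got: " + value);

        client.shutdown();
    }
}
```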
I have never tried it, but what about Redis?
Its homepage says (quoting):
Redis is a key-value database. It is similar to memcached but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists and sets with atomic operations to push/pop elements.
In order to be very fast but at the same time persistent, the whole dataset is kept in memory, and from time to time and/or when a number of changes to the dataset have been performed it is written asynchronously to disk. You may lose the last few queries, which is acceptable in many applications, but it is as fast as an in-memory DB (Redis supports non-blocking master-slave replication in order to solve this problem by redundancy).
It seems to answer some of the points you talked about, so it might be helpful in your case.
If you try it, I'm pretty interested in what you find out, btw ;-)
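To make the lists/sets part concrete, here is a minimal sketch using the Jedis client from Java; it assumes a Redis server on localhost:6379 and a recent Jedis version (which implements Closeable):

```java
import redis.clients.jedis.Jedis;

public class RedisListSetDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.set("greeting", "hello");              // plain string value, like memcached

            jedis.lpush("jobs", "job-1", "job-2");       // atomic push onto a list
            String next = jedis.rpop("jobs");            // atomic pop from the other end

            jedis.sadd("tags", "cache", "persistence");  // set with atomic membership operations
            boolean tagged = jedis.sismember("tags", "cache");

            System.out.println("popped: " + next + ", member: " + tagged);
        }
    }
}
```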
As a side note: if you need to write all of this to disk, maybe a cache system is not really what you need... after all, if you are using memcached as a cache, you should be able to re-populate it on demand, whenever necessary. Still, I admit, there might be some performance problems if your whole memcached cluster goes down at once...
So maybe some software that is more key/value-store oriented could help? Something like CouchDB, for instance?
It will probably not be as fast as memcached, though, as the data is stored on disk rather than in RAM...
Maybe your problem is like mine: I have only a few machines for memcached, but each with lots of memory. If one of them fails or needs to be rebooted, it seriously affects the performance of the system. According to the original memcached philosophy I should add a lot more machines with less memory each, but that's not cost-efficient and not exactly "green IT" ;)
For our solution, we built an interface layer for the cache system so that providers for the underlying cache systems can be nested, much like you can do with streams, and wrote a cache provider for memcached as well as our own very simple key-value-to-disk storage provider. Then we define a weight for each cache item that represents how costly it is to rebuild that item if it cannot be retrieved from cache. The nested disk cache is only used for items with a weight above a certain threshold, perhaps around 10% of all items.
When storing an object in the cache, we don't lose any time, because saving to one or both caches is queued for asynchronous execution anyway, so writing to the disk cache doesn't need to be fast. The same goes for reads: first we go to memcached, and only if the object isn't there and it is a "costly" object do we check the disk cache (which is orders of magnitude slower than memcached, but still much better than recalculating 30 GB of data after a single machine goes down).
This way we get the best of both worlds without replacing memcached with anything new.
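A minimal sketch of the layering idea in Java; the names are illustrative rather than our production code, and it assumes a per-key weight function and two providers supplied from outside:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.ToIntFunction;

// Illustrative interface only; any real provider (memcached, disk, ...) implements it.
interface CacheProvider {
    Object get(String key);
    void put(String key, Object value);
}

public class TieredCache implements CacheProvider {
    private final CacheProvider fast;       // e.g. a memcached-backed provider
    private final CacheProvider durable;    // e.g. a simple key-value-to-disk provider
    private final int weightThreshold;      // only "costly" items reach the disk tier
    private final ToIntFunction<String> weightOf;
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    public TieredCache(CacheProvider fast, CacheProvider durable,
                       int weightThreshold, ToIntFunction<String> weightOf) {
        this.fast = fast;
        this.durable = durable;
        this.weightThreshold = weightThreshold;
        this.weightOf = weightOf;
    }

    @Override
    public Object get(String key) {
        Object value = fast.get(key);                            // memcached first
        if (value == null && weightOf.applyAsInt(key) >= weightThreshold) {
            value = durable.get(key);                            // slow disk tier, costly items only
            if (value != null) {
                fast.put(key, value);                            // repopulate the fast tier
            }
        }
        return value;
    }

    @Override
    public void put(String key, Object value) {
        // Writes are queued for asynchronous execution, so the slower disk tier
        // adds no latency for the caller.
        writer.submit(() -> {
            fast.put(key, value);
            if (weightOf.applyAsInt(key) >= weightThreshold) {
                durable.put(key, value);
            }
        });
    }
}
```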