Storing images in NoSQL stores

Well, a CDN would be the obvious choice. Since that's out, I'd say your best bet for fault tolerance and load balancing would be your own private data center (whatever that means to you) behind two or more load balancers like an F5. This will be the easiest setup to manage, and you can get as much fault tolerance as your hardware budget allows. You won't need any new software expertise, just XCOPY.

For true fault tolerance you're going to need geographic dispersion or you're subject to anyone with a backhoe.
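Load balancers like an F5 are hardware appliances, so there is nothing to code, but the core policy they apply is easy to picture. Here is a minimal Python sketch, assuming simple round-robin selection with health checks. The server names, the `RoundRobinBalancer` class and the `is_healthy` callback are hypothetical stand-ins, not anything an F5 exposes.

```python
import itertools
from typing import Callable, Iterable


class RoundRobinBalancer:
    """Hands out image servers in turn, skipping any that fail a health check.

    Hypothetical illustration of the policy a hardware load balancer applies.
    """

    def __init__(self, servers: Iterable[str], is_healthy: Callable[[str], bool]):
        self._servers = list(servers)
        if not self._servers:
            raise ValueError("need at least one server")
        self._cycle = itertools.cycle(self._servers)
        self._is_healthy = is_healthy

    def pick(self) -> str:
        # Try each server at most once per call so a fully-down pool fails fast.
        for _ in range(len(self._servers)):
            server = next(self._cycle)
            if self._is_healthy(server):
                return server
        raise RuntimeError("no healthy image servers available")


if __name__ == "__main__":
    down = {"img2.example.internal"}  # pretend this box just died
    lb = RoundRobinBalancer(
        ["img1.example.internal", "img2.example.internal", "img3.example.internal"],
        is_healthy=lambda s: s not in down,
    )
    print([lb.pick() for _ in range(4)])
```

Because every server holds a full copy of the images (the XCOPY approach), any healthy server can answer any request, which is what lets the balancer skip a dead one without losing data.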

(Gravatars?)


Whether to store images in a DB or on the filesystem is sometimes one of those "holy war" debates; each side feels its way of doing things is the one right way. In general:

To store in the DB:

  • Easier to back up and replicate everything at once, in one place.
  • Helps with data consistency and integrity. You can set the BLOB field to disallow NULLs, but you can't prevent an external file from being deleted. (This is less applicable to NoSQL, since those stores generally lack traditional constraints.)
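To make the integrity point concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in relational database (the thread is about NoSQL, so treat it purely as an illustration). The `images` table and the `save_image`/`load_image` helpers are hypothetical. The `NOT NULL` constraint on the BLOB column is enforced by the database itself, which is exactly the guarantee a path pointing at an external file cannot give you.

```python
import sqlite3
from typing import Optional

# Hypothetical schema, kept in memory so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL UNIQUE,"
    " data BLOB NOT NULL)"  # the DB itself refuses a row without image bytes
)


def save_image(name: str, data: bytes) -> None:
    # `with conn` commits on success and rolls back if the INSERT fails.
    with conn:
        conn.execute("INSERT INTO images (name, data) VALUES (?, ?)", (name, data))


def load_image(name: str) -> Optional[bytes]:
    row = conn.execute("SELECT data FROM images WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    save_image("logo.png", b"\x89PNG\r\n\x1a\n")
    print(len(load_image("logo.png")))
    try:
        save_image("broken.png", None)  # deliberately missing bytes
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```

A NULL payload is rejected with `sqlite3.IntegrityError`, whereas nothing stops someone from deleting a file that a path column points to.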

To store on the filesystem:

  • A filesystem is designed to serve files. Let it do its job.
  • The DB is often the bottleneck in an application. The more load you can take off it, the better.
  • Easier to serve on a CDN (which you mentioned isn't applicable in your situation).
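The filesystem approach usually means writing the bytes to disk and keeping only a path in the database. Here is a minimal sketch, assuming a content-addressed layout (files named by their SHA-256 hash and fanned out into subdirectories). The `store_image` helper and the directory scheme are illustrative choices, not a standard.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def store_image(root: Path, data: bytes, ext: str = ".png") -> str:
    """Write image bytes under a content-addressed path; return the relative path.

    The returned string is what you would keep in the database (or NoSQL
    document); the bytes live on disk where a web server can serve them.
    """
    digest = hashlib.sha256(data).hexdigest()
    # Two levels of fan-out so no single directory grows huge.
    rel = Path(digest[:2]) / digest[2:4] / f"{digest}{ext}"
    dest = root / rel
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        # Write to a temp file in the same directory, then rename it into place.
        fd, tmp = tempfile.mkstemp(dir=dest.parent)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp, dest)  # atomic rename on the same filesystem
        except BaseException:
            os.unlink(tmp)
            raise
    return rel.as_posix()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(store_image(Path(d), b"not really a PNG"))
```

Naming files by hash makes duplicate uploads free and keeps paths stable, and the resulting tree can be copied to other servers with ordinary file tools.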

I tend to come down on the side of the filesystem because it scales much better. But depending on the size of your project, either choice will likely work fine. With NoSQL, the differences are even less apparent.


MongoDB should work well for you. I haven't used it for blobs yet, but there is a nice FLOSS Weekly podcast interview with Michael Dirolf from the MongoDB team where he addresses this use case.
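For reference, MongoDB's usual mechanism for large binary objects is GridFS. It splits a file into a metadata document in one collection (`fs.files` by default) and a series of chunk documents in another (`fs.chunks`), each carrying a `files_id` and a sequence number `n`. The sketch below only mimics that layout in plain Python so the idea can be run without a server. In real code you would use a driver, for example PyMongo's `gridfs` module. The helper names are hypothetical, and a UUID stands in for the `ObjectId` that GridFS normally uses.

```python
import uuid

# GridFS's documented default chunk size is 255 KiB.
CHUNK_SIZE = 255 * 1024


def to_gridfs_documents(filename: str, data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a blob into a files document plus chunk documents, GridFS style.

    This only mimics the document layout; it does not talk to MongoDB.
    """
    file_id = uuid.uuid4().hex  # real GridFS uses an ObjectId by default
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
    }
    chunks = [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunks


def reassemble(chunks) -> bytes:
    # Concatenate chunks in order of their index "n", as a GridFS reader does.
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


if __name__ == "__main__":
    image = bytes(600 * 1024)  # 600 KiB of zeros standing in for an image
    files_doc, chunks = to_gridfs_documents("photo.jpg", image)
    print(files_doc["length"], len(chunks))
```

A 600 KiB image therefore becomes one files document and three chunks; reading it back is just a matter of fetching the chunks in order of `n`.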


I was looking for a similar solution for a personal project and came across Riak, which, to me, seems like an amazing solution to this problem. Basically, it distributes a specified number of copies of each file to the servers in the network. It is designed so that a server joining or leaving is no big deal: the copies held by a departing server are redistributed among the others.
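The replication behavior described here rests on consistent hashing. Riak hashes keys onto a 160-bit ring and writes each value to `n_val` nodes (3 by default). The sketch below is a simplified Python illustration of that idea, not Riak's implementation: it places a number of virtual points per server on a SHA-1 ring (a Dynamo-style variant, whereas Riak divides the ring into a fixed number of partitions), and the `Ring` class and its methods are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Position on a 2**160 ring, derived from SHA-1.
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")


class Ring:
    """Simplified consistent-hash ring; n_replicas plays the role of Riak's n_val."""

    def __init__(self, nodes, n_replicas=3, vnodes=64):
        self.n = n_replicas
        self.vnodes = vnodes
        self._points = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def preference_list(self, key):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        distinct = len({node for _, node in self._points})
        want = min(self.n, distinct)
        i = bisect.bisect(self._points, (_hash(key),))
        owners = []
        while len(owners) < want:
            node = self._points[i % len(self._points)][1]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners


if __name__ == "__main__":
    ring = Ring(["node1", "node2", "node3", "node4", "node5"], n_replicas=3)
    before = ring.preference_list("images/cat.jpg")
    ring.remove(before[0])  # a server leaves the cluster
    print(before, ring.preference_list("images/cat.jpg"))
```

When a server is removed, a key's preference list keeps its surviving replicas and simply picks up the next server clockwise, so a node leaving only moves the data that node was holding.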

With the right configuration, Riak can deal with an entire datacenter crashing.

Oh, and it has commercial support available.

Tags:

Image

Nosql