When to use MongoDB or other document oriented database systems?
After two years using MongoDb for a social app, I have witnessed what it really means to live without a SQL RDBMS.
- You end up writing jobs to do things like joining data from different tables/collections, something that an RDBMS would do for you automatically.
- Your query capabilities with NoSQL are drastically crippled. MongoDb may be the closest thing to SQL but it is still extremely far behind. Trust me. SQL queries are super intuitive, flexible and powerful. MongoDb queries are not.
- MongoDb queries can retrieve data from only one collection and take advantage of only one index. And MongoDb is probably one of the most flexible NoSQL databases. In many scenarios, this means more round-trips to the server to find related records. And then you start de-normalizing data - which means background jobs.
- The fact that it is not a relational database means that you won't have (thought by some to be bad performing) foreign key constrains to ensure that your data is consistent. I assure you this is eventually going to create data inconsistencies in your database. Be prepared. Most likely you will start writing processes or checks to keep your database consistent, which will probably not perform better than letting the RDBMS do it for you.
- Forget about mature frameworks like hibernate.
I believe that 98% of all projects probably are way better with a typical SQL RDBMS than with NoSQL.
Note that Mongo essentially stores JSON. If your app is dealing with a lot of JS Objects (with nesting) and you want to persist these objects then there is a very strong argument for using Mongo. It makes your DAL and MVC layers ultra thin, because they are not un-packaging all the JS object properties and trying to force-fit them into a structure (schema) that they don't naturally fit into.
We have a system that has several complex JS Objects at its heart, and we love Mongo because we can persist everything really, really easily. Our objects are also rather amorphous and unstructured, and Mongo soaks up that complication without blinking. We have a custom reporting layer that deciphers the amorphous data for human consumption, and that wasn't that difficult to develop.
to store this unstructured data
As you said, MongoDB is best suitable to store unstructured data. And this can organize your data into document format. These RDBMS altenatives called NoSQL data stores (MongoDB, CouchDB, Voldemort) are very useful for applications that scales massively and require faster data access from these big data stores.
And the implementation of these databases are simpler than the regular RDBMS. Since these are simple key-valued or document style binary objects directly serialized into disk. These data stores don't enforce the ACID properties, and any schemas. This doesn't provide any transaction abilities. So this can scale big and we can achieve faster access (both read and write).
But in contrast, RDBM enforces ACID and schemas on datas. If you wanted to work with structured data you can go ahead with RDBM.
I would choose MySQL for creating forums for this kind of stuff. Because this is not going to scale big. And this is a very simple (common) application which has structured relations among the data.
In NoSQL: If Only It Was That Easy, the author writes about MongoDB:
MongoDB is not a key/value store, it’s quite a bit more. It’s definitely not a RDBMS either. I haven’t used MongoDB in production, but I have used it a little building a test app and it is a very cool piece of kit. It seems to be very performant and either has, or will have soon, fault tolerance and auto-sharding (aka it will scale). I think Mongo might be the closest thing to a RDBMS replacement that I’ve seen so far. It won’t work for all data sets and access patterns, but it’s built for your typical CRUD stuff. Storing what is essentially a huge hash, and being able to select on any of those keys, is what most people use a relational database for. If your DB is 3NF and you don’t do any joins (you’re just selecting a bunch of tables and putting all the objects together, AKA what most people do in a web app), MongoDB would probably kick ass for you.
Then, in the conclusion:
The real thing to point out is that if you are being held back from making something super awesome because you can’t choose a database, you are doing it wrong. If you know mysql, just use it. Optimize when you actually need to. Use it like a k/v store, use it like a rdbms, but for god sake, build your killer app! None of this will matter to most apps. Facebook still uses MySQL, a lot. Wikipedia uses MySQL, a lot. FriendFeed uses MySQL, a lot. NoSQL is a great tool, but it’s certainly not going to be your competitive edge, it’s not going to make your app hot, and most of all, your users won’t care about any of this.
What am I going to build my next app on? Probably Postgres. Will I use NoSQL? Maybe. I might also use Hadoop and Hive. I might keep everything in flat files. Maybe I’ll start hacking on Maglev. I’ll use whatever is best for the job. If I need reporting, I won’t be using any NoSQL. If I need caching, I’ll probably use Tokyo Tyrant. If I need ACIDity, I won’t use NoSQL. If I need a ton of counters, I’ll use Redis. If I need transactions, I’ll use Postgres. If I have a ton of a single type of documents, I’ll probably use Mongo. If I need to write 1 billion objects a day, I’d probably use Voldemort. If I need full text search, I’d probably use Solr. If I need full text search of volatile data, I’d probably use Sphinx.
I like this article, I find it very informative, it gives a good overview of the NoSQL landscape and hype. But, and that's the most important part, it really helps to ask yourself the right questions when it comes to choose between RDBMS and NoSQL. Worth the read IMHO.
Alternate link to article