Need Advice: Is this a good use case for a 'NoSQL' Database? If so, which one?
The simple answer is that there is no simple answer to this sort of problem; the only way to find out what works for your scenario is to invest R&D time in it.
The question is hard to answer because the OP doesn't spell out the performance requirements. It appears to be 75M records/year across a number of customers, with a write rate of num_customers writes per minute (which is low), but I don't have figures for the required read/query performance.
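To put a number on "low", here is a rough back-of-the-envelope calculation. It assumes the 75M records/year are spread evenly over the year, which is my simplification and not something the OP stated:

```python
# Average write rate implied by 75M records/year,
# assuming (hypothetically) that writes are spread evenly over the year.
RECORDS_PER_YEAR = 75_000_000
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

writes_per_minute = RECORDS_PER_YEAR / MINUTES_PER_YEAR
writes_per_second = writes_per_minute / 60

print(f"{writes_per_minute:.0f} writes/minute, {writes_per_second:.1f} writes/second")
# prints "143 writes/minute, 2.4 writes/second"
```

A couple of writes per second on average is well within reach of a single ordinary database server, so the read/query side is the more likely place to look for a bottleneck.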
Effectively you already have a sharded database using horizontal partitioning, because you're storing each customer in a separate table. This is good and will increase performance. However, you haven't yet established that you have a performance problem, so it needs to be measured and its size assessed before you can fix it.
A NoSQL database can indeed be a good way of fixing performance problems with a traditional RDBMS, but it will not provide automatic scalability and is not a general solution. You need to find your performance problem first and then design the (NoSQL) data model to solve it.
Depending on what you're trying to achieve I'd look at MongoDB, Apache Cassandra, Apache HBase or Hibari.
Remember that NoSQL is a vague term typically encompassing
- Applications that are performance-intensive in either reads or writes, often favoring one at the expense of the other.
- Distribution and scalability
- Different methods of persistence (RAM/Disk)
- A more structured/defined access pattern making ad-hoc queries harder.
So, in the first instance I'd see whether a traditional RDBMS can achieve the required performance using all available techniques. Get a copy of High Performance MySQL and read the MySQL Performance Blog.
Rev1:
In light of your comments, I think it is fair to say that you could achieve what you want with one of the above NoSQL engines.
My primary recommendation would be to get your data model designed and implemented; what you're using at the moment isn't really right.
So look at the entity-attribute-value model, as I think it is exactly right for what you need.
You need to get your data model right before you can consider which technology to use. To be honest, modifying schemas dynamically isn't a data model.
I'd use a traditional SQL database to validate and test the new data model, as the management tools are better and it's generally easier to work with the schemas as you refine the data model.
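As a rough illustration of prototyping an entity-attribute-value layout in a SQL database, here is a minimal sketch using Python's built-in sqlite3 module. The table name `readings`, its columns, and the sample values are all made up for the example; the point is only that a new kind of measurement becomes a new row, not a schema change:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One generic table: each row is (entity, attribute, value) plus a timestamp.
# Table and column names are hypothetical, chosen for this sketch.
conn.execute("""
    CREATE TABLE readings (
        entity    TEXT NOT NULL,   -- e.g. a customer identifier
        attribute TEXT NOT NULL,   -- e.g. 'WIND_SPEED'
        value     REAL,
        taken_at  TEXT NOT NULL
    )
""")

rows = [
    ("customer-1", "WIND_SPEED", 12.5, "2010-01-01T00:00"),
    ("customer-1", "TEMPERATURE", 3.0, "2010-01-01T00:00"),
    # A customer with a different attribute needs no ALTER TABLE.
    ("customer-2", "HUMIDITY", 81.0, "2010-01-01T00:00"),
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)

# Collect one customer's attributes into a dict.
customer_1 = dict(
    conn.execute(
        "SELECT attribute, value FROM readings WHERE entity = ?",
        ("customer-1",),
    )
)
print(customer_1)
```

The trade-off is that all values share one column type and ad-hoc reporting needs pivoting, so it is worth testing your real queries against this shape before committing to any engine.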
Ok, I might get flamed for not answering your question directly, but I'm going to say it anyway because I think it's something you should consider. I don't have experience with NoSQL databases, so I can't recommend one, but as far as relational databases go there might be a better design for your situation.
First of all, drop the one-table-per-customer design. Instead, I would architect a many-to-many schema with the following tables:
- Customers
- MeasurementTypes
- Measurements
The Customers table will contain customer information, and a unique CustomerID field:
CustomerID | CustomerName | ..and other fields
---------------------------------------------------------------------
The MeasurementTypes table would describe each type of measurement that you support, and assign a unique name (the MeasurementType field) to refer to it:
MeasurementType | Description | ..and other pertinent fields
---------------------------------------------------------------------
The Measurements table is where all the data is aggregated. You would have one record for each data point collected, stamped with the customer id, the measurement type, a time stamp, and a unique "batch" identifier (to be able to group data points from each measurement together) - and of course the measurement value. If you need different types of values for your measurements you may need to get a little creative with the design but most likely the measurement values can all be represented by a single data type.
Customer | MeasurementBatch | MeasurementType | Timestamp | Value
--------------------------------------------------------------------------------
1        | {GUID}           | 'WIND_SPEED'    | ...       | ...
...      | ...              | ...             | ...       | ...
This way, you get a very flexible design that lets you add as many data points for each customer as you need, independently of other customers. And you get the benefits of a relational database.
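The three tables described above could be sketched roughly as follows. This uses SQLite through Python's sqlite3 module purely because it is easy to run; the column types, the index, and the sample data are my assumptions, and the DDL would need adapting for SQL Server or another engine:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

conn.executescript("""
    CREATE TABLE Customers (
        CustomerID   INTEGER PRIMARY KEY,
        CustomerName TEXT NOT NULL
    );
    CREATE TABLE MeasurementTypes (
        MeasurementType TEXT PRIMARY KEY,
        Description     TEXT
    );
    CREATE TABLE Measurements (
        Customer         INTEGER NOT NULL REFERENCES Customers(CustomerID),
        MeasurementBatch TEXT    NOT NULL,
        MeasurementType  TEXT    NOT NULL REFERENCES MeasurementTypes(MeasurementType),
        Timestamp        TEXT    NOT NULL,
        Value            REAL
    );
    -- Assumed access pattern: one customer's data over a time range.
    CREATE INDEX idx_measurements_customer_time
        ON Measurements (Customer, Timestamp);
""")

conn.execute("INSERT INTO Customers VALUES (1, 'Example Customer')")
conn.executemany(
    "INSERT INTO MeasurementTypes VALUES (?, ?)",
    [("WIND_SPEED", "Wind speed"), ("TEMPERATURE", "Air temperature")],
)

# One batch id groups the data points collected in a single measurement.
batch = str(uuid.uuid4())
conn.executemany(
    "INSERT INTO Measurements VALUES (?, ?, ?, ?, ?)",
    [
        (1, batch, "WIND_SPEED", "2010-01-01T00:00", 12.5),
        (1, batch, "TEMPERATURE", "2010-01-01T00:00", 3.0),
    ],
)

batch_rows = conn.execute(
    "SELECT MeasurementType, Value FROM Measurements "
    "WHERE MeasurementBatch = ? ORDER BY MeasurementType",
    (batch,),
).fetchall()
print(batch_rows)
```

Supporting a new measurement for one customer is then a matter of inserting a row into MeasurementTypes and writing data points with that type; no other customer's data is touched.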
If your SQL engine supports this feature you could even partition the Measurements table by the customer column.
Hope this helps.
EDIT
I must mention that I'm not in any way affiliated with Microsoft, nor am I trying to give them free advertising; it just so happens that I'm most familiar with their SQL Server.
Based on Alan's comment regarding whether a SQL database can support a data volume of a few thousand million records per year, with the possibility of growing up to a billion records per year: there is a nice summary of limitations/specs for MS SQL Server available here:
http://msdn.microsoft.com/en-us/library/ms143432.aspx
It seems that the only limitation to how many records you can have per table is the available size on disk (and probably RAM if you're going to want to run certain reports on that data).