How to create a multi-tenant database with shared table structures?
My experience (albeit SQL Server) is that multi-database is the way to go, where each client has their own database. So although I have no mySQL or Ruby On Rails experience, I'm hoping my input might add some value.
The reasons why include :
- data security/disaster recovery. Each companies data is stored entirely separately from others giving reduced risk of data being compromised (thinking things like if you introduce a code bug that means something mistakenly looks at other client data when it shouldn't), minimizes potential loss to one client if one particular database gets corrupted etc. The perceived security benefits to the client are even greater (added bonus side effect!)
- scalability. Essentially you'd be partitioning your data out to enable greater scalability - e.g. databases can be put on to different disks, you could bring multiple database servers online and move databases around easier to spread the load.
- performance tuning. Suppose you have one very large client and one very small. Usage patterns, data volumes etc. can vary wildly. You can tune/optimise easier for each client should you need to.
I hope this does offer some useful input! There are more reasons, but my mind went blank. If it kicks back in, I'll update :)
EDIT:
Since I posted this answer, it's now clear that we're talking 10,000+ tenants. My experience is in hundreds of large scale databases - I don't think 10,000 separate databases is going to be too manageable for your scenario, so I'm now not favouring the multi-db approach for your scenario. Especially as it's now clear you're talking small data volumes for each tenant!
Keeping my answer here as anyway as it may have some use for other people in a similar boat (with fewer tenants)
However there are some companies of course who fear that their data might be compromised, so we are evaluating other solutions.
This is unfortunate, as customers sometimes suffer from a misconception that only physical isolation can offer enough security.
There is an interesting MSDN article, titled Multi-Tenant Data Architecture, which you may want to check. This is how the authors addressed the misconception towards the shared approach:
A common misconception holds that only physical isolation can provide an appropriate level of security. In fact, data stored using a shared approach can also provide strong data safety, but requires the use of more sophisticated design patterns.
As for technical and business considerations, the article makes a brief analysis on where a certain approach might be more appropriate than another:
The number, nature, and needs of the tenants you expect to serve all affect your data architecture decision in different ways. Some of the following questions may bias you toward a more isolated approach, while others may bias you toward a more shared approach.
How many prospective tenants do you expect to target? You may be nowhere near being able to estimate prospective use with authority, but think in terms of orders of magnitude: are you building an application for hundreds of tenants? Thousands? Tens of thousands? More? The larger you expect your tenant base to be, the more likely you will want to consider a more shared approach.
How much storage space do you expect the average tenant's data to occupy? If you expect some or all tenants to store very large amounts of data, the separate-database approach is probably best. (Indeed, data storage requirements may force you to adopt a separate-database model anyway. If so, it will be much easier to design the application that way from the beginning than to move to a separate-database approach later on.)
How many concurrent end users do you expect the average tenant to support? The larger the number, the more appropriate a more isolated approach will be to meet end-user requirements.
Do you expect to offer any per-tenant value-added services, such as per-tenant backup and restore capability? Such services are easier to offer through a more isolated approach.
UPDATE: Further to update about the expected number of tenants.
That expected number of tenants (10k) should exclude the multi-database approach, for most, if not all scenarios. I don't think you'll fancy the idea of maintaining 10,000 database instances, and having to create hundreds of new ones every day.
From that parameter alone, it looks like the shared-database, single-schema approach is the most suitable. The fact that you'll be storing just about 50Mb per tenant, and that there will be no per-tenant add-ons, makes this approach even more appropriate.
The MSDN article cited above mentions three security patterns that tackle security considerations for the shared-database approach:
- Trusted Database Connections
- Tenant View Filter
- Tenant Data Encryption
When you are confident with your application's data safety measures, you would be able to offer your clients a Service Level Agrement that provides strong data safety guarantees. In your SLA, apart from the guarantees, you could also describe the measures that you would be taking to ensure that data is not compromised.
UPDATE 2: Apparently the Microsoft guys moved / made a new article regarding this subject, the original link is gone and this is the new one: Multi-tenant SaaS database tenancy patterns (kudos to Shai Kerer)