When to build a separate reporting database?

In general, the more mission critical the transactional app and the more sophisticated the reporting requirements, the more splitting makes sense.

  1. When transaction performance is critical.
  2. When it's hard to get a maintenance window on the transactional app.
  3. If reporting needs to correlate results not only from this app, but from other application silos.
  4. If the reports need to support trending or other types of reporting that are best suited for a star schema/Business Intelligence environment.
  5. If the reports are long running.
  6. If the transactional app is on an expensive hardware resource (cluster, mainframe, etc.)
  7. If you need to do data cleansing/extract-transform-load operations on the transactional data (e.g., state names to canonical state abbreviations).

It adds non-trivial complexity, so imo, there has to be a good reason to split.


Typically, I would try to report off the transactional database initially.

Ensure that any indexes you add to facilitate efficient reporting are all frequently used. The more indexes you add, the poorer performance is going to be on inserts and (if you alter keys) updates.

When you do go to a reporting database, remember there are only a few reasons you are going there:

Ultimately, the number one thing about reporting databases is that you are removing locking contention from the OLTP database. So if your reporting database is a straight copy of the same database, you're simply using delayed snapshots which won't interfere with production transactions.

Next, you can have a separate indexing strategy to support the reporting usage scenarios. These extra indexes are OK to maintain in the reporting database, but would cause unnecessary overhead in the OLTP database.

Now both the above could be done on the same server (even the same instance in a separate database or even just in a separate schema) and still see benefits. When CPU and IO are completely pegged, at that point, you definitely need to have it on a completely separate box (or upgrade your single box).

Finally, for ultimate reporting flexibility, you denormalize the data (usually into a dimensional model or star schemas) so that the reporting database is the same data in a different model. Reporting of large amounts of data (particularly aggregates) is extremely fast in dimensional models because the star schemas are very efficient for that. It also is efficient for a larger variety of queries without a lot of re-indexing or analysis to change indexes, because the dimensional model lends itself better to unforeseen usage patterns (the old "slice and dice every which way" request). You could view this is a kind of mini-data warehouse where you use data warehousing techniques, but aren't necessarily implementing a full-blown data warehouse. Also, star schemas are particular easy for users to get to grips with, and data dictionaries are much simpler and easier to build for BI tools or reporting tools from star schemas. You could do this on the same box or different box etc, just like discussed earlier.