Data Warehouse Considerations: When and Why?

  1. The main purpose of a DW is to speed-up (simplify) reporting and analytic. It enables slicing and dicing of data in any way a business user can think of.

  2. For a first step DW, you can simply implement a Kimball star schema and run SQL queries against it. If this proves to be still too slow, start thinking about pre-calculated aggregations (cubes).

  3. The slicing and dicing of information against a DW is way simpler, than against a normalized DB. Replicated report server will improve performance, but will not simplify slicing and dicing. Also keep in mind that the DW belongs to business users, so it is up to them to come up with various slice/dice ideas at any time -- IT people should simply provide environment in which something like this is possible.

  4. If you just run few reports from time-to-time on your operational system and are satisfied with performance, there is no need for DW.

  5. All my experience is with systems where business users endlessly complain about slow reports and inability to write "complicated queries", while production people complain that the database gets bogged down due to reporting. In all cases a simple Kimball star and a report server with cache and snapshots were good enough.


I'll see if I can do my best to answer your questions succinctly.

1.At what point is building a data warehouse an option worth considering? In other words, what telltale signs, metrics, or other criteria should I be looking out for that might indicate that a standard transactional environment is no longer sufficient?

a. If you find that reporting and monitoring are impairing the performance of your production system and/or an offline data store.

b. If you find that getting answers to your business questions requires building a lot of complex SQL each time.

c. If you find that every time you make a change to your transactional schema, you have to go back and rework all of your reporting queries.

d. If you want to bring together data from multiple sources.

2.What are the alternatives to a full-on data warehouse? Denormalization in the transactional database and the bog-standard replicated "report server" are two that come to mind; are there any others I should explore before committing to the DW?

3.Why is a data warehouse better than said alternatives? If the answer is, "it depends", then what does it depend on?

I'll answer these together. I wouldn't think of a data warehouse as an all or nothing venture. It's simply a concise phrase that means "storing your data in a way that allows you to more easily and quickly answer business questions."

Transactional databases are designed to efficiently interface with applications. Data warehouses, data marts, operational data stores and reporting tables are built to efficiently interface with people, if that makes sense.

4.When shouldn't I attempt to build a data warehouse? I'm skeptical of anything declared as a "best practice" irrespective of context. Surely there must be some scenarios where a DW is the wrong choice - what are they?

Good question. If your transactional system provides you with sufficient insight into your business, you probably do not have a need for warehousing.

If you only have one source of data and performance is not a problem, you can probably gain insight from creation of simple reporting tables.

5.Are there any practical examples I could look at of systems that were improved by introducing a data warehouse? Something that would explain to me, end-to-end, what sorts of decisions or analysis they needed the warehouse for, how they decided what to put in it, and how the warehouse ended up fitting into the larger environment? I don't want a contrived "let's make a cube out of the AdventureWorks database" - the implementation is irrelevant to me, I'm interested in the specifications and designs and overall thought process that were involved.

That's a big question that would take far more space than I'm allotted here.

On this one, I can point you to a few places that might provide the insight you seek.

  • "Implementing A Data Warehouse: A Methodology that worked" by Bruce Ullrey is a book documenting one man's journey to building a data warehouse. It's not highly polished, which gives it more realism. It reads like a journal with lots of models and other visuals that illustrate his efforts pretty well.
  • "Business Intelligence Roadmap" by Larissa Moss. Standard fare. Walks you through the process of building a BI practice at a high level.
  • "The Profit Impact of Business Intelligence" by Steve Williams gives a number of case studies that show the value of building data warehouses.