Should I normalize my DB or not?

A philosophical answer: Sub-optimal (relational) databases are rife with insert, update, and delete anomalies. These all lead to inconsistent data, resulting in poor data quality. If you can't trust the accuracy of your data, what good is it? Ask yourself this: Do you want the right answers slower or do you want the wrong answers faster?
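To make "update anomaly" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders_flat` table and its columns are hypothetical, invented for illustration: the customer's city is repeated on every order row, so a partial update leaves the data contradicting itself.

```python
import sqlite3

def demo_update_anomaly():
    """Show how a denormalized table can end up holding contradictory data."""
    con = sqlite3.connect(":memory:")
    # Hypothetical denormalized table: the city is stored again on every order.
    con.execute(
        "CREATE TABLE orders_flat (order_id INTEGER PRIMARY KEY, customer TEXT, city TEXT)"
    )
    con.executemany(
        "INSERT INTO orders_flat VALUES (?, ?, ?)",
        [(1, "alice", "Oslo"), (2, "alice", "Oslo")],
    )
    # The customer moves, but the update only touches one of her rows.
    con.execute("UPDATE orders_flat SET city = 'Bergen' WHERE order_id = 1")
    cities = con.execute(
        "SELECT DISTINCT city FROM orders_flat WHERE customer = 'alice' ORDER BY city"
    ).fetchall()
    con.close()
    return [row[0] for row in cities]  # two different cities for one customer
```

In a normalized design the city would live in exactly one row of a customer table, so there would be nothing to get out of sync.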

As a practical matter: get it right before you get it fast. We humans are very bad at predicting where bottlenecks will occur. Make the database great, measure the performance over a decent period of time, then decide if you need to make it faster. Before you denormalize and sacrifice accuracy, try other techniques: Can you get a faster server, connection, or DB driver? Might stored procedures speed things up? How are the indexes and their fill factors? Only if those and other performance-tuning techniques do not do the trick should you consider denormalization. Then measure the performance again to verify that you got the increase in speed that you "paid for". Make sure that you are performing optimization, not pessimization.
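One way to check the index question above without guessing is to ask the database for its query plan before and after a change. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 module; the `orders` table and the index name are assumptions made up for the example, and other DBMSs have their own `EXPLAIN` variants with different output.

```python
import sqlite3

def query_plan(con, sql, params=()):
    """Return SQLite's plan for a query as one string per plan step."""
    rows = con.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the readable detail

def demo_plans():
    con = sqlite3.connect(":memory:")
    # Hypothetical table used only for this illustration.
    con.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    sql = "SELECT * FROM orders WHERE customer_id = ?"
    before = query_plan(con, sql, (42,))  # no usable index: full table scan
    con.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = query_plan(con, sql, (42,))   # the planner can now use the index
    con.close()
    return before, after
```

The exact wording of the plan text differs between SQLite versions, so compare plans by eye or look for the index name rather than matching whole strings. A plan is only a hint; timing the real workload is still what tells you whether a change paid off.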

[edit]

Q: So if I optimize last, can you recommend a reasonable way to migrate data after the schema is changed? If, for example, I decide to get rid of a lookup table, how can I migrate existing databases to the new design?

A: Sure.

  1. Make a backup.
  2. Make another backup to a different device.
  3. Create new tables with "SELECT ... INTO newtable FROM oldtable ..." type commands (or "CREATE TABLE newtable AS SELECT ...", depending on your DBMS). You'll need some joins to combine previously distinct tables.
  4. Drop the old tables.
  5. Rename the new tables.
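The steps above can be sketched end to end with Python's sqlite3 module. Everything named here (`ticket`, `status`, `ticket_new`) is a hypothetical schema chosen for illustration, SQLite spells step 3 as `CREATE TABLE ... AS SELECT`, and the in-memory backup stands in for a real backup to separate storage.

```python
import sqlite3

def denormalize_lookup(con):
    """Fold the hypothetical lookup table `status` into its parent table `ticket`."""
    # Step 3: build the new table from a join of the old ones.
    con.execute(
        """
        CREATE TABLE ticket_new AS
        SELECT t.id, t.title, s.name AS status
        FROM ticket AS t
        JOIN status AS s ON s.id = t.status_id
        """
    )
    # Step 4: drop the old tables.
    con.execute("DROP TABLE ticket")
    con.execute("DROP TABLE status")
    # Step 5: give the new table the old name.
    con.execute("ALTER TABLE ticket_new RENAME TO ticket")
    con.commit()

def demo_migration():
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE ticket (
            id INTEGER PRIMARY KEY,
            title TEXT,
            status_id INTEGER REFERENCES status(id)
        );
        INSERT INTO status VALUES (1, 'open'), (2, 'closed');
        INSERT INTO ticket VALUES (10, 'crash on save', 1), (11, 'typo', 2);
        """
    )
    # Steps 1 and 2: take a backup first (a second in-memory database here).
    backup = sqlite3.connect(":memory:")
    con.backup(backup)
    denormalize_lookup(con)
    rows = con.execute("SELECT id, title, status FROM ticket ORDER BY id").fetchall()
    return rows, backup
```

Note that in SQLite a table created with `CREATE TABLE ... AS SELECT` does not inherit primary keys, constraints, or indexes from the source tables, so those would need to be recreated on the new table.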

BUT... consider a more robust approach:

Create some views on your fully normalized tables right now. Those views (virtual tables, "windows" on the data... ask me if you want to know more about this topic) would have the same defining query as in step three above. When you write your application or DB-layer logic, use the views (at least for read access; updatable views are... well, interesting). Then, if you denormalize later, create a new table as above, drop the view, and rename the new base table to whatever the view was called. Your application/DB layer won't know the difference.
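A minimal sketch of this view-first approach, again with Python's sqlite3 module and a made-up schema: the names `ticket`, `status`, `ticket_report`, and `ticket_report_tbl` are assumptions for illustration. The application only ever runs `READ_QUERY`, which keeps working after the view is replaced by a real table.

```python
import sqlite3

# The only query the (hypothetical) application layer knows about.
READ_QUERY = "SELECT id, title, status FROM ticket_report ORDER BY id"

def setup_normalized(con):
    """Create normalized tables plus a view that hides the join."""
    con.executescript(
        """
        CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE ticket (
            id INTEGER PRIMARY KEY,
            title TEXT,
            status_id INTEGER REFERENCES status(id)
        );
        INSERT INTO status VALUES (1, 'open'), (2, 'closed');
        INSERT INTO ticket VALUES (10, 'crash on save', 1), (11, 'typo', 2);
        CREATE VIEW ticket_report AS
            SELECT t.id, t.title, s.name AS status
            FROM ticket AS t
            JOIN status AS s ON s.id = t.status_id;
        """
    )

def swap_view_for_table(con):
    """Denormalize later: materialize the view, then take over its name."""
    con.execute("CREATE TABLE ticket_report_tbl AS SELECT * FROM ticket_report")
    con.execute("DROP VIEW ticket_report")
    con.execute("ALTER TABLE ticket_report_tbl RENAME TO ticket_report")
    con.commit()
```

Usage would be `setup_normalized(con)` up front and `swap_view_for_table(con)` only if measurements later justify it. After the swap the table is a snapshot, so whatever writes to the normalized tables would also have to keep it current; that is part of the "more to this in practice".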

There's actually more to this in practice, but this should get you started.


A normal design is the place to start; get it right first, because you may not need to make it fast.

Concerns about time-costly joins are often based on experience with poor designs. As a design becomes more normal, the number of tables usually increases while the number of columns and rows in each table decreases; the number of unions increases as the number of joins decreases; indexes become more useful; and so on. In other words: good things happen.

And normalization is only one way to end up with a normal design...


The usage pattern of your database (insert-heavy vs. reporting-heavy) will definitely affect your normalization. Furthermore, you may want to look at your indexing, etc. if you are seeing a significant slowdown with normalized tables. Which version of MySQL are you using?

In general, an insert-heavy database should be more normalized than a reporting-heavy database. However, YMMV of course...