Is a 5+ column primary key bad for large (100 million+) table?

There are performance issues with very complex primary keys. And It may not be defending against duplication as well as a simpler primary key might.

However, there is one design pattern that frequently yields tables with a primary key made up of six or so components. It's star schema fact tables. If the fact table of a star schema has six dimensions, then the primary key will have six components. I've never seen a fact table with no declared primary key, and I think it's well worth the overhead, even though the ETL process still has to be quite carefully written.

Some reporting databases imitate the pattern of star schema even if it's not explicitly designed that way.

100 million + rows is not overly big for a fact table, especially with today's big data.


The table in question was a rollup/aggregation table.

Then it is not only fine, it is "right".

And it smells like a Summary table, since it begins with day.

Do you have some secondary indexes? Keep in mind that if you are using InnoDB, the rest of the PRIMARY KEY columns will be tacked onto the end of the secondary index. Again, this is not necessarily a problem.

100M rows is a lot for a rollup. It sounds like the table is too fine-grained. That is, perhaps instead if (date,a,b,c,d) you should have 4 rollups with PKs like (date,a,b,c), (date,b,c,d), (date,c,d,a), (date,d,a,b) (or some suitable combinations). I doing that, each might be only 10M rows, thereby making reports still faster, while having nearly as much flexibility in report.

Or maybe switch to (week,a,b,c,d), leading to maybe only 14M rows. (Probably more.)

Using PARTITION to facilitate pruning --- High speed ingestion --- Data Warehouse tips --- Summary Tables. These summarize many of the techniques I have developed in several DW projects. As you may infer, each project is different. The 'typical' number of Summary Tables (in my experience) is 3-7. The target in summarization is 10 Fact rows --> 1 Summary row. (That may be a 'median'.) In a rare case, I summarized a Summary table. In another rare case, I PARTITIONed a Summary table to good effect; usually Summary tables are small enough so they are fast enough for direct access from a UI.