Why might a table's data space take up 4x the size of the raw data?

After discussions in the comments on the original question, it appears in this case the lost space is caused by the choice of clustered key, which has led to massive fragmentation.

Always worth checking the state of fragmentation via sys.dm_db_index_physical_stats in these situations.

Edit: Following update in comments

The average page density (prior to rebuild of the clustered index) was 24%, which fits perfectly with the original question. The pages were only 1/4 full, so the total size was 4x the raw data size.


The on-disk structures have overhead:

  • row header
  • null bitmap + pointer
  • variable length column offsets
  • row version pointers (optional)
  • ...

Taking 2 x 4 bytes int columns, you have

  • 4 bytes row header
  • 2 byte pointer to NULL bitmap
  • 8 bytes for 2 int columns
  • 3 bytes NULL bitmap

Wow 17 bytes!

You can the same for your second test table which has more overhead like your original one:

  • 2 bytes for the count of variable-length columns
  • 2 bytes per variable length column

Why the difference? In addition (I won't link to these)

  • have you ever rebuilt indexes to defragment them?
  • deletes do not reclaim space
  • data pages will split if you insert into the middle
  • updates may cause forward pointers (leaves a gap)
  • row overflow
  • removed varchar column without index rebuild or DBCC CLEANTABLE
  • heap or table (heap has no clustered index = records scattered all over)
  • RCSI isolation level (extra 14 bytes per row)
  • trailing spaces (SET ANSI_PADDING is ON by default) in varchar. Use DATALENGTH to checl, not LEN
  • Run sp_spaceused with @updateusage = 'true'
  • ...

See this: SQL Server: How to create a table that fills one 8 KB page?

From SO:

  • https://stackoverflow.com/questions/4614771/sql-server-2008-how-much-space-does-this-occupy/4614799#4614799
  • https://stackoverflow.com/questions/3793022/how-to-come-to-limits-of-8060-bytes-per-row-and-8000-per-varchar-nvarchar-valu/3793265#3793265

Have the data types changed over time? Have variable-length columns been removed? Have the indexes been defragmented often but never rebuilt? Have a lot of rows been deleted or have a lot of variable-length columns been updated significantly?