SUM of DATALENGTHs not matching table size from sys.allocation_units
Please note that the following info is not intended to be a comprehensive
description of how data pages are laid out, such that one can calculate
the number of bytes used per any set of rows, as that is very complicated.
Data is not the only thing taking up space on an 8k data page:
There is reserved space. You are only allowed to use 8060 of the 8192 bytes (that's 132 bytes that were never yours in the first place):
- Page header: This is exactly 96 bytes.
- Slot array: this is 2 bytes per row and indicates the offset of where each row starts on the page. The size of this array is not limited to the remaining 36 bytes (132 - 96 = 36), else you would be effectively limited to only putting 18 rows max on a data page. This means that each row is 2 bytes larger than you think it is. This value is not included in the "record size" as reported by DBCC PAGE, which is why it is kept separate here instead of being included in the per-row info below.
- Per-Row meta-data (including, but not limited to):
- The size varies depending on the table definition (i.e. number of columns, variable-length or fixed-length, etc). Info taken from @PaulWhite's and @Aaron's comments (found in the discussion related to this answer) and from testing.
- Row-header: 4 bytes, 2 of them denoting the record type, and the other two being an offset to the NULL Bitmap
- Number of columns: 2 bytes
- NULL Bitmap: which columns are currently NULL. 1 byte per each set of 8 columns, and for all columns, even the NOT NULL ones. Hence, minimum 1 byte.
- Variable-length column offset array: 4 bytes minimum. 2 bytes to hold the number of variable-length columns, and then 2 bytes per each variable-length column to hold the offset to where it starts.
- Versioning Info: 14 bytes (this will be present if your database is set to either ALLOW_SNAPSHOT_ISOLATION ON or READ_COMMITTED_SNAPSHOT ON).
- Please see the following Question and Answer for more details on this: Slot Array and Total Page Size
- Please see the following blog post from Paul Randal which has several interesting details on how the data pages are laid out: Poking about with DBCC PAGE (Part 1 of ?)
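To make the per-row overhead above concrete, here is a minimal arithmetic sketch for a purely hypothetical table: 10 columns total, 3 of them variable-length, in a database without row versioning (those counts are assumptions for illustration, not taken from any real table):

-- Rough per-row overhead (not exact), per the items listed above:
SELECT 4                       -- row header (record type + offset to NULL Bitmap)
     + 2                       -- number of columns
     + 2                       -- slot array entry (not part of the "record size" itself)
     + (1 + ((10 - 1) / 8))    -- NULL Bitmap: 1 byte per each set of 8 columns
     + (2 + (2 * 3))           -- variable-length column offset array (3 variable-length columns)
     AS [ApproxOverheadBytesPerRow]; -- = 18 bytes before any actual data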
LOB pointers for data that is not stored in row. So that would account for DATALENGTH + pointer_size. But these are not of a standard size. Please see the following blog post for details on this complex topic: What is the Size of the LOB Pointer for (MAX) Types Like Varchar, Varbinary, Etc?. Between that linked post and some additional testing I have done, the (default) rules should be as follows (a small sp_tableoption sketch follows this list):
- Legacy / deprecated LOB types that nobody should be using anymore as of SQL Server 2005 (TEXT, NTEXT, and IMAGE):
  - By default, always store their data on LOB pages and always use a 16-byte pointer to LOB storage.
  - IF sp_tableoption was used to set the text in row option, then:
    - if there is space on the page to store the value, and the value is not greater than the max in-row size (configurable range of 24 - 7000 bytes with a default of 256), then it will be stored in-row,
    - else it will be a 16-byte pointer.
- For the newer LOB types introduced in SQL Server 2005 (VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX)):
  - By default:
    - If the value is not greater than 8000 bytes, and there is room on the page, then it will be stored in-row.
    - Inline Root: for data between 8001 and 40,000 (really 42,000) bytes, space permitting, there will be 1 to 5 pointers (24 - 72 bytes) IN ROW that point directly to the LOB page(s). 24 bytes for the initial 8k LOB page, and 12 bytes per each additional 8k page for up to four more 8k pages.
    - TEXT_TREE: for data over 42,000 bytes, or if the 1 to 5 pointers can't fit in-row, then there will be just a 24-byte pointer to the starting page of a list of pointers to the LOB pages (i.e. the "text_tree" page).
  - IF sp_tableoption was used to set the large value types out of row option, then always use a 16-byte pointer to LOB storage.
- I said "default" rules because I did not test in-row values against the impact of certain features such as Data Compression, Column-level Encryption, Transparent Data Encryption, Always Encrypted, etc.
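For reference, here is a minimal sketch of how those two sp_tableoption settings are applied. The table name dbo.table_name and the 1000-byte limit are placeholders, not recommendations:

-- Legacy TEXT / NTEXT / IMAGE columns: allow values up to the limit to be stored in-row
-- (limit is configurable between 24 and 7000 bytes):
EXEC sys.sp_tableoption N'dbo.table_name', 'text in row', '1000';

-- MAX types: force values off-row so that only a 16-byte pointer remains in the row:
EXEC sys.sp_tableoption N'dbo.table_name', 'large value types out of row', 1;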
LOB overflow pages: If a value is 10k, then that will require 1 full 8k page of overflow, and then part of a 2nd page. If no other data can take up the remaining space (or is even allowed to, I am unsure of that rule), then you have approx 6kb of "wasted" space on that 2nd LOB overflow datapage.
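As a rough illustration of that, assuming roughly 8,000 usable bytes per LOB page (an assumption that ignores the LOB page's own header and bookkeeping):

-- A hypothetical 10,000-byte value spills onto a 2nd LOB page,
-- leaving roughly 6 KB unused on that 2nd page:
SELECT CEILING(10000 / 8000.0)  AS [LobPagesNeeded],        -- 2
       (2 * 8000) - 10000       AS [ApproxUnusedLobBytes];  -- 6000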
Unused space: An 8k data page is just that: 8192 bytes. It does not vary in size. The data and meta-data placed on it, however, does not always fit nicely into all 8192 bytes. And rows cannot be split onto multiple data pages. So if you have 100 bytes remaining but no row (or no row that would fit in that location, depending on several factors) can fit there, the data page is still taking up 8192 bytes, and your 2nd query is only counting the number of data pages. You can find this value in two places (just keep in mind that some portion of this value is some amount of that reserved space):
DBCC PAGE( db_name, file_id, page_id ) WITH TABLERESULTS;
Look for ParentObject = "PAGE HEADER:" and Field = "m_freeCnt". The Value field is the number of unused bytes.

SELECT buff.free_space_in_bytes
FROM   sys.dm_os_buffer_descriptors buff
WHERE  buff.[database_id] = DB_ID(N'db_name')
AND    buff.[page_id] = page_id;
This is the same value as reported by "m_freeCnt". This is easier than DBCC PAGE since it can get many pages, but it also requires that the pages have been read into the buffer pool in the first place.
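If you want a feel for how much unused space exists across a whole table (at least for the pages currently in the buffer pool), something along these lines should work. This is just a sketch: dbo.table_name is a placeholder, and the join below is the simplistic one that treats partition_id and hobt_id as interchangeable:

SELECT COUNT(*)                        AS [CachedPages],
       SUM(bd.[free_space_in_bytes])   AS [FreeBytesInCachedPages]
FROM   sys.dm_os_buffer_descriptors bd
INNER JOIN sys.allocation_units sau
        ON sau.[allocation_unit_id] = bd.[allocation_unit_id]
INNER JOIN sys.partitions sp
        ON sp.[partition_id] = sau.[container_id]
WHERE  bd.[database_id] = DB_ID()
AND    bd.[page_type] = 'DATA_PAGE'
AND    sp.[object_id] = OBJECT_ID(N'dbo.table_name');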
Space reserved by FILLFACTOR < 100. Newly created pages do not respect the FILLFACTOR setting, but doing a REBUILD will reserve that space on each data page. The idea behind the reserved space is that it will be used by non-sequential inserts and/or updates that expand the size of rows already on the page, due to variable-length columns being updated with slightly more data (but not enough to cause a page split). But you could easily reserve space on data pages that would naturally never get new rows and never have the existing rows updated, or at least not updated in a way that would increase the size of the row.

Page-Splits (fragmentation): Needing to add a row to a location that has no room for the row will cause a page split. In this case, approx 50% of the existing data is moved to a new page and the new row is added to one of the 2 pages. But you now have a bit more free space that is not accounted for by DATALENGTH calculations.

Rows marked for deletion: When you delete rows, they are not always immediately removed from the data page. If they cannot be removed immediately, they are "marked for death" (Steven Seagal reference) and will be physically removed later by the ghost cleanup process (I believe that is the name). However, these might not be relevant to this particular Question. (A quick way to check for these conditions is sketched below.)
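If you want to see how much these last few factors are contributing for a particular table, sys.dm_db_index_physical_stats in DETAILED mode reports page fullness and ghost record counts. A minimal sketch (dbo.table_name is a placeholder, and DETAILED mode reads the entire structure, so it can be expensive on large tables):

SELECT ips.[page_count],
       ips.[avg_page_space_used_in_percent],  -- reflects FILLFACTOR, page splits, and other unused space
       ips.[ghost_record_count],              -- rows deleted but not yet physically removed
       ips.[version_ghost_record_count]       -- ghosts retained for row versioning
FROM   sys.dm_db_index_physical_stats( DB_ID(), OBJECT_ID(N'dbo.table_name'), 1, NULL, N'DETAILED' ) ips
WHERE  ips.[index_level] = 0; -- leaf level (i.e. the data pages)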
Ghost pages? Not sure if that is the proper term, but sometimes data pages do not get removed until a REBUILD of the Clustered Index is done. That would also account for more pages than DATALENGTH would add up to. This generally should not happen, but I have run into it once, several years ago.

SPARSE columns: Sparse columns save space (mostly for fixed-length datatypes) in tables where a large % of the rows are NULL for one or more columns. The SPARSE option makes the NULL value take up 0 bytes (instead of the normal fixed-length amount, such as 4 bytes for an INT), but non-NULL values each take up an additional 4 bytes for fixed-length types and a variable amount for variable-length types. The issue here is that DATALENGTH does not include the extra 4 bytes for non-NULL values in a SPARSE column, so those 4 bytes need to be added back in. You can check to see if there are any SPARSE columns via:

SELECT OBJECT_SCHEMA_NAME(sc.[object_id]) AS [SchemaName],
       OBJECT_NAME(sc.[object_id]) AS [TableName],
       sc.name AS [ColumnName]
FROM   sys.columns sc
WHERE  sc.is_sparse = 1;

And then for each SPARSE column, update the original query to use:

SUM(DATALENGTH(FieldN) + 4)
Please note that the calculation above to add in a standard 4 bytes is a bit simplistic as it only works for fixed-length types. AND, there is additional meta-data per row (from what I can tell so far) that reduces the space available for data, simply by having at least one SPARSE column. For more details, please see the MSDN page for Use Sparse Columns.
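For context, this is what a SPARSE column looks like at declaration time. A minimal sketch; dbo.SparseDemo and its columns are hypothetical:

CREATE TABLE dbo.SparseDemo
(
    [ID]        INT NOT NULL CONSTRAINT [PK_SparseDemo] PRIMARY KEY,
    [RarelySet] INT SPARSE NULL  -- NULL rows: 0 bytes; non-NULL rows: 4 bytes of data + 4 bytes of sparse overhead
);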
Index and other (e.g. IAM, PFS, GAM, SGAM, etc) pages: these are not "data" pages in terms of user data. These will inflate the total size of the table. If using SQL Server 2012 or newer, you can use the sys.dm_db_database_page_allocations Dynamic Management Function (DMF) to see the page types (earlier versions of SQL Server can use DBCC IND(0, N'dbo.table_name', 0);):

SELECT *
FROM   sys.dm_db_database_page_allocations( DB_ID(), OBJECT_ID(N'dbo.table_name'), 1, NULL, N'DETAILED' )
WHERE  page_type = 1; -- DATA_PAGE

Neither DBCC IND nor sys.dm_db_database_page_allocations (with that WHERE clause) will report any Index pages, and only DBCC IND will report at least one IAM page.

DATA_COMPRESSION: If you have ROW or PAGE Compression enabled on the Clustered Index or Heap, then you can forget about most of what has been mentioned so far. The 96-byte Page Header, 2-bytes-per-row Slot Array, and 14-bytes-per-row Versioning Info are still there, but the physical representation of the data becomes highly complex (much more so than what has already been mentioned when Compression is not being used). For example, with Row Compression, SQL Server attempts to use the smallest possible container to fit each column, per each row. So if you have a BIGINT column which would otherwise (assuming SPARSE is also not enabled) always take up 8 bytes, if the value is between -128 and 127 (i.e. signed 8-bit integer) then it will use just 1 byte, and if the value could fit into a SMALLINT, it will only take up 2 bytes. Integer types that are either NULL or 0 take up no space and are simply indicated as being NULL or "empty" (i.e. 0) in an array mapping out the columns. And there are many, many other rules. Have Unicode data (NCHAR, NVARCHAR(1 - 4000), but not NVARCHAR(MAX), even if stored in-row)? Unicode Compression was added in SQL Server 2008 R2, but there is no way to predict the outcome of the "compressed" value in all situations without doing the actual compression, given the complexity of the rules.
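If Compression is enabled (or being considered), the closest you can get to a prediction without actually rebuilding is the built-in estimator, which samples the table into tempdb and reports current vs estimated compressed sizes. A minimal sketch, with dbo / table_name as placeholders:

EXEC sys.sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'table_name',
     @index_id         = 1,      -- Clustered Index (use 0 for a Heap)
     @partition_number = NULL,
     @data_compression = N'ROW'; -- or N'PAGE', or N'NONE'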
So really, your second query, while more accurate in terms of total physical space taken up on disk, is only really accurate upon doing a REBUILD of the Clustered Index. And after that, you still need to account for any FILLFACTOR setting below 100. And even then there are always page headers, and often enough some amount of "wasted" space that is simply not fillable due to being too small to fit any row in this table, or at least the row that logically should go in that slot.
Regarding the accuracy of the 2nd query in determining "data usage", it seems most fair to back out the Page Header bytes since they are not data usage: they are cost-of-business overhead. If there is 1 row on a data page and that row is just a TINYINT, then that 1 byte still required that the data page exist, and hence the 96 bytes of the header. Should that 1 department get charged for the entire data page? If that data page is then filled up by Department #2, would they evenly split that "overhead" cost or pay proportionally? Seems easiest to just back it out. In which case, using a value of 8 to multiply against the number of pages is too high. How about:
-- 8192 byte data page - 96 byte header = 8096 (approx) usable bytes.
SELECT 8096.0 / 1024 -- 7.906250
Hence, use something like:
(SUM(a.total_pages) * 7.91) / 1024 AS [TotalSpaceMB]
for all calculations against "number_of_pages" columns.
AND, considering that using DATALENGTH per each field cannot return the per-row meta-data, that should be added to your per-table query where you get the DATALENGTH per each field, filtering on each "department":
- Record Type and offset to NULL Bitmap: 4 bytes
- Column Count: 2 bytes
- Slot Array: 2 bytes (not included in "record size" but still needs to be accounted for)
- NULL Bitmap: 1 byte per every 8 columns (for all columns)
- Row Versioning: 14 bytes (if the database has either ALLOW_SNAPSHOT_ISOLATION or READ_COMMITTED_SNAPSHOT set to ON)
- Variable-length column Offset Array: 0 bytes if all columns are fixed-length. If any columns are variable-length, then 2 bytes, plus 2 bytes per each of only the variable-length columns.
- LOB pointers: this part is very imprecise since there won't be a pointer if the value is NULL, and if the value fits on the row then it can be much smaller or much larger than the pointer, and if the value is stored off-row, then the size of the pointer might depend on how much data there is. However, since we just want an estimate (i.e. "swag"), it seems like 24 bytes is a good value to use (well, as good as any other ;-). This is per each MAX field.
Hence, use something like:
In general (row header + number of columns + slot array + NULL bitmap):
([RowCount] * (( 4 + 2 + 2 + (1 + (({NumColumns} - 1) / 8) ))
In general (auto-detect if "version info" is present):
+ (SELECT CASE WHEN snapshot_isolation_state = 1 OR is_read_committed_snapshot_on = 1 THEN 14 ELSE 0 END FROM sys.databases WHERE [database_id] = DB_ID())
IF there are any variable-length columns, then add:
+ 2 + (2 * {NumVariableLengthColumns})
IF there are any MAX / LOB columns, then add:
+ (24 * {NumLobColumns})
In general:
)) AS [MetaDataBytes]
This isn't exact, and again will not work if you have Row or Page Compression enabled on the Heap or Clustered Index, but should definitely get you closer.
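Putting those pieces together, here is a rough sketch of what the per-"department" estimate could look like. Everything in it is hypothetical: the table dbo.Employees, its columns, the assumption of 10 total columns (3 of them variable-length, 1 of them NVARCHAR(MAX)), and a database without Row Versioning:

SELECT emp.[DepartmentID],
       COUNT(*) AS [Rows],
       (
           SUM(CONVERT(BIGINT, ISNULL(DATALENGTH(emp.[FirstName]), 0)))
         + SUM(CONVERT(BIGINT, ISNULL(DATALENGTH(emp.[LastName]), 0)))
         + SUM(CONVERT(BIGINT, ISNULL(DATALENGTH(emp.[Notes]), 0)))  -- the NVARCHAR(MAX) column
         + (COUNT(*) * (
                4                      -- row header
              + 2                      -- column count
              + 2                      -- slot array entry
              + (1 + ((10 - 1) / 8))   -- NULL Bitmap for 10 columns
              + (2 + (2 * 3))          -- variable-length offset array (3 variable-length columns)
              + (24 * 1)               -- LOB pointer estimate (1 MAX column)
           ))
       ) / 1048576.0 AS [EstimatedDataMB]
FROM   dbo.Employees emp
GROUP BY emp.[DepartmentID];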
UPDATE Regarding the 15% Difference Mystery
We (myself included) were so focused on thinking about how data pages are laid out and how DATALENGTH might account for things that we didn't spend a lot of time reviewing the 2nd query. I ran that query against a single table and then compared those values to what was being reported by sys.dm_db_database_page_allocations, and they were not the same values for the number of pages. On a hunch, I removed the aggregate functions and GROUP BY, and replaced the SELECT list with a.*, '---' AS [---], p.*. And then it became clear: people must be careful where on these murky interwebs they get their info and scripts from ;-). The 2nd query posted in the Question is not exactly correct, especially for this particular Question.
Minor problem: outside of it not making much sense to GROUP BY rows (and not have that column in an aggregate function), the JOIN between sys.allocation_units and sys.partitions isn't technically correct. There are 3 types of Allocation Units, and one of them should JOIN to a different field. Quite often partition_id and hobt_id are the same, so there might never be a problem, but sometimes those two fields do have different values.

Major problem: the query uses the used_pages field. That field covers all types of pages: Data, Index, IAM, etc. There is another, more appropriate field to use when concerned with only the actual data: data_pages.
I adapted the 2nd query in the Question with the above items in mind, and using the data page size that backs out the page header. I also removed two JOINs that were unnecessary: sys.schemas (replaced with a call to SCHEMA_NAME()), and sys.indexes (the Clustered Index is always index_id = 1 and we have index_id in sys.partitions).
SELECT SCHEMA_NAME(st.[schema_id]) AS [SchemaName],
st.[name] AS [TableName],
SUM(sp.[rows]) AS [RowCount],
(SUM(sau.[total_pages]) * 8.0) / 1024 AS [TotalSpaceMB],
(SUM(CASE sau.[type]
WHEN 1 THEN sau.[data_pages]
ELSE (sau.[used_pages] - 1) -- back out the IAM page
END) * 7.91) / 1024 AS [TotalActualDataMB]
FROM sys.tables st
INNER JOIN sys.partitions sp
ON sp.[object_id] = st.[object_id]
INNER JOIN sys.allocation_units sau
ON ( sau.[type] = 1
AND sau.[container_id] = sp.[partition_id]) -- IN_ROW_DATA
OR ( sau.[type] = 2
AND sau.[container_id] = sp.[hobt_id]) -- LOB_DATA
OR ( sau.[type] = 3
AND sau.[container_id] = sp.[partition_id]) -- ROW_OVERFLOW_DATA
WHERE st.is_ms_shipped = 0
--AND sp.[object_id] = OBJECT_ID(N'dbo.table_name')
AND sp.[index_id] < 2 -- 1 = Clustered Index; 0 = Heap
GROUP BY SCHEMA_NAME(st.[schema_id]), st.[name]
ORDER BY [TotalSpaceMB] DESC;
Maybe this is a grunge answer, but this is what I would do.
So DATALENGTH only accounts for 86% of the total. It is still a very representative split. The overhead in the excellent answer from srutzky should have a pretty even split.
I would use your second query (pages) for the total, and use the first (DATALENGTH) for allocating the split. Many costs are allocated using a normalization like this.
And you have to consider that a more accurate answer is going to raise the cost, so even the dept that lost out on a split may still pay more.
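A minimal sketch of that allocation approach, using a hypothetical dbo.Employees table and a made-up physical total (in practice you would plug in the TotalSpaceMB or TotalActualDataMB figure from the pages-based query above):

DECLARE @TotalSpaceMB DECIMAL(18, 2) = 100.00; -- placeholder: physical total from the pages-based query

;WITH dept AS
(
    SELECT emp.[DepartmentID],
           SUM(CONVERT(BIGINT, ISNULL(DATALENGTH(emp.[FirstName]), 0))
             + CONVERT(BIGINT, ISNULL(DATALENGTH(emp.[LastName]), 0))) AS [DeptBytes]
    FROM   dbo.Employees emp
    GROUP BY emp.[DepartmentID]
)
SELECT d.[DepartmentID],
       d.[DeptBytes] * 1.0 / SUM(d.[DeptBytes]) OVER () AS [ShareOfDataLength],
       @TotalSpaceMB * (d.[DeptBytes] * 1.0 / SUM(d.[DeptBytes]) OVER ()) AS [AllocatedMB]
FROM   dept d
ORDER BY [AllocatedMB] DESC;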