Can I bulk insert into an empty page-compressed table and get full compression?
@HandyD is entirely correct, I only want to highlight some other methods to get compression while inserting into a heap.
From the same document
When a heap is configured for page-level compression, pages receive page-level compression only in the following ways:
According to this, you could leverage minimally logged bulk inserts or use INSERT INTO ... WITH (TABLOCK)
to get PAGE
compression without having to do rebuilds.
(a) what is going on here? and (b) is there a way to get this extra-small compressed size directly as I load the table without having to rebuild after the data is loaded?
There are rules to get PAGE
compression when inserting into a heap, add -h "TABLOCK"
to your bcp
command to get compression.
ROW
compression works without these prerequisites and is the least amount of compression used in below examples, thanks @DenisRubashkin for pointing that out!
Testing
Example start data & BCP out command
--Tested on SQL Server 2014 SP2
CREATE TABLE dbo.CompressedHeap_Source( Val varchar(512),
Datefield Date,
Tinyfield TinyINT,
Floatfield float)
WITH (DATA_COMPRESSION = PAGE);
INSERT INTO dbo.CompressedHeap_Source
(
Val,Datefield,Tinyfield,Floatfield)
SELECT 'Bla',cast(getdate() as date),1,1.2412
FROM master..spt_values spt1
CROSS APPLY master..spt_values spt2;
--bcp TEST.dbo.CompressedHeap_Source out E:\Data\HeapData.bcp -c -T
The ROW
compressed and Uncompressed size
The data size is at 132272 KB
when doing a standard insert into the heap, this is ROW
compressed but not PAGE
compressed.
The data size without any compression is ~ 176216 KB
for our test.
exec sp_spaceused 'dbo.CompressedHeap_Source'
name rows reserved data index_size unused
CompressedHeap_Source 6365530 132296 KB 132272 KB 8 KB 16 KB
INSERT INTO ... WITH TABLOCK
Inserting WITH TABLOCK
gives us the PAGE
compressed data size, 69480 KB
.
INSERT INTO dbo.CompressedHeap_Source2 WITH(TABLOCK)
(
Val,Datefield,Tinyfield,Floatfield)
SELECT 'Bla',cast(getdate() as date),1,1.2412
FROM master..spt_values spt1
CROSS APPLY master..spt_values spt2
BULK INSERT
Now when we create a destination heap table that is also page
compressed, and do a bulk insert with tablock
:
CREATE TABLE dbo.CompressedHeap_Destination( Val varchar(512),
Datefield Date,
Tinyfield TinyINT,
Floatfield float)
WITH (DATA_COMPRESSION = PAGE);
bulk insert dbo.CompressedHeap_Destination
from 'E:\Data\HeapData.bcp' with (TABLOCK)
The data gets page
compressed and is also at 69480 KB
:
name rows reserved data index_size unused
CompressedHeap_Destination 6365530 69512 KB 69480 KB 8 KB 24 KB
BCP IN WITH TABLOCK
You can get the same results as the BULK INSERT WITH TABLOCK
by using BCP IN
with the -h "TABLOCK"
hint. This makes sense, they do the same internally
--bcp TEST.dbo.CompressedHeap_Destination2 IN E:\Data\HeapData.bcp -c -T -h "TABLOCK"
With the resulting size being 69480 KB
BCP IN WITHOUT TABLOCK
Using BCP to load data from the same file in a copy of the destination table
And a standard bcp command results into non compressed data:
--bcp TEST.dbo.CompressedHeap_Destination2 IN E:\Data\HeapData.bcp -c -T
With the data size at 132272 KB
(row compressed).
According to the Docs article on compression:
New pages allocated in a heap as part of DML operations do not use PAGE compression until the heap is rebuilt. Rebuild the heap by removing and reapplying compression, or by creating and removing a clustered index.
This would seem to align with what you're seeing. It seems like you're not actually getting compression on the table until you rebuild it. You could try loading the data on an uncompressed table and see if you still average 17 rows per page or if this decreases. If it remains the same, then you're not getting compression and the rebuild is necessary.
You could also add a clustered index to your table and that should prevent your table from being uncompressed/low-compressed after bulk loading your data.