What is the recommended batch size for SqlBulkCopy?
This is an issue I have also spent some time looking into. I am looking to optimize importing large CSV files (16+ GB, 65+ million records, and growing) into a SQL Server 2005 database using a C# console application (.NET 2.0). As Jeremy has already pointed out, you will need to do some fine-tuning for your particular circumstances, but I would recommend an initial batch size of 500, then test values both above and below it.
I got the recommendation to test values between 100 and 1,000 for batch size from this MSDN forum post, and was skeptical. But when I tested batch sizes between 100 and 10,000, I found that 500 was the optimal value for my application. The 500 value for SqlBulkCopy.BatchSize is also recommended here.
To further optimize your SqlBulkCopy operation, check out this MSDN advice; I find that using SqlBulkCopyOptions.TableLock helps to reduce loading time.
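To make those settings concrete, here is a minimal sketch, not taken from any particular answer, of a SqlBulkCopy call wired up with SqlBulkCopyOptions.TableLock and an initial BatchSize of 500; the destination table name and the source IDataReader are placeholders for your own schema and CSV parser:

    using System.Data;
    using System.Data.SqlClient;

    public class CsvBulkLoader
    {
        // Sketch only: load rows from any IDataReader into a placeholder table,
        // holding a table lock and committing in 500-row batches.
        public static void BulkLoad(string connectionString, IDataReader sourceReader)
        {
            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                connection.Open();

                using (SqlBulkCopy bulkCopy =
                    new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
                {
                    bulkCopy.DestinationTableName = "dbo.MyStagingTable"; // placeholder name
                    bulkCopy.BatchSize = 500;             // starting point; tune up and down
                    bulkCopy.WriteToServer(sourceReader); // any IDataReader over the CSV rows
                }
            }
        }
    }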
As others have stated, it depends on your environment, specifically the row volume and network latency.
Personally, I'd start by setting the BatchSize property to 1000 rows and seeing how that performs. If it works, I'd keep doubling the number of rows (e.g. to 2000, 4000, etc.) until a timeout occurs. Otherwise, if a timeout occurs at 1000, I'd decrease the number of rows by half (e.g. to 500) until it works. In each case, I'd keep doubling (if successful) or halving (if failed) the difference between the last two attempted batch sizes until finding a sweet spot, roughly as sketched below.
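For what it's worth, here is roughly what that bracketing search could look like; TryCopy is a hypothetical helper (not from the answers above) that runs a single SqlBulkCopy pass with the given batch size and returns false if it times out:

    // Sketch of the doubling/halving search. Assumes a hypothetical helper:
    //   bool TryCopy(int batchSize): runs one bulk copy at that BatchSize
    //   and returns false if the attempt times out.
    int candidate = 1000;
    int largestWorking = 0;   // largest batch size that has succeeded so far
    int smallestFailing = 0;  // smallest batch size that has timed out so far

    while (true)
    {
        if (TryCopy(candidate))
        {
            largestWorking = candidate;
            if (smallestFailing == 0)
                candidate *= 2;                                  // no failure yet: keep doubling
            else
                candidate += (smallestFailing - candidate) / 2;  // bisect toward the failing size
        }
        else
        {
            smallestFailing = candidate;
            if (largestWorking == 0)
                candidate /= 2;                                  // no success yet: keep halving
            else
                candidate -= (candidate - largestWorking) / 2;   // bisect toward the working size
        }

        // Stop once the gap between known-good and known-bad is narrow enough.
        if (largestWorking > 0 && smallestFailing > 0 && smallestFailing - largestWorking < 100)
            break;
    }

    int batchSize = largestWorking;  // the sweet spot to use from now on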
The other factor to consider is how long it takes to copy a single batch of rows. Timeouts will occur if copying a batch takes longer than the BulkCopyTimeout property allows, which by default is 30 seconds. You might try doubling the BulkCopyTimeout property to 60 seconds. This allows a longer period of time for a larger batch of rows to be copied. For example, a batch of 50,000 rows might take around 40 seconds, just exceeding the 30-second limit, so bumping the timeout up to 60 seconds might help with performance.
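In code, that is just a property assignment on the SqlBulkCopy instance (bulkCopy here stands for whatever instance you already have, as in the sketch earlier in the thread; the 50,000-row figure is only the example from above):

    bulkCopy.BulkCopyTimeout = 60; // default is 30 seconds; gives larger batches room to finish
    bulkCopy.BatchSize = 50000;    // e.g. the larger batch mentioned above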
I have an import utility sitting on the same physical server as my SQL Server instance. Using a custom IDataReader, it parses flat files and inserts them into a database using SqlBulkCopy. A typical file has about 6 million qualified rows, averaging 5 columns of decimal and short text, about 30 bytes per row.
Given this scenario, I found a batch size of 5,000 to be the best compromise between speed and memory consumption. I started with 500 and experimented with larger values. I found 5,000 to be 2.5x faster, on average, than 500. Inserting the 6 million rows takes about 30 seconds with a batch size of 5,000 and about 80 seconds with a batch size of 500.
10,000 was not measurably faster. Moving up to 50,000 improved the speed by a few percentage points, but it wasn't worth the increased load on the server. Above 50,000, there was no improvement in speed.
This isn't a formula, but it's another data point for you to use.