Fastest way to insert 100,000+ records into DocumentDB
The fastest way to insert documents into Azure DocumentDB is available as a sample on GitHub: https://github.com/Azure/azure-documentdb-dotnet/tree/master/samples/documentdb-benchmark
The following tips will help you achieve the best throughput using the .NET SDK:
- Initialize a singleton DocumentClient
- Use Direct connectivity and TCP protocol (`ConnectionMode.Direct` and `ConnectionProtocol.Tcp`)
- Use 100s of Tasks in parallel (depends on your hardware)
- Increase the `MaxConnectionLimit` in the DocumentClient constructor to a high value, say 1000 connections
- Turn `gcServer` on
- Make sure your collection has the appropriate provisioned throughput (and a good partition key)
- Running in the same Azure region will also help
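A minimal sketch of those settings with the DocumentDB .NET SDK; the account URI, key, and database/collection names are placeholders you would replace with your own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;

public static class FastInserter
{
    // Singleton client: create once and reuse for the lifetime of the process.
    private static readonly DocumentClient client = new DocumentClient(
        new Uri("https://<your-account>.documents.azure.com:443/"), // placeholder
        "<your-key>",                                               // placeholder
        new ConnectionPolicy
        {
            ConnectionMode = ConnectionMode.Direct, // Direct connectivity
            ConnectionProtocol = Protocol.Tcp,      // TCP protocol
            MaxConnectionLimit = 1000               // raised connection limit
        });

    public static async Task InsertAllAsync(IEnumerable<object> documents)
    {
        // Placeholder database/collection names.
        Uri collectionUri = UriFactory.CreateDocumentCollectionUri("db", "coll");

        // Hundreds of concurrent tasks; tune the degree of
        // parallelism to your hardware and provisioned throughput.
        var tasks = documents.Select(doc => client.CreateDocumentAsync(collectionUri, doc));
        await Task.WhenAll(tasks);
    }
}
```

Server GC is turned on in the application's App.config rather than in code, via `<gcServer enabled="true" />` inside the `<runtime>` element.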
With 10,000 RU/s, you can insert 100,000 documents in about 50 seconds (approximately 5 request units per write).
With 100,000 RU/s, you can insert in about 5 seconds. You can make this as fast as you want by provisioning more throughput (and for a very high number of inserts, spread inserts across multiple VMs/workers).
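The arithmetic behind those numbers, assuming roughly 5 RU per insert:

```csharp
// Estimated wall-clock time = total RUs needed / provisioned RU per second.
double documents = 100_000;
double ruPerWrite = 5;                                   // approximate cost of one insert
double atTenThousand = documents * ruPerWrite / 10_000;  // 500,000 RU at 10,000 RU/s -> 50 s
double atHundredThousand = documents * ruPerWrite / 100_000; // at 100,000 RU/s -> 5 s
```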
EDIT (7/12/19): You can now use the bulk executor library: https://docs.microsoft.com/en-us/azure/cosmos-db/bulk-executor-overview
The Cosmos DB team has just released a bulk import and update SDK. Unfortunately it is only available in .NET Framework 4.5.1, but it apparently does a lot of the heavy lifting for you and maximizes use of throughput. See:
https://docs.microsoft.com/en-us/azure/cosmos-db/bulk-executor-overview
https://docs.microsoft.com/en-us/azure/cosmos-db/sql-api-sdk-bulk-executor-dot-net
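A rough sketch of using the bulk executor library (the `Microsoft.Azure.CosmosDB.BulkExecutor` NuGet package); the names and parameters below are taken from the docs linked above, so treat this as an outline rather than a drop-in implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.CosmosDB.BulkExecutor;

public static class BulkImportSample
{
    public static async Task RunAsync(DocumentClient client, DocumentCollection collection,
                                      IEnumerable<object> documents)
    {
        var bulkExecutor = new BulkExecutor(client, collection);
        await bulkExecutor.InitializeAsync();

        // The library batches, retries, and saturates provisioned throughput for you.
        var response = await bulkExecutor.BulkImportAsync(
            documents,
            enableUpsert: true,
            disableAutomaticIdGeneration: true);

        Console.WriteLine($"Imported {response.NumberOfDocumentsImported} docs, " +
                          $"{response.TotalRequestUnitsConsumed} RUs in {response.TotalTimeTaken}.");
    }
}
```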
The Cosmos DB .NET SDK has since been updated to allow bulk insert via the AllowBulkExecution option: https://docs.microsoft.com/en-us/azure/cosmos-db/tutorial-sql-api-dotnet-bulk-import
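With the v3 .NET SDK (`Microsoft.Azure.Cosmos`), bulk mode looks roughly like this; the endpoint, key, container names, and the `MyItem` type are placeholders:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class BulkModeSample
{
    // Placeholder document type; "Pk" stands in for your partition key property.
    public class MyItem
    {
        public string id { get; set; }
        public string Pk { get; set; }
    }

    public static async Task RunAsync(IReadOnlyList<MyItem> items)
    {
        // AllowBulkExecution makes the SDK group concurrent point
        // operations into batches behind the scenes.
        using var client = new CosmosClient(
            "https://<your-account>.documents.azure.com:443/", // placeholder
            "<your-key>",                                      // placeholder
            new CosmosClientOptions { AllowBulkExecution = true });

        Container container = client.GetContainer("db", "coll"); // placeholders

        // Issue the inserts concurrently; the SDK packs them into bulk batches.
        await Task.WhenAll(
            items.Select(item => container.CreateItemAsync(item, new PartitionKey(item.Pk))));
    }
}
```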