Fastest Way of Inserting in Entity Framework
To your remark in the comments to your question:
"...SavingChanges (for each record)..."
That's the worst thing you can do! Calling SaveChanges()
for each record slows bulk inserts down dramatically. I would try a few simple changes which will very likely improve the performance:
- Call SaveChanges() once after ALL records.
- Call SaveChanges() after, for example, 100 records.
- Call SaveChanges() after, for example, 100 records, then dispose the context and create a new one.
- Disable change detection.
For bulk inserts I am working and experimenting with a pattern like this:
using (TransactionScope scope = new TransactionScope())
{
    MyDbContext context = null;
    try
    {
        context = new MyDbContext();
        context.Configuration.AutoDetectChangesEnabled = false;

        int count = 0;
        foreach (var entityToInsert in someCollectionOfEntitiesToInsert)
        {
            ++count;
            context = AddToContext(context, entityToInsert, count, 100, true);
        }

        context.SaveChanges();
    }
    finally
    {
        if (context != null)
            context.Dispose();
    }

    scope.Complete();
}
private MyDbContext AddToContext(MyDbContext context,
    Entity entity, int count, int commitCount, bool recreateContext)
{
    context.Set<Entity>().Add(entity);

    if (count % commitCount == 0)
    {
        context.SaveChanges();
        if (recreateContext)
        {
            context.Dispose();
            context = new MyDbContext();
            context.Configuration.AutoDetectChangesEnabled = false;
        }
    }

    return context;
}
I have a test program which inserts 560,000 entities (9 scalar properties, no navigation properties) into the DB. With this code it completes in less than 3 minutes.
For the performance it is important to call SaveChanges() after "many" records ("many" meaning around 100 or 1,000). It also improves performance to dispose the context after SaveChanges and create a new one. This clears the context of all entities; SaveChanges doesn't do that, the entities are still attached to the context in state Unchanged. It is the growing number of attached entities in the context that slows down the insertion step by step. So it is helpful to clear the context from time to time.
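To illustrate that last point, here is a minimal sketch (assuming an EF 5/6 DbContext called MyDbContext and an entity type Entity, as in the code above) showing that an inserted entity is still attached in state Unchanged after SaveChanges:
// requires using System.Data.Entity (EF 6; in EF 5 EntityState is in System.Data) and System.Linq
using (var context = new MyDbContext())
{
    context.Set<Entity>().Add(new Entity());
    context.SaveChanges();

    // the entity was saved but is still tracked by the context
    int stillAttached = context.ChangeTracker.Entries<Entity>()
                               .Count(e => e.State == EntityState.Unchanged);
    Console.WriteLine(stillAttached);   // 1
}
This is why recreating the context after a batch keeps the insertion speed roughly constant instead of degrading as more entities accumulate.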
Here are a few measurements for my 560,000 entities:
- commitCount = 1, recreateContext = false: many hours (That's your current procedure)
- commitCount = 100, recreateContext = false: more than 20 minutes
- commitCount = 1000, recreateContext = false: 242 sec
- commitCount = 10000, recreateContext = false: 202 sec
- commitCount = 100000, recreateContext = false: 199 sec
- commitCount = 1000000, recreateContext = false: out of memory exception
- commitCount = 1, recreateContext = true: more than 10 minutes
- commitCount = 10, recreateContext = true: 241 sec
- commitCount = 100, recreateContext = true: 164 sec
- commitCount = 1000, recreateContext = true: 191 sec
The behaviour in the first test above is that the performance is very non-linear and degrades heavily over time. ("Many hours" is an estimate; I never finished this test, I stopped at 50,000 entities after 20 minutes.) This non-linear behaviour is not as significant in the other tests.
The fastest way would be to use a bulk insert extension, which I developed.
Note: this is a commercial product, not free of charge.
It uses SqlBulkCopy and a custom data reader to get maximum performance. As a result it is over 20 times faster than using regular inserts or AddRange.
Usage is extremely simple:
context.BulkInsert(hugeAmountOfEntities);
You should look at using the System.Data.SqlClient.SqlBulkCopy class for this. Here's the documentation, and of course there are plenty of tutorials online.
Sorry, I know you were looking for a simple answer to get EF to do what you want, but bulk operations are not really what ORMs are meant for.
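As a rough illustration, here is a minimal sketch of the SqlBulkCopy approach; the table name, column names and the MyEntity type are assumptions, so adapt them to your schema:
// requires System.Data, System.Data.SqlClient and System.Collections.Generic
private static void BulkInsert(string connectionString, IEnumerable<MyEntity> entities)
{
    // build an in-memory DataTable matching the target table's columns
    var table = new DataTable();
    table.Columns.Add("Name", typeof(string));
    table.Columns.Add("Value", typeof(int));

    foreach (var e in entities)
        table.Rows.Add(e.Name, e.Value);

    using (var bulkCopy = new SqlBulkCopy(connectionString))
    {
        bulkCopy.DestinationTableName = "dbo.MyEntities";   // assumed table name
        bulkCopy.BatchSize = 10000;
        bulkCopy.WriteToServer(table);                       // streams the rows to SQL Server
    }
}
This bypasses EF and change tracking entirely, which is exactly why it scales to very large inserts.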
This combination increases the speed well enough:
context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;
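Both settings are per context instance, so set them right after constructing the context and before adding the entities. A minimal sketch (the entity type and the entities variable are assumptions):
using (var context = new MyDbContext())
{
    context.Configuration.AutoDetectChangesEnabled = false;
    context.Configuration.ValidateOnSaveEnabled = false;

    context.Set<Entity>().AddRange(entities);   // AddRange is available from EF 6
    context.SaveChanges();
}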