Entity Framework Include performance
For everybody coming here, I want you to know following 2 things:
.Select(x => x.NavProp).Load() does not actually load the navigation property if you have turned off tracking.
Since version 3.0.0, each Include will cause an additional JOIN to be added to SQL queries produced by relational providers, whereas previous versions generated additional SQL queries. This can significantly change the performance of your queries, for better or worse. In particular, LINQ queries with an exceedingly high number of Include operators may need to be broken down into multiple separate LINQ queries in order to avoid the cartesian explosion problem.
Source for both statements: https://docs.microsoft.com/en-us/ef/core/querying/related-data
So it is not true that EF Core does Select and SelectMany in the backgruond. In my case we had a single entity with loads of navigation properties, and with Include it actually loaded over 15,000 rows (yes that's correct and what I would call a Cartesian Explosion problem). After I refactored the code to work with Select / SelectMany, that row count was reduced to 118. Query time reduced from 4s to below a second, even though we have exactly 20 includes)
Hope this helps somebody, and BIG thanks to Ivan.
Your second approach relies on the EF navigation property fixup process. The problem is though that every
query.Include(q => q.ItemNavN).Load();
statement will also include all the master record data along with the related entity data.
Using the same basic idea, one potential improvement could be to execute one Load
per each navigation property, replacing the Include
with either Select
(for references) or SelectMany
(for collections) - something similar to how EF Core processes the Include
s internally.
Taking your second approach example, you could try the following and compare the performance:
var query = ctx.Filters.Where(x => x.SessionId == id)
.Join(ctx.Items, i => i.ItemId, fs => fs.Id, (f, fs) => fs);
query.Select(x => x.ItemNav1).Load();
query.Select(x => x.ItemNav2).Load();
query.Select(x => x.ItemNav3).Load();
query.Select(x => x.ItemNav4).Load();
query.Select(x => x.ItemNav5).Load();
query.Select(x => x.ItemNav6).Load();
var result = query.ToList();
// here all the navigation properties should be populated
There are many ways to increase performance.
I'll put here some and you can try each one to see who gives you the best results.
You can use the System.Diagnostics.StopWatch to get the elapsed execution time.
1. Indexes are missing (e.g. on foreign keys)
2. Write your query in a view in the database, it is much cheeper. You can also create indexed view for this query.
3. Try to load data in separate queries:
context.Configuration.LazyLoadingEnabled = false;
context.ContactTypes.Where(c => c.ContactID== contactId).Load();
context.ContactConnections.Where(c => c.ContactID== contactId).Load();
return context.Contacts.Find(contactId);
This loads all required data into the context's cache. Important: turn off lazy loading because the child collections are not marked as loaded in the entity state manager and EF will try to trigger lazy loading when you want to access them.
4. Replacing the Include with Select().Load():
var query = ctx.Users.Where(u => u.UserID== userId)
.Join(ctx.Persons, p => p.PersonID, us => us.PersonID, (pr, ur) => ur);
query.Select(x => x.PersonIdentities).Load();
query.Select(x => x.PersonDetails).Load();
var result = query.ToList();
Remember: turn on tracking for to load the navigation property.
5. Separate includes, to multiple calls, limit to 2 includes in each call, and then loop to connect object properties.
Here is an example for single object fetch:
var contact= from c in db.Contacts
.Include(p=>p.ContactTypes)
.Include(p=>p.ContactConnections)
.FirstOrDefault();
var contact2= from c in db.Contacts
.Include(p=>p.ContactIdentities)
.Include(p=>p.Person)
.FirstOrDefault();
contact.ContactIdentities = contact2.ContactIdentities ;
contact.Person= contact2.Person;
return contact.