Reading thousands of objects with EF Core FAST

Most answers follow the common wisdom of loading less data, but in some circumstances such as here you Absolutely Positively Must load a lot of entities. So how do we do that?

Cause of poor performance

Is it unavoidable for this operation to take this long? Well, its not. We are loading just a megabyte of data from disk, the cause of poor performance is that the data is split across 40,000 tiny entities. The database can handle that, but the entity framework seem to struggle setting up all those entities, change tracking, etc. If we do not intend to modify the data, there is a lot we can do.

I tried three things

Primitives

Load just one property, then you get a list of primitives.

List<double> dataPoints =  _db.DataPoints.Select(dp => dp.Value).ToList();

This bypasses all of entity creation normally performed by entity framework. This query took 0.4 seconds, compared to 18 seconds for the original query. We are talking 45 (!) times improvement.

Anonymous types

Of-course most of the time we need more than just an array of primitives We can create new objects right inside the LINQ query. Entity framework won't create the entities it normally would, and the operation runs much faster. We can use anonymous objects for convenience.

var query = db.DataPoints.Select(dp => new {Guid ID = dp.sensorID, DateTimeOffset Timestamp = dp.TimeStamp, double Value = dp.Value});

This operations takes 1.2 seconds compared to 18 seconds for normally retrieving the same amount of data.

Tuples

I found that in my case using Tuples instead of anonymous types improves performance a little, the following query executed roughly 30% faster:

var query = db.DataPoints.Select(dp => Tuple.Create(dp.sensorID, dp.TimeStamp, dp.Value));

Other ways

  1. You cannot use structures inside LinQ queries, so that's not an option
  2. In many cases you can combine many records together to reduce overhead associated with retrieving many individual records. By retrieving fewer larger records you could improve performance. For instance in my usecase I've got some measurements that are being taken every 5 minutes, 24/7. At the moment I am storing them individually, and that's silly. Nobody will ever query less than a day worth of them. I plan to update this post when I make the change and find out how performance changed.
  3. Some recommend using an object oriented DB or micro ORM. I have never used either, so I can't comment.

you can use a different technique to load all your items.

you can create your own logic to load parts of the data while the user is scrolling the ListView( I guess you are using it) .

fortunately UWP a easy way to do this technique. Incremental loading please see the documentation and example

https://msdn.microsoft.com/library/windows/apps/Hh701916