LINQ Include vs Join. Are they equivalent?

An Included is intended to retain the original object structures and graphs. A Join is needed to project a flattened representation of the object graph or to join types which are not naturally related through the graph (ie. join the customer's city with a shipping facility's city).

Compare the following: db.Customers.Include("Orders") Generates an IEnumerable each of which may contain their corresponding list of Orders in an object graph like this:

Customer 1
   Order
   Order
   Order
Customer 2
   Order
   Order

In contrast, if you do the same with a join projecting into an anonymous type you could get the following:

    from c in db.Customers 
    join o in db.Orders on c.CustomerId equals o.CustomerId 
    select new {c, o}

This produces a new IEnumerable<Anonymous<Customer, Order>> where the customer is repeated for each order.

{ Customer1, orderA }
{ Customer1, orderB }
{ Customer1, orderC }
{ Customer2, orderD }
{ Customer2, orderE }
{ Customer2, orderF }

While both may issue the same request to the database, the resulting type may be quite different.


In a sense, yes. Include is implemented as a join. Depending on the nullability of the included link it is an inner or left join.

You can always build an include yourself by using a join, like this:

db.Users.Select(u => new { u, u.City })

This is an "include" for the user's city. It manifests itself as a SQL join.


If you simply need all Orders for some Customers. Good example here for blog application is displaying all Comments below Articles always. Then Include is your way of work.

Join in opposition is more helpful if you need some Customers and filters out them using some data contained in Orders entity. For example you want to sort out Articles to send to the Police Articles with Comments containing vulgar words.

Also if your Orders entity contains a lot of data (many columns) taking a lot of memory and you don't need all of them then join can be much more efficient but here always is a question what "lot of data" or "many columns" means so test first will be the best choice.

Tags:

Linq