LINQ 'into' keyword confusion

"into" has two different meanings:

  • In a join clause, it changes the translation from Join to GroupJoin. Instead of getting one result per matching pair, you get one result for each element of the original (outer) sequence, and the new range variable holds all the matching elements from the other sequence as a group (an empty group if nothing matches). See Enumerable.GroupJoin for more details.
  • In a select or group...by clause, it becomes a query continuation, effectively starting a new query with the results of the old one in a new range variable.
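
A minimal sketch of the second meaning, the query continuation. The word list and variable names are made up for illustration and are not from the original question:

```csharp
using System;
using System.Linq;

// Hypothetical sample data.
string[] words = { "apple", "avocado", "banana", "blueberry", "cherry" };

// select ... into: 'w' is out of scope after 'into'; only 'upper' is visible.
var longUpper =
    (from w in words
     select w.ToUpperInvariant() into upper
     where upper.Length > 6
     select upper).ToList();

// group ... by ... into: 'g' is a new range variable holding each group.
var countsByLetter =
    (from w in words
     group w by w[0] into g
     orderby g.Key
     select new { Letter = g.Key, Count = g.Count() }).ToList();

Console.WriteLine(string.Join(", ", longUpper)); // AVOCADO, BLUEBERRY
foreach (var entry in countsByLetter)
    Console.WriteLine($"{entry.Letter}: {entry.Count}");
```

In both queries, the part after into behaves like a fresh query over the results of the part before it.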

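When used with the select keyword, into ends the scope of the earlier range variables; only the new variable is visible afterwards.
When used with the join keyword, into adds a variable containing all of the matching items from the join, and the earlier range variables stay in scope. (This is called a group join.)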
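
For comparison, here is a rough sketch of the method calls the two join forms correspond to. The sample data and names are assumptions made up for this example:

```csharp
using System;
using System.Linq;

// Hypothetical sample data: category names and (name, category) product pairs.
string[] categories = { "Fruit", "Dairy" };
var products = new[]
{
    (Name: "Apple", Category: "Fruit"),
    (Name: "Pear", Category: "Fruit"),
    (Name: "Milk", Category: "Dairy"),
};

// join without into corresponds to Enumerable.Join: one result per matching pair.
var pairs = categories.Join(
    products,
    c => c,
    p => p.Category,
    (c, p) => new { Category = c, Product = p.Name }).ToList();

// join ... into ps corresponds to Enumerable.GroupJoin:
// one result per category, carrying all of its matches.
var grouped = categories.GroupJoin(
    products,
    c => c,
    p => p.Category,
    (c, ps) => new { Category = c, Products = ps.Select(p => p.Name).ToList() }).ToList();

Console.WriteLine(pairs.Count);   // 3
Console.WriteLine(grouped.Count); // 2
```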


To add to what has already been said, I'd like to demonstrate the difference in the object structure produced by into versus without:

var q = 
    from c in categories 
    join p in products on c equals p.Category into ps 
    select new { Category = c, Products = ps }; 

This creates an object graph:

Category 1, Products:
  Product 1
  Product 2
Category 2, Products:
  Product 3
  Product 4

In this case, q contains only 2 items, one per category.

Without into, you get a more traditional inner join that flattens the relationship, producing one result for each matching category/product pair:

var q = 
    from c in categories 
    join p in products on c equals p.Category 
    select new { Category = c, Product = p }; 

Category 1, Product 1
Category 1, Product 2
Category 2, Product 3
Category 2, Product 4

Note that now q contains 4 items.
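
To try both queries side by side, something like the following should work. The categories and products collections here are assumed stand-ins (plain strings and tuples) for whatever types the real code uses:

```csharp
using System;
using System.Linq;

// Assumed stand-ins for the categories and products used in the queries above.
string[] categories = { "Category 1", "Category 2" };
var products = new[]
{
    (Name: "Product 1", Category: "Category 1"),
    (Name: "Product 2", Category: "Category 1"),
    (Name: "Product 3", Category: "Category 2"),
    (Name: "Product 4", Category: "Category 2"),
};

// With into: one item per category, each carrying its group of products.
var groupedQuery =
    (from c in categories
     join p in products on c equals p.Category into ps
     select new { Category = c, Products = ps.ToList() }).ToList();

// Without into: one item per matching category/product pair.
var flatQuery =
    (from c in categories
     join p in products on c equals p.Category
     select new { Category = c, Product = p }).ToList();

foreach (var item in groupedQuery)
{
    Console.WriteLine($"{item.Category}, Products:");
    foreach (var product in item.Products)
        Console.WriteLine($"  {product.Name}");
}

foreach (var item in flatQuery)
    Console.WriteLine($"{item.Category}, {item.Product.Name}");
```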

In response to a comment: to keep only part of each matched product (here, its Id), project the group inside the select:

var q = 
    from c in categories 
    join p in products on c equals p.Category into ps 
    select new { Category = c, Products = ps.Select(x => x.Id) };

Tags:

Linq