Why are ToLookup and GroupBy different?
In simple LINQ-world words:
ToLookup() - immediate execution
GroupBy() - deferred execution
Why would I ever bother with GroupBy? Why should it exist?
What happens when you call ToLookup on an object representing a remote database table with a billion rows in it?
The billion rows are sent over the wire, and you build the lookup table locally.
What happens when you call GroupBy on such an object?
A query object is built; end of story.
When that query object is enumerated, the analysis of the table is done on the database server and the grouped results are sent back on demand, a few at a time.
Logically they are the same thing, but the performance implications of each are completely different. Calling ToLookup means "I want a cache of the entire thing right now, organized by group." Calling GroupBy means "I am building an object to represent the question 'what would these things look like if I organized them by group?'"
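The difference in timing can be observed even without a database. The following is a minimal LINQ-to-Objects sketch, not a model of any particular database provider: `Source` is a hypothetical iterator standing in for an expensive data set, and the `Pulled` counter (also made up for this illustration) records how many items have actually been read from it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredVsImmediate
{
    // Counts how many items have been pulled from the source so far.
    public static int Pulled;

    // Hypothetical source standing in for an expensive data set.
    public static IEnumerable<int> Source()
    {
        for (int i = 1; i <= 6; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Returns the value of Pulled observed at four points in time.
    public static (int AfterGroupBy, int AfterEnumerate, int AfterToLookup, int AfterRead) Run()
    {
        Pulled = 0;
        // GroupBy only builds a query object; nothing is read yet.
        IEnumerable<IGrouping<bool, int>> query = Source().GroupBy(n => n % 2 == 0);
        int afterGroupBy = Pulled;

        // Enumerating the query is what triggers the work.
        int groupCount = query.Count();
        int afterEnumerate = Pulled;

        Pulled = 0;
        // ToLookup reads the whole source immediately.
        ILookup<bool, int> lookup = Source().ToLookup(n => n % 2 == 0);
        int afterToLookup = Pulled;

        // Later reads are served from the in-memory lookup, not the source.
        int evens = lookup[true].Count();
        int afterRead = Pulled;

        return (afterGroupBy, afterEnumerate, afterToLookup, afterRead);
    }

    public static void Main()
    {
        var r = Run();
        Console.WriteLine($"after GroupBy: {r.AfterGroupBy}");     // 0
        Console.WriteLine($"after enumerate: {r.AfterEnumerate}"); // 6
        Console.WriteLine($"after ToLookup: {r.AfterToLookup}");   // 6
        Console.WriteLine($"after lookup read: {r.AfterRead}");    // 6
    }
}
```

With a remote provider the same timing applies, except that "reading the source" means a round trip to the server.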
The two are similar, but are used in different scenarios. .ToLookup() returns a ready-to-use object that already has all the groups, and their contents, eagerly loaded. On the other hand, .GroupBy() returns a lazily loaded sequence of groups.
Different LINQ providers may have different behaviors for the eager and lazy loading of the groups. With LINQ-to-Objects it probably makes little difference, but with LINQ-to-SQL (or LINQ-to-EF, etc.), the grouping operation is performed on the database server rather than the client, so you may want to apply additional filtering on the group key (which generates a HAVING clause) and retrieve only some of the groups instead of all of them. .ToLookup() wouldn't allow for such semantics since all items are eagerly grouped.
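As a rough illustration of composing a filter on top of a grouping, here is a sketch that uses an in-memory `AsQueryable()` list as a stand-in for a remote table. The `Order` type and `BuildQuery` helper are hypothetical, and no SQL is generated here; with a real provider such as LINQ-to-SQL or Entity Framework, the same expression tree would be handed to the provider, and how it is translated (for example, into a HAVING clause) is up to that provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupKeyFilter
{
    // Hypothetical row type, for illustration only.
    public class Order
    {
        public string Customer { get; }
        public decimal Total { get; }

        public Order(string customer, decimal total)
        {
            Customer = customer;
            Total = total;
        }
    }

    public static IQueryable<IGrouping<string, Order>> BuildQuery(IQueryable<Order> orders)
    {
        // Both operators are recorded in the expression tree; nothing runs yet.
        return orders
            .GroupBy(o => o.Customer)
            .Where(g => g.Key.StartsWith("A"));
    }

    public static void Main()
    {
        // In-memory stand-in for a remote table (assumption: no real database).
        IQueryable<Order> orders = new List<Order>
        {
            new Order("Alice", 10m),
            new Order("Bob", 20m),
            new Order("Alice", 5m),
            new Order("Anna", 7m),
        }.AsQueryable();

        var query = BuildQuery(orders);

        // The provider receives GroupBy and Where together as one expression.
        Console.WriteLine(query.Expression);

        // Only now is the query executed, and only matching groups come back.
        foreach (var g in query)
            Console.WriteLine($"{g.Key}: {g.Sum(o => o.Total)}");
    }
}
```

Had the code called `.ToLookup(o => o.Customer)` instead, the result would be an in-memory `ILookup`, and any filtering after that point would run on the client over groups that have already been loaded.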