C# how to yield return SelectMany?
There are some misconceptions in your question, which is awesome because now you have an opportunity to learn facts rather than myths.
First off, the method you are implementing is usually called CartesianProduct
, not GetAllPossibleCombos
, so consider renaming it.
Perhaps I am not understanding this correctly
You are not understanding it correctly.
doesn't this build the entire combos list in RAM?
No. A query builder builds a query, not the results of executing the query. When you do a SelectMany
, what you get is an object that will do the selection in the future. You don't get the results of that selection.
If there are a large number of items the method might cause the computer to run out of RAM.
Today would be a good day to stop thinking of memory and RAM as the same thing. When a process runs out of memory, it does not run out of RAM. It runs out of address space, which is not RAM. The better way to think about memory is: memory is on-disk page file, and RAM is special hardware that makes your page file faster. When you run out of RAM, your machine might get unacceptably slow, but you don't run out of memory until you run out of address space. Remember, process memory is virtualized.
Now, there may be scenarios in which executing this code is inefficient because enumerating the query runs out of stack. And there may be scenarios in which execution becomes inefficient because you're moving n items up a stack n deep. I suggest that you to do a deeper analysis of your code and see if that is the case, and report back.
Is there a way to re-write the method to use a yield return on each combo, instead of returning the entire combos set?
SelectMany
is implemented as a yield return
in a foreach
loop, so you've already implemented it as a yield return
on each combo; you've just hidden the yield return
inside a call to SelectMany
.
That is, SelectMany<A, B, C>(IE<A> items, Func<A, IE<B>> f, Func<A, B, C> g)
is implemented as something like:
foreach(A a in items)
foreach(B b in f(a))
yield return g(a, b);
So you've already done it in yield return
.
If you want to write a method that directly does a yield return
that's a little harder; the easiest way to do that is to form an array of enumerators on each child sequence, then make a vector from each Current
of the enumerators, yield return
the vector, and then advance the correct iterator one step. Keep on doing that until there is no longer a correct iterator to advance.
As you can probably tell from that description, the bookkeeping gets messy. It is doable, but it's not very pleasant code to write. Give it a try though! The nice thing about that solution is that you are guaranteed to have good performance because you're not consuming any stack.
UPDATE: This related question has an answer posted that does an iterative algorithm, but I have not reviewed it to see if it is correct. https://stackoverflow.com/a/57683769/88656
Finally, I encourage you to compare your implementation to mine:
https://ericlippert.com/2010/06/28/computing-a-cartesian-product-with-linq/
Is my implementation in any way fundamentally different than yours, or are we doing the same thing, just using slightly different syntax? Give that some thought.
Also I encourage you to read Ian Griffiths' excellent six-part series on an analysis of various implementations of this function:
http://www.interact-sw.co.uk/iangblog/2010/07/28/linq-cartesian-1