What is the correct way to use async/await in a recursive method?

When I add code to make your example more concrete, I find two possible ways for the recursion to turn out badly. Both of them assume that your data is pretty big and require specific conditions to trigger.

If ProcessItem(string) returns a Task that completes before it is awaited on (or, I assume, it completes before the await finishes spinning), the continuation will execute synchronously. In my code below, I have simulated this by having ProcessItem(string) return Task.CompletedTask. When I do this, the program very quickly terminates with a StackOverflowException. This is because .net’s TPL “Releases Zalgo” by opportunistically executing continuations synchronously without regard to how much space is available in the current stack. That means that it will exacerbate the potential stack space issue that you already have by using a recursive algorithm. To see this, comment out await Task.Yield(); in my code sample below.
If you use some technique to prevent TPL from continuing synchronously (below I use Task.Yield()), eventually the program will run out of memory and die with an OutOfMemoryException. If I understand correctly, this wouldn’t happen if return await were able to emulate the tail-call optimization. I imagine that what is happening here is each call generates something like a book-keeping Task<string> and keeps generating them even though they could be coalesced. To reproduce this error with the sample below, ensure you’re running the program as 32-bit, disable the Console.WriteLine() call (because consoles are really slow), and ensure the await Task.Yield() is uncommented.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Be sure to run this 32-bit to avoid making your system unstable.
class StreamProcessor
{
    Stream GetStream(string streamPosition)
    {
        var parsedStreamPosition = Convert.ToInt32(streamPosition);
        return new Stream(
            // Terminate after we reach 0.
            parsedStreamPosition > 0 ? new[] { streamPosition, } : new string[] { },
            Convert.ToString(parsedStreamPosition - 1));
    }

    Task ProcessItem(string item)
    {
        // Comment out this next line to make things go faster.
        Console.WriteLine(item);
        // Simulate the Task represented by ProcessItem finishing in
        // time to make the await continue synchronously.
        return Task.CompletedTask;
    }

    public async Task<string> ProcessStream(string streamPosition)
    {
        var stream = GetStream(streamPosition);

        if (stream.Items.Count == 0)
            return stream.NextPosition;

        foreach (var item in stream.Items)
        {
            await ProcessItem(item); //ProcessItem() is now an async method
        }

        // Without this yield (which prevents inline synchronous
        // continuations which quickly eat up the stack),
        // you get a StackOverflowException fairly quickly.
        // With it, you get an OutOfMemoryException eventually—I bet
        // that “return await” isn’t able to tail-call properly at the Task
        // level or that TPL is incapable of collapsing a chain of Tasks
        // which are all set to resolve to the value that other tasks
        // resolve to?
        await Task.Yield();

        return await ProcessStream(stream.NextPosition);
    }
}

class Program
{
    static int Main(string[] args) => new Program().Run(args).Result;
    async Task<int> Run(string[] args)
    {
        await new StreamProcessor().ProcessStream(
            Convert.ToString(int.MaxValue));
        return 0;
    }
}

class Stream
{
    public IList<string> Items { get; }
    public string NextPosition { get; }
    public Stream(
        IList<string> items,
        string nextPosition)
    {
        Items = items;
        NextPosition = nextPosition;
    }
}

So, I guess my two recommendations are:

Use Task.Yield() if you aren’t certain that the stack growth of recursion will be interrupted by something else.
As suggested already, avoid recursion if it doesn’t even make sense for your problem in the first place. And even if it makes a clean algorithm, avoid it if your problem size is unbounded.

While I have to say upfront that the intention of the method is not entirely clear to me, reimplementing it with a simple loop is quite trivial:

public async Task<string> ProcessStream(string streamPosition)
{
    while (true)
    {
        var stream = GetStream(streamPosition);

        if (stream.Items.Count == 0)
            return stream.NextPosition;

        foreach (var item in stream.Items)
        {
            await ProcessItem(item); //ProcessItem() is now an async method
        }

        streamPosition = stream.NextPosition;
    }
}

Recursion is not stack-friendly and if you have the option of using a loop, it's something definitely worth looking into in simple synchronous scenarios (where poorly controlled recursion eventually leads to StackOverflowExceptions), as well as asynchronous scenarios, where, I'll be honest, I don't even know what would happen if you push things too far (my VS Test Explorer crashes whenever I try to reproduce known stack overflow scenarios with async methods).

Answers such as Recursion and the await / async Keywords suggest that StackOverflowException is less of a problem with async due to the way the async/await state machine works, but this is not something I have explored much as I tend to avoid recursion whenever possible.

What is the correct way to use async/await in a recursive method?

Tags:

C#

.Net

Recursion

Asynchronous

Async Await

Related

Recent Posts