LINQ Aggregate algorithm explained

A picture is worth a thousand words

Reminder:
Func<X, Y, R> is a function with two inputs of type X and Y, that returns a result of type R.

Enumerable.Aggregate has three overloads:


Overload 1:

A Aggregate<A>(IEnumerable<A> a, Func<A, A, A> f)

Aggregate1

Example:

new[]{1,2,3,4}.Aggregate((x, y) => x + y);  // 10


This overload is simple, but it has the following limitations:

  • the sequence must contain at least one element,
    otherwise the function will throw an InvalidOperationException.
  • elements and result must be of the same type.



Overload 2:

B Aggregate<A, B>(IEnumerable<A> a, B bIn, Func<B, A, B> f)

Aggregate2

Example:

var hayStack = new[] {"straw", "needle", "straw", "straw", "needle"};
var nNeedles = hayStack.Aggregate(0, (n, e) => e == "needle" ? n+1 : n);  // 2


This overload is more general:

  • a seed value must be provided (bIn).
  • the collection can be empty,
    in this case, the function will yield the seed value as result.
  • elements and result can have different types.



Overload 3:

C Aggregate<A,B,C>(IEnumerable<A> a, B bIn, Func<B,A,B> f, Func<B,C> f2)


The third overload is not very useful IMO.
The same can be written more succinctly by using overload 2 followed by a function that transforms its result.


The illustrations are adapted from this excellent blogpost.


Super short Aggregate works like fold in Haskell/ML/F#.

Slightly longer .Max(), .Min(), .Sum(), .Average() all iterates over the elements in a sequence and aggregates them using the respective aggregate function. .Aggregate () is generalized aggregator in that it allows the developer to specify the start state (aka seed) and the aggregate function.

I know you asked for a short explaination but I figured as others gave a couple of short answers I figured you would perhaps be interested in a slightly longer one

Long version with code One way to illustrate what does it could be show how you implement Sample Standard Deviation once using foreach and once using .Aggregate. Note: I haven't prioritized performance here so I iterate several times over the colleciton unnecessarily

First a helper function used to create a sum of quadratic distances:

static double SumOfQuadraticDistance (double average, int value, double state)
{
    var diff = (value - average);
    return state + diff * diff;
}

Then Sample Standard Deviation using ForEach:

static double SampleStandardDeviation_ForEach (
    this IEnumerable<int> ints)
{
    var length = ints.Count ();
    if (length < 2)
    {
        return 0.0;
    }

    const double seed = 0.0;
    var average = ints.Average ();

    var state = seed;
    foreach (var value in ints)
    {
        state = SumOfQuadraticDistance (average, value, state);
    }
    var sumOfQuadraticDistance = state;

    return Math.Sqrt (sumOfQuadraticDistance / (length - 1));
}

Then once using .Aggregate:

static double SampleStandardDeviation_Aggregate (
    this IEnumerable<int> ints)
{
    var length = ints.Count ();
    if (length < 2)
    {
        return 0.0;
    }

    const double seed = 0.0;
    var average = ints.Average ();

    var sumOfQuadraticDistance = ints
        .Aggregate (
            seed,
            (state, value) => SumOfQuadraticDistance (average, value, state)
            );

    return Math.Sqrt (sumOfQuadraticDistance / (length - 1));
}

Note that these functions are identical except for how sumOfQuadraticDistance is calculated:

var state = seed;
foreach (var value in ints)
{
    state = SumOfQuadraticDistance (average, value, state);
}
var sumOfQuadraticDistance = state;

Versus:

var sumOfQuadraticDistance = ints
    .Aggregate (
        seed,
        (state, value) => SumOfQuadraticDistance (average, value, state)
        );

So what .Aggregate does is that it encapsulates this aggregator pattern and I expect that the implementation of .Aggregate would look something like this:

public static TAggregate Aggregate<TAggregate, TValue> (
    this IEnumerable<TValue> values,
    TAggregate seed,
    Func<TAggregate, TValue, TAggregate> aggregator
    )
{
    var state = seed;

    foreach (var value in values)
    {
        state = aggregator (state, value);
    }

    return state;
}

Using the Standard deviation functions would look something like this:

var ints = new[] {3, 1, 4, 1, 5, 9, 2, 6, 5, 4};
var average = ints.Average ();
var sampleStandardDeviation = ints.SampleStandardDeviation_Aggregate ();
var sampleStandardDeviation2 = ints.SampleStandardDeviation_ForEach ();

Console.WriteLine (average);
Console.WriteLine (sampleStandardDeviation);
Console.WriteLine (sampleStandardDeviation2);

IMHO

So does .Aggregate help readability? In general I love LINQ because I think .Where, .Select, .OrderBy and so on greatly helps readability (if you avoid inlined hierarhical .Selects). Aggregate has to be in Linq for completeness reasons but personally I am not so convinced that .Aggregate adds readability compared to a well written foreach.


It partly depends on which overload you're talking about, but the basic idea is:

  • Start with a seed as the "current value"
  • Iterate over the sequence. For each value in the sequence:
    • Apply a user-specified function to transform (currentValue, sequenceValue) into (nextValue)
    • Set currentValue = nextValue
  • Return the final currentValue

You may find the Aggregate post in my Edulinq series useful - it includes a more detailed description (including the various overloads) and implementations.

One simple example is using Aggregate as an alternative to Count:

// 0 is the seed, and for each item, we effectively increment the current value.
// In this case we can ignore "item" itself.
int count = sequence.Aggregate(0, (current, item) => current + 1);

Or perhaps summing all the lengths of strings in a sequence of strings:

int total = sequence.Aggregate(0, (current, item) => current + item.Length);

Personally I rarely find Aggregate useful - the "tailored" aggregation methods are usually good enough for me.


The easiest-to-understand definition of Aggregate is that it performs an operation on each element of the list taking into account the operations that have gone before. That is to say it performs the action on the first and second element and carries the result forward. Then it operates on the previous result and the third element and carries forward. etc.

Example 1. Summing numbers

var nums = new[]{1,2,3,4};
var sum = nums.Aggregate( (a,b) => a + b);
Console.WriteLine(sum); // output: 10 (1+2+3+4)

This adds 1 and 2 to make 3. Then adds 3 (result of previous) and 3 (next element in sequence) to make 6. Then adds 6 and 4 to make 10.

Example 2. create a csv from an array of strings

var chars = new []{"a","b","c", "d"};
var csv = chars.Aggregate( (a,b) => a + ',' + b);
Console.WriteLine(csv); // Output a,b,c,d

This works in much the same way. Concatenate a a comma and b to make a,b. Then concatenates a,b with a comma and c to make a,b,c. and so on.

Example 3. Multiplying numbers using a seed

For completeness, there is an overload of Aggregate which takes a seed value.

var multipliers = new []{10,20,30,40};
var multiplied = multipliers.Aggregate(5, (a,b) => a * b);
Console.WriteLine(multiplied); //Output 1200000 ((((5*10)*20)*30)*40)

Much like the above examples, this starts with a value of 5 and multiplies it by the first element of the sequence 10 giving a result of 50. This result is carried forward and multiplied by the next number in the sequence 20 to give a result of 1000. This continues through the remaining 2 element of the sequence.

Live examples: http://rextester.com/ZXZ64749
Docs: http://msdn.microsoft.com/en-us/library/bb548651.aspx


Addendum

Example 2, above, uses string concatenation to create a list of values separated by a comma. This is a simplistic way to explain the use of Aggregate which was the intention of this answer. However, if using this technique to actually create a large amount of comma separated data, it would be more appropriate to use a StringBuilder, and this is entirely compatible with Aggregate using the seeded overload to initiate the StringBuilder.

var chars = new []{"a","b","c", "d"};
var csv = chars.Aggregate(new StringBuilder(), (a,b) => {
    if(a.Length>0)
        a.Append(",");
    a.Append(b);
    return a;
});
Console.WriteLine(csv);

Updated example: http://rextester.com/YZCVXV6464