Does Distinct() method keep original ordering of sequence intact?

In the .NET Framework 3.5, disassembling the CIL of the Linq-to-Objects implementation of Distinct() shows that the order of elements is preserved - however this is not documented behavior.

I did a little investigation with Reflector. After disassembling System.Core.dll, Version=3.5.0.0 you can see that Distinct() is an extension method, which looks like this:

public static class Emunmerable
{
    public static IEnumerable<TSource> Distinct<TSource>(this IEnumerable<TSource> source)
    {
        if (source == null)
            throw new ArgumentNullException("source");

        return DistinctIterator<TSource>(source, null);
    }
}

So, interesting here is DistinctIterator, which implements IEnumerable and IEnumerator. Here is simplified (goto and lables removed) implementation of this IEnumerator:

private sealed class DistinctIterator<TSource> : IEnumerable<TSource>, IEnumerable, IEnumerator<TSource>, IEnumerator, IDisposable
{
    private bool _enumeratingStarted;
    private IEnumerator<TSource> _sourceListEnumerator;
    public IEnumerable<TSource> _source;
    private HashSet<TSource> _hashSet;    
    private TSource _current;

    private bool MoveNext()
    {
        if (!_enumeratingStarted)
        {
            _sourceListEnumerator = _source.GetEnumerator();
            _hashSet = new HashSet<TSource>();
            _enumeratingStarted = true;
        }

        while(_sourceListEnumerator.MoveNext())
        {
            TSource element = _sourceListEnumerator.Current;

             if (!_hashSet.Add(element))
                 continue;

             _current = element;
             return true;
        }

        return false;
    }

    void IEnumerator.Reset()
    {
        throw new NotSupportedException();
    }

    TSource IEnumerator<TSource>.Current
    {
        get { return _current; }
    }

    object IEnumerator.Current
    {        
        get { return _current; }
    }
}

As you can see - enumerating goes in order provided by source enumerable (list, on which we are calling Distinct). Hashset is used only for determining whether we already returned such element or not. If not, we are returning it, else - continue enumerating on source.

So, it is guaranteed, that Distinct() will return elements exactly in same order, which are provided by collection to which Distinct was applied.


It's not guaranteed, but it's the most obvious implementation. It would be hard to implement in a streaming manner (i.e. such that it returned results as soon as it could, having read as little as it could) without returning them in order.

You might want to read my blog post on the Edulinq implementation of Distinct().

Note that even if this were guaranteed for LINQ to Objects (which personally I think it should be) that wouldn't mean anything for other LINQ providers such as LINQ to SQL.

The level of guarantees provided within LINQ to Objects is a little inconsistent sometimes, IMO. Some optimizations are documented, others not. Heck, some of the documentation is flat out wrong.