Determine if an IEnumerable<T> contains any object of another IEnumerable<T>

It would be fastest to use the Any method instead of the Count method:

return x.Intersect<int>(y).Any();

This assumes that the IEnumerable<int> implementation doesn't also implement ICollection<int>. In that case, Count (in the case where IEnumerable<T> implements ICollection<T>) is an O(N) operation while Any is always an O(1) operation. (as it only checks for a single element). However, the behavior of Count is an implementation detail, and you shouldn't rely on that.

I've written about this more in-depth in a blog post that goes into detail about when to use Count() vs. Any(). In summary:

DO use Enumerable.Any extension method to check for the existence of elements in the sequence.
DO NOT use Enumerable.Count extension method in comparisons against zero, as the following are semantically equivalent:
- sequence.Count() == 0
- !sequence.Any()
DO NOT use the Enumerable.Count extension method in comparisons against the “not zero” condition, as the following are semantically equivalent:
- sequence.Count != 0
- sequence.Any()

EDIT: The original answer below is really dealing in terms of complexity. If the sequences are short enough, all the calls through GetNext(), building a HashSet etc will actually be more expensive than using Intersect(y).Any(). However, in that case both calls will be really pretty quick anyway.

In other words, Intersect(y).Any() definitely scales better as the two sequences get longer, but for if you're absolutely sure that the sequences are short, the nested loop solution will be faster.

Original answer

No, Intersect() will be faster than the two-loop solution - but as CasperOne wrote, Any() will be faster than Count() because it will exit as soon as it sees an element.

Assuming sequences of length N and M, Intersect will be O(N + M) whereas the two-loop solution is O(N * M).

Intersect builds up a HashSet from the "inner" sequence (this takes O(M) complexity) and then loops through the outer sequence (which takes O(N) complexity), yielding any element which is in the set. These results are streamed - the inner sequence will be evaluated when the first element is requested from Intersect(), but this only goes as far as finding the first match (if any). Using Any() you'll have an "early out" if there are any matches, so we don't need to take the overall number of matches into consideration when working out the complexity.

Streaming results from LINQ rocks - it's well worth getting your head round (as well as deferred execution).

Intersect is okay, but as others have said I wouldn't call .Count() on the result.

The reason is that Intersect does not create the intersection of the two lists. It creates an IEnumerable that is capable of enumerating that intersection, but it doesn't actually enumerate those results yet. Most of the work is deferred until the time when you finally iterate over this enumeration.

The problem with Count is that it does iterate over the entire enumeration. So not only does it always count all the results, but it causes all of the work involved in computing those results to run as well.

Calling Any instead will be very fast by comparison, because you will compute at most one intersection result before returning. Of course, in the case where there are no matches it will still need to iterate the entire list. However, that's no worse off than you were before. In fact, it's still faster because as Jon Skeet mentioned, the Intersect function uses a HashSet to compute the results rather than iterating over everything. Your best and average cases are improved tremendously.

It's like the difference between these two snippets:

int count = 0;
foreach (int i in x)
{
   foreach (int j in y)
   {
      if (i==j) count++;
   }
}
return (count > 0);

// this one should look familiar
foreach (int i in x)
{
    foreach (int j in y)
    {
       if (i==j) return true;
    }
}
return false;

Obviously the 2nd is much faster on average. The performance of Any() would be analogous (not the same as, thanks to the HashSet) the 2nd snippet, while Count() would be analogous to the first.

Determine if an IEnumerable<T> contains any object of another IEnumerable<T>

Tags:

.Net

Ienumerable

Related

Recent Posts