Which method performs better: .Any() vs .Count() > 0?
Note: I wrote this answer when Entity Framework 4 was actual. The point of this answer was not to get into trivial .Any()
vs .Count()
performance testing. The point was to signal that EF is far from perfect. Newer versions are better... but if you have part of code that's slow and it uses EF, test with direct TSQL and compare performance rather than relying on assumptions (that .Any()
is ALWAYS faster than .Count() > 0
).
While I agree with most up-voted answer and comments - especially on the point Any
signals developer intent better than Count() > 0
- I've had situation in which Count is faster by order of magnitude on SQL Server (EntityFramework 4).
Here is query with Any
that thew timeout exception (on ~200.000 records):
con = db.Contacts.
Where(a => a.CompanyId == companyId && a.ContactStatusId <= (int) Const.ContactStatusEnum.Reactivated
&& !a.NewsletterLogs.Any(b => b.NewsletterLogTypeId == (int) Const.NewsletterLogTypeEnum.Unsubscr)
).OrderBy(a => a.ContactId).
Skip(position - 1).
Take(1).FirstOrDefault();
Count
version executed in matter of milliseconds:
con = db.Contacts.
Where(a => a.CompanyId == companyId && a.ContactStatusId <= (int) Const.ContactStatusEnum.Reactivated
&& a.NewsletterLogs.Count(b => b.NewsletterLogTypeId == (int) Const.NewsletterLogTypeEnum.Unsubscr) == 0
).OrderBy(a => a.ContactId).
Skip(position - 1).
Take(1).FirstOrDefault();
I need to find a way to see what exact SQL both LINQs produce - but it's obvious there is a huge performance difference between Count
and Any
in some cases, and unfortunately it seems you can't just stick with Any
in all cases.
EDIT: Here are generated SQLs. Beauties as you can see ;)
ANY
:
exec sp_executesql N'SELECT TOP (1) [Project2].[ContactId] AS [ContactId], [Project2].[CompanyId] AS [CompanyId], [Project2].[ContactName] AS [ContactName], [Project2].[FullName] AS [FullName], [Project2].[ContactStatusId] AS [ContactStatusId], [Project2].[Created] AS [Created] FROM ( SELECT [Project2].[ContactId] AS [ContactId], [Project2].[CompanyId] AS [CompanyId], [Project2].[ContactName] AS [ContactName], [Project2].[FullName] AS [FullName], [Project2].[ContactStatusId] AS [ContactStatusId], [Project2].[Created] AS [Created], row_number() OVER (ORDER BY [Project2].[ContactId] ASC) AS [row_number] FROM ( SELECT [Extent1].[ContactId] AS [ContactId], [Extent1].[CompanyId] AS [CompanyId], [Extent1].[ContactName] AS [ContactName], [Extent1].[FullName] AS [FullName], [Extent1].[ContactStatusId] AS [ContactStatusId], [Extent1].[Created] AS [Created] FROM [dbo].[Contact] AS [Extent1] WHERE ([Extent1].[CompanyId] = @p__linq__0) AND ([Extent1].[ContactStatusId] <= 3) AND ( NOT EXISTS (SELECT 1 AS [C1] FROM [dbo].[NewsletterLog] AS [Extent2] WHERE ([Extent1].[ContactId] = [Extent2].[ContactId]) AND (6 = [Extent2].[NewsletterLogTypeId]) )) ) AS [Project2] ) AS [Project2] WHERE [Project2].[row_number] > 99 ORDER BY [Project2].[ContactId] ASC',N'@p__linq__0 int',@p__linq__0=4
COUNT
:
exec sp_executesql N'SELECT TOP (1) [Project2].[ContactId] AS [ContactId], [Project2].[CompanyId] AS [CompanyId], [Project2].[ContactName] AS [ContactName], [Project2].[FullName] AS [FullName], [Project2].[ContactStatusId] AS [ContactStatusId], [Project2].[Created] AS [Created] FROM ( SELECT [Project2].[ContactId] AS [ContactId], [Project2].[CompanyId] AS [CompanyId], [Project2].[ContactName] AS [ContactName], [Project2].[FullName] AS [FullName], [Project2].[ContactStatusId] AS [ContactStatusId], [Project2].[Created] AS [Created], row_number() OVER (ORDER BY [Project2].[ContactId] ASC) AS [row_number] FROM ( SELECT [Project1].[ContactId] AS [ContactId], [Project1].[CompanyId] AS [CompanyId], [Project1].[ContactName] AS [ContactName], [Project1].[FullName] AS [FullName], [Project1].[ContactStatusId] AS [ContactStatusId], [Project1].[Created] AS [Created] FROM ( SELECT [Extent1].[ContactId] AS [ContactId], [Extent1].[CompanyId] AS [CompanyId], [Extent1].[ContactName] AS [ContactName], [Extent1].[FullName] AS [FullName], [Extent1].[ContactStatusId] AS [ContactStatusId], [Extent1].[Created] AS [Created], (SELECT COUNT(1) AS [A1] FROM [dbo].[NewsletterLog] AS [Extent2] WHERE ([Extent1].[ContactId] = [Extent2].[ContactId]) AND (6 = [Extent2].[NewsletterLogTypeId])) AS [C1] FROM [dbo].[Contact] AS [Extent1] ) AS [Project1] WHERE ([Project1].[CompanyId] = @p__linq__0) AND ([Project1].[ContactStatusId] <= 3) AND (0 = [Project1].[C1]) ) AS [Project2] ) AS [Project2] WHERE [Project2].[row_number] > 99 ORDER BY [Project2].[ContactId] ASC',N'@p__linq__0 int',@p__linq__0=4
Seems that pure Where with EXISTS works much worse than calculating Count and then doing Where with Count == 0.
Let me know if you guys see some error in my findings. What can be taken out of all this regardless of Any vs Count discussion is that any more complex LINQ is way better off when rewritten as Stored Procedure ;).
Since this is a rather popular topic and answers differ, I had to take a fresh look on the problem.
Testing env: EF 6.1.3, SQL Server, 300k records
Table model:
class TestTable
{
[Key]
public int Id { get; set; }
public string Name { get; set; }
public string Surname { get; set; }
}
Test code:
class Program
{
static void Main()
{
using (var context = new TestContext())
{
context.Database.Log = Console.WriteLine;
context.TestTables.Where(x => x.Surname.Contains("Surname")).Any(x => x.Id > 1000);
context.TestTables.Where(x => x.Surname.Contains("Surname") && x.Name.Contains("Name")).Any(x => x.Id > 1000);
context.TestTables.Where(x => x.Surname.Contains("Surname")).Count(x => x.Id > 1000);
context.TestTables.Where(x => x.Surname.Contains("Surname") && x.Name.Contains("Name")).Count(x => x.Id > 1000);
Console.ReadLine();
}
}
}
Results:
Any() ~ 3ms
Count() ~ 230ms for first query, ~ 400ms for second
Remarks:
For my case, EF didn't generate SQL like @Ben mentioned in his post.
The exact details differ a bit in .NET Framework vs .NET Core, but it also somewhat depends on what you're doing: if you're using an ICollection
or ICollection<T>
type (such as with List<T>
) there is a .Count
property that's cheap to access, whereas other types might require enumeration.
TL;DR:
Use .Count > 0
if the property exists, and otherwise .Any()
.
Using .Count() > 0
is never the best option, and in some cases could be dramatically slower.
This applies to both .NET Framework and .NET Core.
Now we can dive into the details..
Lists and Collections
Let's start with a very common case: using List<T>
(which is also ICollection<T>
).
The .Count
property is implemented as:
private int _size;
public int Count {
get {
Contract.Ensures(Contract.Result<int>() >= 0);
return _size;
}
}
What this is saying is _size
is maintained by Add()
,Remove()
etc, and since it's just accessing a field this is an extremely cheap operation -- we don't need to iterate over values.
ICollection
and ICollection<T>
both have .Count
and most types that implement them are likely to do so in a similar way.
Other IEnumerables
Any other IEnumerable
types that aren't also ICollection
require starting enumeration to determine if they're empty or not. The key factor affecting performance is if we end up enumerating a single item (ideal) or the entire collection (relatively expensive).
If the collection is actually causing I/O such as by reading from a database or disk, this could be a big performance hit.
.NET Framework .Any()
In .NET Framework (4.8), the Any()
implementation is:
public static bool Any<TSource>(this IEnumerable<TSource> source) {
if (source == null) throw Error.ArgumentNull("source");
using (IEnumerator<TSource> e = source.GetEnumerator()) {
if (e.MoveNext()) return true;
}
return false;
}
This means no matter what, it's going to get a new enumerator object and try iterating once. This is more expensive than calling the List<T>.Count
property, but at least it's not iterating the entire list.
.NET Framework .Count()
In .NET Framework (4.8), the Count()
implementation is (basically):
public static int Count<TSource>(this IEnumerable<TSource> source)
{
ICollection<TSource> collection = source as ICollection<TSource>;
if (collection != null)
{
return collection.Count;
}
int num = 0;
using (IEnumerator<TSource> enumerator = source.GetEnumerator())
{
while (enumerator.MoveNext())
{
num = checked(num + 1);
}
return num;
}
}
If available, ICollection.Count
is used, but otherwise the collection is enumerated.
.NET Core .Any()
The LINQ Any()
implementation in .NET Core is much smarter. You can see the complete source here but the relevant bits to this discussion:
public static bool Any<TSource>(this IEnumerable<TSource> source)
{
//..snip..
if (source is ICollection<TSource> collectionoft)
{
return collectionoft.Count != 0;
}
//..snip..
using (IEnumerator<TSource> e = source.GetEnumerator())
{
return e.MoveNext();
}
}
Because a List<T>
is an ICollection<T>
, this will call the Count
property (and though it calls another method, there's no extra allocations).
.NET Core .Count()
The .NET Core implementation (source) is basically the same as .NET Framework (see above), and so it will use ICollection.Count
if available, and otherwise enumerates the collection.
Summary
.NET Framework
With
ICollection
:.Count > 0
is best.Count() > 0
is fine, but ultimately just callsICollection.Count
.Any()
is going to be slower, as it enumerates a single item
With non-
ICollection
(no.Count
property).Any()
is best, as it only enumerates a single item.Count() > 0
is bad as it causes complete enumeration
.NET Core
.Count > 0
is best, if available (ICollection
).Any()
is fine, and will either doICollection.Count > 0
or enumerate a single item.Count() > 0
is bad as it causes complete enumeration
If you are starting with something that has a .Length
or .Count
(such as ICollection<T>
, IList<T>
, List<T>
, etc) - then this will be the fastest option, since it doesn't need to go through the GetEnumerator()
/MoveNext()
/Dispose()
sequence required by Any()
to check for a non-empty IEnumerable<T>
sequence.
For just IEnumerable<T>
, then Any()
will generally be quicker, as it only has to look at one iteration. However, note that the LINQ-to-Objects implementation of Count()
does check for ICollection<T>
(using .Count
as an optimisation) - so if your underlying data-source is directly a list/collection, there won't be a huge difference. Don't ask me why it doesn't use the non-generic ICollection
...
Of course, if you have used LINQ to filter it etc (Where
etc), you will have an iterator-block based sequence, and so this ICollection<T>
optimisation is useless.
In general with IEnumerable<T>
: stick with Any()
;-p