What construction can I use instead of Contains?

You could create a temporary database table which represents myList and refactor your query to a JOIN with that temporary List.

The reason for the error is that the actual query produced contains all elements of myList.

Basically the DB (the query processor) needs to see both lists to do the filtering. If the second list is too large to fit inside the query you have to provide it otherwise (for example as a temp table)

If you wan't this to perform well I'd suggest you use table valued parameters and a stored procedure.

in your database, using TSQL,

CREATE TYPE [dbo].[IdSet] AS TABLE
(
    [Id] INT
);
GO

CREATE PROCEDURE [dbo].[Get<table>]
    @ids [dbo].[IdSet] READONLY
AS
    SET NOCOUNT ON;

    SELECT
                <Column List>
        FROM
                [dbo].[<table>] [T]
        WHERE
                [T].[Id] IN (SELECT [Id] FROM @ids);
RETURN 0;
GO

Then, in C#

var ids = new DataTable()
ids.Columns.Add("Id", typeof(int));

foreach (var id in myList)
{
    ids.Rows.Add(id);
}

var objList = myContext.SqlQuery<<entity>>(
    "[dbo].[Get<table>] @ids",
    new SqlParameter("@ids", SqDbType.Structured)
        { 
            Value = ids,
            TypeName = "[dbo].[IdSet]"
        }));

You could split the list in several sub-lists, and run separate queries:

int start = 0;
int count = 0;
const int chunk_size = 1000;
do {
    count = Math.Min(chunk_size, myList.Count - start);
    var tmpList = myList.GetRange(start, count);
    // run query with tmpList
    var objList= myContext.MyObjects.Where(t => tmpList.Contains(t.Id)).ToList();
    // do something with results...
    start += count;
} while (start < myList.Count);

Of course, you need to find out the good "chunk size" in some way that works for you. Depending on the size of the table and of the list, it might be more convenient to load the entire table and filter in the code, as suggested in other answers.

You can perform the query on the client side by adding AsEnumerable() to "hide" the Where clause from Entity Framework:

var objList = myContext
  .MyObjects
  .AsEnumerable()
  .Where(t => myList.Contains(t.Id))
  .ToList();

To improve performance you can replace the list with a HashSet:

var myHashSet = new HashSet<int>(myList);

and then modify the predicate in Where accordingly:

  .Where(t => myHashSet.Contains(t.Id))

This is the "easy" solution in terms of time to implement. However, because the query is running client side you may get poor performance because all MyObjects rows are pulled to the client side before they are filtered.

The reason you get the error is because Entity Framework converts you query into something like this:

SELECT ...
FROM ...
WHERE column IN (ID1, ID2, ... , ID8000)

So bascially all 8000 ID's from the list is included in the generated SQL which exceeds the limit of what SQL Server can handle.

What Entity Framework "looks for" to generate this SQL is ICollection<T> which is implemented by both List<T> and HashSet<T> so if you try to keep the query on the server side you get no improved performance by using HashSet<T>. However, on the client side the story is different where Contains is O(1) for HashSet<T> and O(N) for List<T>.

What construction can I use instead of Contains?

Tags:

C#

Linq

Sql Server 2008 R2

Related

Recent Posts