Why is there a sort showing up in my execution plan?

I think the sort is occurring for this join:

FROM   custOrders c 
       INNER JOIN files f 
               ON c.orderid = f.orderid 

I would create an index on files that includes the columns orderid and statusid since the query also uses the statusid column.
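A minimal sketch of that index (the index name is illustrative; the column names come from the query):

```sql
-- Covering index so the join column (orderid) and the filter column
-- (statusid) are both available without touching the base table.
CREATE INDEX IX_files_orderid_statusid
    ON files (orderid, statusid);
```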

You might also want to consider the following changes:

  1. You don't need "ac.adminuserid IS NOT NULL", as it is already implied by the inner join between adminusers and admincustomers.
  2. Change the test "f.statusid NOT IN ( 5, 6 )" to a positive condition (e.g. `IN`), as negative conditions are generally more expensive to process.
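For example, assuming statuses 1 through 4 are the ones you actually want to keep (the concrete list is an assumption, since the full status set isn't shown):

```sql
-- Instead of the negative test:
--     WHERE f.statusid NOT IN (5, 6)
-- enumerate the statuses you do want:
WHERE f.statusid IN (1, 2, 3, 4)
```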

I know this question is quite old, but I had the same issue and realised there was a completely different reason my tables suddenly slowed down. The symptoms were the same: views that were previously lightning fast became slow to update, with a "Sort" operator showing a cost of 40%. This solution is simple and may prove useful to someone.

When joining tables, ensure you are joining on a "like for like" basis. I was joining two tables on ID, but in one table the ID was defined as an int and in the other as nvarchar. Once I redefined both columns as the same type, the view was back to lightning speed.
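A hypothetical illustration of the mismatch (the table names here are made up, not from the question). Joining an int column to an nvarchar column forces an implicit conversion on every row, which can defeat index use entirely:

```sql
-- TableA.ID is int, TableB.ID is nvarchar(50):
-- the join ON a.ID = b.ID implicitly converts one side on every row.
-- Fix: align the types so no conversion is needed.
ALTER TABLE TableB ALTER COLUMN ID int NOT NULL;
```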

Hopefully this will help someone else avoid spending a week trying to figure out what's wrong with SQL when it's really a PEBKAC moment.

(Problem Exists Between Keyboard And Chair)


SQL Server is performing the sort to enable the merge join between the dataset to the right of that sort operator and the records in the Orders table. A merge join itself is a very efficient way to join all the records in a dataset, but it requires that each input being joined be sorted on the join keys, and in the same order.

Since the PK_Orders key is already ordered by OrderID, SQL Server decided to take advantage of that by sorting the other end of the join (everything to the right of the sort) so that the two datasets can be merged together at that point in the plan. The common alternative to a merge join is a hash join, but that wouldn't help you: you would simply have an expensive hash join operator in place of the sort and merge. The query optimizer has determined the sort and merge to be more efficient in this case.

The root cause of the expensive step in the plan is the need to combine all the records from the Orders table into the dataset. Is there a way to limit the records coming from the files table? An index on files.statusid may be helpful if the rows outside statuses 5 and 6 make up less than about 10% of the table.
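One possible shape for that index is a filtered index (SQL Server 2008+) that only contains the rows the query keeps; a sketch, with an illustrative name:

```sql
-- Filtered index: only rows whose status is not 5 or 6 are stored,
-- so scanning it touches far fewer rows than the base table.
-- (NOT IN isn't allowed in a filtered-index predicate, so spell it out.)
CREATE INDEX IX_files_active_status
    ON files (orderid)
    WHERE statusid <> 5 AND statusid <> 6;
```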

The QO thinks that most of the records are going to be filtered out at the end. Try to push as many of those filter conditions back to the record sources so that fewer records have to be handled in the middle of the plan.

EDIT: I forgot to mention, it is very helpful to have an execution plan that we can look at. Is there any way we can get an actual execution plan result to see the real number of records going through those operators? Sometimes the estimated record counts can be a little off.
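In SSMS you can toggle "Include Actual Execution Plan" (Ctrl+M) before running the query, or capture the plan from a script:

```sql
-- Returns the actual execution plan, including real row counts,
-- alongside the query results.
SET STATISTICS XML ON;
-- run the query here
SET STATISTICS XML OFF;
```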

EDIT: Looking deeper into the second-to-last filter operator's predicate field, summarized:

c.CustomerId = o.CustomerId
OR (o.CustomerId = cr.SecondaryCustomerId AND cr.SecondaryCustomerId IS NOT NULL)

It looks like SQL Server is producing a cross join of all possible matching records between Orders and Customers up to this point in the query (the part of the plan to the right of the second-to-last filter operator) and then checking each record against that condition to see if it does indeed match. Notice how the line going into the filter is really fat and the line coming out is really thin? That's because the estimated row count drops from 21k to 4 at that operator. Forget what I said earlier: this is probably the main problem in the plan. Even if there are indexes on these columns, SQL Server can't use them, because the join condition is too complex. It forces the plan to merge all the records together instead of seeking to just the ones you need, since the full join predicate can't be applied right away.

My first thought is to rephrase the CTE custOrders as a union of two datasets: one using CustomerId and one using SecondaryCustomerId to join. This will duplicate the work of the rest of the CTE but if it enables proper use of the indexes, it could be a big win.
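A sketch of that rewrite, using the column names from the predicate above; the original CTE isn't shown, and the CustomerRelations table name and its cr.CustomerId link back to Customers are assumptions:

```sql
WITH custOrders AS (
    -- branch 1: orders matched directly on CustomerId
    SELECT c.CustomerId, o.OrderId
    FROM   Customers c
           INNER JOIN Orders o
                   ON o.CustomerId = c.CustomerId
    UNION  -- UNION (not UNION ALL) removes rows matched by both branches
    -- branch 2: orders matched through SecondaryCustomerId
    SELECT c.CustomerId, o.OrderId
    FROM   Customers c
           INNER JOIN CustomerRelations cr
                   ON cr.CustomerId = c.CustomerId
           INNER JOIN Orders o
                   ON o.CustomerId = cr.SecondaryCustomerId
)
SELECT CustomerId, OrderId
FROM   custOrders;
```

Each branch now has a simple equality join that an index on the respective column can satisfy with a seek.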


SQL Server has three algorithms to choose from when it needs to join two tables: the Nested-Loops-Join, the Hash-Join and the Sort-Merge-Join. Which one it selects is based on cost estimates. In this case it figured that, based on the information it had available, a Sort-Merge-Join was the right choice.

In SQL Server execution plans a Sort-Merge-Join is split into two operators, the Sort and the Merge-Join, because the sort operation might not be necessary, for example when the data is already sorted.

For more information about joins check out my join series here: http://sqlity.net/en/1146/a-join-a-day-introduction/ The article about the Sort-Merge-Join is here: http://sqlity.net/en/1480/a-join-a-day-the-sort-merge-join/


To make your query faster, I would first look at indexes. You have a bunch of clustered index scans in the query; if you can replace a few of them with seeks, you will most likely be better off. Also check whether the estimates SQL Server produces match the actual row counts in an actual execution plan. If they are far off, SQL Server often makes bad choices, so providing better statistics can help your query performance too.
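If the estimates and actuals diverge badly, refreshing statistics is a cheap first step (the table name here is from the question; adjust as needed):

```sql
-- Rebuild statistics for one table from a full scan of its rows:
UPDATE STATISTICS files WITH FULLSCAN;

-- Or refresh out-of-date statistics across the whole database:
EXEC sp_updatestats;
```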