SQL Server Linked Server performance: Why are remote queries so expensive?
The plan you have at the moment looks like the most optimal plan to me.
I don't agree with the assertion in the other answers that it is sending the 2.6M rows to the remote server.
The plan looks to me as though for each of the 54 rows returned from the remote query it is performing an index seek into your local table to determine whether it is matched or not. This is pretty much the optimal plan.
Replacing with a hash join or merge join would be counterproductive given the size of table and adding an intermediate #temp
table just adds an additional step that doesn't seem to give you any advantage.
Connecting to a remote resource is expensive. Period.
One of the most expensive operations in any programming environment is network IO (though disk IO tends to dwarf it).
This extends to remote linked servers. The server calling the remote linked server needs to first establish a connection, then a query needs to be executed on the remote server, results returned and the connection closed. This all takes time over the network.
You should also structure your query in such a way that you transfer the minimum data across the wire. Don't expect the DB to optimize for you.
If I were to write this query, I would select the remote data into a table variable (or into a temp table) and then use this in conjunction with the local table. This ensures that only data that needs to be transferred will.
The query you are running can easily be sending 2.6M rows to the remote server in order to process the EXCEPT
clause.