Is left hash join always better than left outer join?
Semantically, both queries are the same. The LOOP
versus the HASH
simply tells SQL Server which option to use to return results. If you run the query without LOOP
or HASH
, SQL Server may pick either of those options, depending on which it thinks will perform best. Both options return identical results by design. To put it slightly differently1, the keywords HASH
and LOOP
in this case are what is known as join hints, one of the three kinds of hints in Transact-SQL. To confuse you even more, you can also specify a join hint as a query hint, though the effect is not necessarily the same in both cases. Be aware that providing these kinds of join hints implies a FORCE ORDER
hint, which specifies that the join order indicated by the query syntax is preserved during query optimization (see Paul's answer below for further details).
SQL Server uses statistics on each table in the query to make an informed choice about what kind of physical operation to take with for each JOIN
in the T-SQL query statement.
In this case, since [ExternalTable]
is a view referenced through a linked server, SQL Server probably expects there to be 1 row in the table - i.e. it has no idea how many rows to expect.
You can either add a join hint to your query to force a merge join, or simply copy rows from [ExternalTable]
into a local #temp table with a clustered index, then run the query against that.
The full syntax for the hash join would be:
LEFT OUTER HASH JOIN [ABC].[ExternalTable] s ON s.foot = t.foo .....
The version with the HASH
in the join statement is allowing SQL Server to choose the join type, and in your particular case it's most likely choosing a LOOP join, which you could force with:
LEFT OUTER LOOP JOIN [ABC].[ExternalTable] s ON s.foot = t.foo .....
I typically would not recommend specifying the join type since most of the time SQL Server is very capable of choosing the most appropriate style of join operator.
1 - thanks Andriy for the wording here.
I'd just like to add a bit to the other answers and comments here:
You are comparing apples to oranges.
OUTER is a logical join operator. It specifies that you have a side from which you want to preserve rows. Hence the need to say LEFT or RIGHT. It is OK to leave out the word OUTER, meaning LEFT JOIN is the same as saying LEFT OUTER JOIN.
HASH is a directive to SQL Server specifying how to perform whatever join you asked for (left join, for instance). We avoid hints in general, leaving it up to the optimizer to decide how to do it. The optimizer in turn relies on things like statistics to try to come up with the best way to perform what you asked for. One disadvantage of overriding the optimizer is that the optimizer has less freedom to adjust to when things changes, like more data in one of the tables, or that you added or removed an index.
So, saying LEFT HASH JOIN is the very same as saying LEFT OUTER HASH JOIN. The word OUTER is optional.
Is there something that I should look out for when running a
LEFT HASH JOIN
instead of aLEFT OUTER JOIN
?
Yes. Using a join hint like LEFT HASH JOIN
forces the join order for tables specified in the query. It forces the optimizer to join the tables in textual order, just as if you had also added OPTION (FORCE ORDER)
.
Using a join hint has all the same effects as FORCE ORDER
, including disabling the repositioning of aggregates, and introducing partial aggregates.
Be extremely careful with join hints. They restrict the query optimizer a lot more than most people realize.