When to use STRAIGHT_JOIN with MySQL

I wouldn't recommend using STRAIGHT_JOIN without a good reason. My own experience is that the MySQL query optimizer chooses a poor query plan more often than I'd like, but not often enough that you should just bypass it in general, which is what you would be doing if you always used STRAIGHT_JOIN.

My recommendation is to leave all queries as regular JOINs. If you discover that one query is using a sub-optimal query plan, I would suggest first trying to rewrite or re-structure the query a bit to see if the optimizer will then pick a better query plan. Also, for innodb at least, make sure it's not just that your index statistics are out-of-date (ANALYZE TABLE). That can cause the optimizer to choose a poor query plan. Optimizer hints should generally be your last resort.

Another reason not to use query hints is that your data distribution may change over time, or your index selectivity may change, etc. as your table grows. Your query hints that are optimal now, may become sub-optimal over time. But the optimizer will be unable to adapt the query plan because of your now outdated hints. You stay more flexible if you allow the optimizer to make the decisions.


Here's a scenario that came up just recently at work.

Consider three tables, A, B, C.

A has 3,000 rows; B has 300,000,000 rows; and C has 2,000 rows.

Foreign keys are defined: B(a_id), B(c_id).

Suppose you had a query that looks like this:

select a.id, c.id
from a
join b on b.a_id = a.id
join c on c.id = b.c_id

In my experience, MySQL may choose to go C -> B -> A in this case. C is smaller than A and B is enormous, and they're all equijoins.

The trouble is MySQL doesn't necessarily take into account the size of the intersection between (C.id and B.c_id) vs (A.id and B.a_id). If the join between B and C returns just as many rows as B, then it's a very poor choice; if starting with A would have filtered down B to as many rows as A, then it would have been a much better choice. straight_join could be used to force this order like this:

select a.id, c.id
from a
straight_join b on b.a_id = a.id
join c on c.id = b.c_id

Now a must be joined on before b.

Generally you want to do your joins in an order that minimizes the number of rows in the resulting set. So starting with a small table and joining such that the resulting join will also be small, is ideal. Things go pear-shaped if starting with a small table and joining it to a bigger table ends up just as large as the big table.

It's stats dependent though. If the data distribution changes, the calculation may change. It's also dependent on the implementation details of the join mechanism.

The worst cases that I've seen for MySQL that all but required straight_join or aggressive index hinting are queries that paginate over a lot of data in a strict sort order with light filtering. MySQL strongly prefers to use indexes for any filters and joins over sorts; this makes sense because most people aren't trying to sort the whole database but rather have a limited subset of rows that are responsive to the query, and sorting a limited subset is much faster than filtering the whole table, no matter whether it's sorted or not. In this case, putting straight join immediately after the table that had the indexed column I wanted to sort on fixed things.

Tags:

Mysql

Join