Does Laravel's "soft_delete" need an index on MySQL?
The column deleted_at is not a good index candidate. I'll try to explain better than the comment did: indexes are useful only when their cardinality is relatively high. Cardinality is a number that describes index uniqueness in the data set; it is the total number of unique values divided by the total number of records.
For example, the cardinality of a primary key is 1: every record contains a unique value for the primary key. 1 is also the highest possible value; you can think of it as 100%.
But a column such as deleted_at doesn't have such a value. What Laravel does with deleted_at is check whether it is or isn't NULL, which means it has only two possible values. Columns that contain only two values have extremely low cardinality, and it decreases further as the number of records goes up.
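You can check this for your own data. A minimal sketch, assuming a hypothetical users table (note that COUNT(DISTINCT ...) ignores NULLs, so the ratio only counts non-NULL timestamps):

-- The optimizer's estimated number of unique values per index
SHOW INDEX FROM users;

-- Exact ratio of distinct non-NULL deleted_at values to total rows
SELECT COUNT(DISTINCT deleted_at) / COUNT(*) AS selectivity
FROM users;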
You can index such a column, but it won't be of any help. More likely, it will just slow writes down and take up space.
TL;DR: no, you don't have to index that column; the index would have no beneficial impact on performance.
I don't know why the answer above by @N.B. has so many upvotes; in my context I find it completely incorrect.
I added indexes to the deleted_at timestamps on some key tables and watched some queries drop from 32 s to under 5.4 ms. It really depends on the nature of your app.
In my scenario, I have 3 tables with soft deletes and some simple joins (all with indexes), but my queries suffered because of the way Laravel handles soft deletes by default.
I highly recommend indexing these columns so your app doesn't choke as the number of records grows.
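For reference, adding such an index is a one-liner per table. A sketch assuming a hypothetical posts table; the index name is arbitrary:

ALTER TABLE posts ADD INDEX posts_deleted_at_index (deleted_at);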
Short Answer: Maybe.
Long Answer:
If there are very few different values in deleted_at, MySQL will not use INDEX(deleted_at).

If there are a lot of different non-null dates in deleted_at, MySQL will use INDEX(deleted_at).
Most of the discussion (so far) has failed to take the cardinality of this single-column index into consideration.
Note: This is not the same as a 2-value flag such as is_deleted. It is useless to have a single-column index on such a flag.
More discussion (from a MySQL point of view)
https://laravel.com/docs/5.2/eloquent#soft-deleting says
Now, when you call the delete method on the model, the deleted_at column will be set to the current date and time. And, when querying a model that uses soft deletes, the soft deleted models will automatically be excluded from all query results.
From that, I assume the table definition contains something like this:
deleted_at DATETIME NULL -- (or TIMESTAMP NULL)
And the value is initialized (explicitly or implicitly) to NULL.
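Spelled out as a full table, that might look like the sketch below; everything except the deleted_at column is a hypothetical stand-in for your own schema:

CREATE TABLE posts (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    created_at TIMESTAMP NULL,
    updated_at TIMESTAMP NULL,
    deleted_at TIMESTAMP NULL DEFAULT NULL  -- NULL means "not soft-deleted"
) ENGINE=InnoDB;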
Case 1: Lots of new rows, none yet 'deleted': All the deleted_at values are NULL. In this case, the Optimizer will shun INDEX(deleted_at) as not helping. In fact, using the index would hurt, because it would cost more to go through the entire index and the data. It would be cheaper to ignore the index and simply assume all rows are candidates for being SELECTed.
Case 2: A few rows (out of many) have been deleted: Now deleted_at has multiple values. Although Laravel only cares about IS NULL vs IS NOT NULL, MySQL sees it as a multi-valued column. But since the test is for IS NULL and most rows are still NULL, the Optimizer's reaction is the same as for Case 1.
Case 3: A lot more rows are soft-deleted than still active: Now the index has suddenly become useful, because only a small percentage of the table IS NULL.
There is no exact cutoff between Case 2 and Case 3. 20% is a handy Rule of Thumb.
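Whichever case you are in, EXPLAIN will tell you what the Optimizer actually decided. A sketch, assuming a hypothetical posts table that has INDEX(deleted_at):

-- 'key' in the output names the index the Optimizer chose;
-- NULL there means it went with a full table scan (Cases 1 and 2).
EXPLAIN SELECT * FROM posts WHERE deleted_at IS NULL;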
Now, from the execution point of view.
INDEX(deleted_at) used for deleted_at IS NULL:

- Drill down the Index BTree for the first row with NULL.
- Scan until IS NULL fails.
- For each matching row, reach over into the data BTree to get the row.
INDEX(deleted_at) is not used:

- Scan the data BTree (or use some other index).
- For each data row, check that deleted_at IS NULL, else filter out that row.
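You can compare the cost of those two plans directly with index hints. A sketch, again assuming a hypothetical posts table and an index named posts_deleted_at_index; compare the 'key' and 'rows' columns across the three plans:

-- Let the Optimizer decide (the normal case)
EXPLAIN SELECT * FROM posts WHERE deleted_at IS NULL;

-- Force the index, even when the Optimizer would rather not use it
EXPLAIN SELECT * FROM posts FORCE INDEX (posts_deleted_at_index) WHERE deleted_at IS NULL;

-- Forbid the index, forcing a full table scan
EXPLAIN SELECT * FROM posts IGNORE INDEX (posts_deleted_at_index) WHERE deleted_at IS NULL;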
Composite index:
It may be very beneficial to have a "composite" (multi-column) index starting with deleted_at. Example:

INDEX(deleted_at, foo)

WHERE deleted_at IS NULL
  AND foo BETWEEN 111 AND 222

This is very likely to use the index effectively, regardless of what percentage of the table has deleted_at IS NULL.
- Drill down the Index BTree for the first row with NULL and foo >= 111.
- Scan until IS NULL or foo <= 222 fails.
- For each matching row, reach over into the data BTree to get the row.
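As a concrete sketch of that pattern, with posts and foo standing in for your own table and filter column:

ALTER TABLE posts ADD INDEX posts_deleted_at_foo_index (deleted_at, foo);

SELECT *
FROM posts
WHERE deleted_at IS NULL
  AND foo BETWEEN 111 AND 222;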
Note that in an INDEX, NULL acts very much like any other single value. (And NULLs are stored before other values.)