Mysql Improve Search Performance with wildcards (%%)
In my post here I describe, in detail, a technique that allows you to use the index with LIKE
for fast %infix%
search, at the cost of some extra storage:
https://stackoverflow.com/a/22531268/543814
As long as the strings are relatively small, the storage requirement is generally acceptable.
According to Google, the average e-mail address is 25
characters long. This increases your required storage by a factor 12.5
on average, and gives you fast indexed search in return. (See my post for the calculations.)
From my perspective, if you are storing 10'000 e-mail addresses, you should be fine storing (the equivalent of) about 100'000 e-mail addresses, too. If this is what it takes to allow you to use an index, that seems like an acceptable trade-off. Often, disk space is cheap, while non-indexed searches are unaffordable.
If you choose to take this approach, I suggest that you limit the input length of e-mail addresses to 64
characters. Those rare (or attacker) e-mail addresses of such length will require up to 32
times the usual storage. This gives you:
- Protection against an attacker trying to flood your database, since these still aren't very impressive amounts of data.
- The expectation that most e-mail addresses are not of this length anyway.
If you consider 64
characters too harsh a requirement, use 255
instead, for a worst-case storage increase factor of 127.5
. Ridiculous? Possibly. Likely? No. Fast? Very.
No, Mysql
will not use the index because LIKE
argument (%f%
) starts with the wildcard character %
.
If it starts with a constant, index will be used.
More info: 7.5.3. How MySQL Uses Indexes
Wildcarding the left side of a LIKE
operation ensures that an index, if one exists on the email
column, can not be used.
Full Text Search (FTS) is preferred syntax for finding strings within text via SQL. MySQL has native FTS functionality, using the MATCH/AGAINST syntax (Requires the table to use the MyISAM engine for v.5.5 and below. InnoDB FTS supported on v.5.6+):
SELECT c.*, p.*
FROM PHPPOS_CUSTOMERS c
JOIN PHPPOS_PEOPLE p ON p.person_id = c..person_id
WHERE deleted = 0
AND MATCH(email) AGAINST('f')
ORDER BY email
But there are third party FTS technology, such as Sphinx.
No, because MySQL will not be able to utilize the index when you have a leading wildcard. If you changed your LIKE to 'f%', then it would be able to use the index.