Native JSON support in MYSQL 5.7 : what are the pros and cons of JSON data type in MYSQL?
SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN ...
Using a column inside an expression or function like this spoils any chance of the query using an index to help optimize the query. The query shown above is forced to do a table-scan.
The claim about "efficient access" is misleading. It means that after the query examines a row with a JSON document, it can extract a field without having to parse the text of the JSON syntax. But it still takes a table-scan to search for rows. In other words, the query must examine every row.
By analogy, if I'm searching a telephone book for people with first name "Bill", I still have to read every page in the phone book, even if the first names have been highlighted to make it slightly quicker to spot them.
MySQL 5.7 allows you to define a virtual column in the table, and then create an index on the virtual column.
ALTER TABLE t1
ADD COLUMN series AS (JSON_EXTRACT(data, '$.series')),
ADD INDEX (series);
Then if you query the virtual column, it can use the index and avoid the table-scan.
SELECT * FROM t1
WHERE series IN ...
This is nice, but it kind of misses the point of using JSON. The attractive part of using JSON is that it allows you to add new attributes without having to do ALTER TABLE. But it turns out you have to define an extra (virtual) column anyway, if you want to search JSON fields with the help of an index.
But you don't have to define virtual columns and indexes for every field in the JSON document—only those you want to search or sort on. There could be other attributes in the JSON that you only need to extract in the select-list like the following:
SELECT JSON_EXTRACT(data, '$.series') AS series FROM t1
WHERE <other conditions>
I would generally say that this is the best way to use JSON in MySQL. Only in the select-list.
When you reference columns in other clauses (JOIN, WHERE, GROUP BY, HAVING, ORDER BY), it's more efficient to use conventional columns, not fields within JSON documents.
I presented a talk called How to Use JSON in MySQL Wrong at the Percona Live conference in April 2018. I'll update and repeat the talk at Oracle Code One in the fall.
There are other issues with JSON. For example, in my tests it required 2-3 times as much storage space for JSON documents compared to conventional columns storing the same data.
MySQL is promoting their new JSON capabilities aggressively, largely to dissuade people against migrating to MongoDB. But document-oriented data storage like MongoDB is fundamentally a non-relational way of organizing data. It's different from relational. I'm not saying one is better than the other, it's just a different technique, suited to different types of queries.
You should choose to use JSON when JSON makes your queries more efficient.
Don't choose a technology just because it's new, or for the sake of fashion.
Edit: The virtual column implementation in MySQL is supposed to use the index if your WHERE clause uses exactly the same expression as the definition of the virtual column. That is, the following should use the index on the virtual column, since the virtual column is defined AS (JSON_EXTRACT(data,"$.series"))
SELECT * FROM t1
WHERE JSON_EXTRACT(data,"$.series") IN ...
Except I have found by testing this feature that it does NOT work for some reason if the expression is a JSON-extraction function. It works for other types of expressions, just not JSON functions.
The following from MySQL 5.7 brings sexy back with JSON sounds good to me:
Using the JSON Data Type in MySQL comes with two advantages over storing JSON strings in a text field:
Data validation. JSON documents will be automatically validated and invalid documents will produce an error. Improved internal storage format. The JSON data is converted to a format that allows quick read access to the data in a structured format. The server is able to lookup subobjects or nested values by key or index, allowing added flexibility and performance.
...
Specialised flavours of NoSQL stores (Document DBs, Key-value stores and Graph DBs) are probably better options for their specific use cases, but the addition of this datatype might allow you to reduce complexity of your technology stack. The price is coupling to MySQL (or compatible) databases. But that is a non-issue for many users.
Note the language about document validation as it is an important factor. I guess a battery of tests need to be performed for comparisons of the two approaches. Those two being:
- Mysql with JSON datatypes
- Mysql without
The net has but shallow slideshares as of now on the topic of mysql / json / performance from what I am seeing.
Perhaps your post can be a hub for it. Or perhaps performance is an after thought, not sure, and you are just excited to not create a bunch of tables.
I got into this problem recently, and I sum up the following experiences:
1, There isn't a way to solve all questions. 2, You should use the JSON properly.
One case:
I have a table named: CustomField
, and it must two columns: name
, fields
.
name
is a localized string, it content should like:
{
"en":"this is English name",
"zh":"this is Chinese name"
...(other languages)
}
And fields
should be like this:
[
{
"filed1":"value",
"filed2":"value"
...
},
{
"filed1":"value",
"filed2":"value"
...
}
...
]
As you can see, both the name
and the fields
can be saved as JSON, and it works!
However, if I use the name
to search this table very frequently, what should I do? Use the JSON_CONTAINS
,JSON_EXTRACT
...? Obviously, it's not a good idea to save it as JSON anymore, we should save it to an independent table:CustomFieldName
.
From the above case, I think you should keep these ideas in mind:
- Why MYSQL support JSON?
- Why you want to use JSON? Did your business logic just need this? Or there is something else?
- Never be lazy
Thanks