MySQL Index Tuning Basics
This is Onikai (@fat47) from the Service Reliability Group (SRG) of the Technology Division.
#SRGThe Service Reliability Group primarily provides comprehensive support for the infrastructure surrounding our media services, focusing on improving existing services, launching new ones, and contributing to open-source software (OSS).
This article is located within SRG.DBWGThe DB Working Group is releasing database-related materials that it provides to the entire company.
I hope this is of some help.
What is an index?Why is an index necessary?How to interpret EXPLAINMeaning of each item in EXPLAINIndex Tuning ProcessCase studies of analysisCase 1: When no index is set at allCase 2: When sort and a Limit clause are added to query "1"Case 3: When there is a range condition in addition to "2"Case 4: When there are multiple range conditionsCase 5: Creating a covering indexCase 6: Covering Index Part 2Case 7: Order of composite indexesCase 8: Indexing large columnsIn conclusion
What is an index?
MySQL's data structure is used to efficiently retrieve data, and its importance varies depending on the size of the database.
An index is like the "index" in a book.
When searching for a specific topic within a book, using the book's index can be helpful.
You can quickly find out which page of the book that item appears on.
Why is an index necessary?
In books with few pages, you can find the item you're looking for by searching from the first page,
In books with many pages, like dictionaries, it's not easy to find what you're looking for quickly.
Similarly, in the case of small databases, data can be found even without indexes.
However, performance may degrade rapidly as the database size increases.
Therefore, it is necessary to set up appropriate indexes so that the desired data can be found quickly.
How to interpret EXPLAIN
To create an index, you need to utilize the EXPLAIN command.
EXPLAIN visualizes how the MySQL query optimizer executes queries.
EXPLAIN should be used in an environment with the same amount of data as the production environment.
In environments with a small number of records, such as development or staging environments, a full scan may be deemed faster even if an index exists.
Let's actually analyze the query.
Meaning of each item in EXPLAIN
- idID number that identifies the select statement
This may be helpful if you are using subqueries or UNIONs.
- select_typeThis section will display information such as PRIMARY/SUBQUERY/DEPENDENT SUBQUERY/UNCACHEABLE SUBQUERY/DERIVED/UNION/UNION RESULT, etc.
DEPENDENT SUBQUERYBe careful if you see a correlated subquery.
- tableTable name
- type
Table access methods include ALL/index/range/ref/eq_ref/const, system/null, etc.ALLThis is a full scan.indexThis is a full scan in index order, so be careful if this appears.
The desired order is const-eq_ref-ref-range-index-ALL
- possible_keysThese are the possible indexes that can be selected.
- keyThis is the index actually selected in the query. If it's NULL, the index is not being used.
- key_lenThis is the length (in bytes) of the key being used.
This shows how much of the composite index is being utilized.
For example, if a composite index is applied to int, datetime, and varchar(4):
If only the int column is being used, key_len is 4.
If both the int and datetime columns are being used, key_len is 12 (=4+8).
If all columns are being used and Japanese characters are being used in utf8, key_len is 24 (=4+8+(4×3)).
See below for the data size of each.
Generally, shorter key lengths are better, but if slow queries are occurring even when indexes are being used, it often means that the intended composite index is not being used.
- refThe `ref` column is used to look up the index value specified in the `key` column. If a constant is specified, it will be `const`. In the case of a JOIN, it will display the column used as the search condition in the joining table.
- rowsThis indicates the number of rows that MySQL estimates it needs to read to find the desired row. Generally, the fewer rows, the better. Note that this number is a rough estimate and not the actual number of rows fetched, and the number of returned result rows may be smaller because the search conditions in the WHERE clause further refine the fetched rows.
- ExtraThis indicates the strategy the optimizer has chosen. The following information will be displayed:
- Using an index - This is the best approach, as it utilizes a covering index.
- Using `where` - The MySQL server filters the rows after they have been retrieved by the storage engine using the `where` clause.
- Using temporary - A temporary table is created for sorting the results. This is sometimes unavoidable when using group by, but it should be avoided as much as possible.
- Using filesort - Use filesort instead of index order. Filesort can be fast for small numbers of items, but it will almost always become a bottleneck when the number of items increases, so it should be avoided.
Index Tuning Process
- To identify slow queries, we'll configure the system to retrieve slow queries and search for them.
- Run slow queries with EXPLAIN attached to a DB slave that does not have a service reference.
- Check the output of rows and extras to see if the appropriate indexes are being applied.
- Based on the results of step 3, we will add indexes to the tables that lacked them.
A summary of tuning tipsLet's create an index that reduces the number of rows.At most, around tens of thousands.I would like to do this.
- Don't blindly add indexes as it will affect update performance.
- Consider adding an index if it narrows the rows to about 20%.
- If the cardinality is too low (even with an index, it only narrows the rows to 30% or more), adding an index may not be effective or may even slow things down.
- Check the key_len column in explain to see how far the composite index is working.
- In the Extra column...Usingfilesort Using TemporaryIf this appears, pay close attention.
- Make sure to use an index for sorting (order by).
- Write queries that utilize indexes effectively.
- Composite indexes terminate when a range condition (e.g., upd_datetime < 'YYYYMMDD') is encountered.
- In other words...Even if there are multiple composite indexes for different range conditions within a composite index, both cannot be used simultaneously.The same applies to sorting.
Case studies of analysis
Let's analyze a certain table design as an example.
Sample table name:i_user_test
Number of records: Approximately 1 million items
Case 1: When no index is set at all
Since there are 1.06 million rows, a full scan is being performed to retrieve all records.
prefecture_idThe number of rows has been narrowed down to approximately 30,000. To further reduce the number of rows...
levelThe number of rows has been narrowed down to approximately 4000. Looking at key_len, it is 8.
Since each column of the integer is 4 bytes, we can see that a composite index applied to two columns is being used.
full_ragainfull_ragainIt appears to be being used, but the number of rows hasn't changed.
full_ragainlevelCase 2: When sort and a Limit clause are added to query "1"
The number of rows has been narrowed down, but "Using filesort" has appeared in the Extra section.
In this query, 4000 results are retrieved, sorted, and then only 10 are selected, so retrieving approximately 4000 results is essentially a waste.
deck_max_total_ratingdeck_max_total_ratingIn this case, the process is completed once 10 items have been retrieved in the sorted order, so it does not attempt to retrieve the remaining 4000 items in the `rows` variable.
Case 3: When there is a range condition in addition to "2"
deck_max_total_ratingupd_datetimedeck_max_total_ratingdeck_max_total_ratingCase 4: When there are multiple range conditions
levellevelThis is because, as with Case 3, the use of the index ends once the range condition is utilized.
prefecture_idprefecture_id,levelYou can only select one of the indexes.
prefecture_id,levelprefecture_id,upd_datetimeIn this case, it seems possible to rewrite the query.
levelThe range condition, which was previously set to level < 5, has been changed to level in (1,2,3,4,5).
With key_len set to 16 and indexes applied to all columns, the number of rows has decreased to 1486.
When range conditions can be rewritten with a relatively small number of arguments using IN, this allows for more effective use of the index.
However, it is necessary to compare actual execution speeds in a production environment.
Case 5: Creating a covering index
grade_idYou can return the result just by looking at the index.Covering Index" is what they say.
The "Extra" column contains the "Using index," indicating that it is a covering index.
Creating too many indexes is also not good, as it can lead to data bloat and increased update load.
Use it when it seems like it might be effective.
Case 6: Covering Index Part 2
prefecture_iddeck_max_total_ratingThe column "Extra" shows "Using index," indicating that it is a covering index.
deck_max_total_ratingdeck_max_total_ratingCase 7: Order of composite indexes
The index is evaluated from left to right.
If the leftmost column of a composite index does not meet the criteria, the index will not be used.
This also applies to composite primary keys.
To use this composite index, you need to rearrange the order as shown below.
Case 8: Indexing large columns
prefecture_battle_idThe key_len is set to 768 = ((255 + 1) × 3). The reason it's tripled is because UTF-8 Unicode requires 3 bytes per character.
In this case, it seems that we can extract the necessary data using only the first character, so let's try creating an index using only that single character.
Since key_len is 6 = ((1+1)×3), we have been able to reduce the index size.
The rows are the same as before, so if you can narrow it down by only the first few characters,
Creating indexes like this is also effective.
In conclusion
SRG is looking for new team members.
If you are interested, please contact us here.
