MySQL Index Tuning Basics
This is Oniumi (@fat47) from the Service Reliability Group (SRG) of the Technology Headquarters.
#SRG(Service Reliability Group) is a group that mainly provides cross-sectional support for the infrastructure of our media services, improving existing services, launching new ones, and contributing to OSS.
This article is located in SRGDBWGWe will publish materials regarding the database provided by the DB Working Group for the entire company.
I hope this helps in some way.
What is an index?Why do we need indexes?How to read EXPLAINMeaning of each item in EXPLAINIndex tuning flowAnalysis Case StudiesCase 1: No indexes are setCase 2: Adding a sort and limit clause to the query in "1"Case 3: When there is a range condition in addition to "2"Case 4: Multiple range conditionsCase 5: Creating a covering indexCase 6: Covering Index No. 2Case 7: Composite index orderCase 8: Index on a large columnConclusion
What is an index?
This is a data structure that allows MySQL to retrieve data efficiently, and its importance varies depending on the data size of the database.
An index is like the index of a book.
By using the book's index when searching for a particular topic,
You can quickly find out which page of the book the item appears on.
Why do we need indexes?
In books with few pages, you can find the item you are looking for by searching from the first page,
It is difficult to find what you are looking for quickly in a book with many pages, such as a dictionary.
Similarly, in the case of a small database, data can be searched even without an index.
However, as the DB size increases, performance may decrease dramatically.
Therefore, it is necessary to set up appropriate indexes so that the desired data can be found quickly.
How to read EXPLAIN
To create an index, you need to use the EXPLAIN command.
EXPLAIN gives you visibility into how the MySQL query optimizer executes your queries.
EXPLAIN should be run in an environment with the same amount of data as your production environment.
In environments with a small number of records, such as development or staging environments,
it may be determined that a full scan is faster even if an index exists.
Let's actually analyze the query.
Meaning of each item in EXPLAIN
- id ID number identifying the select statement
This may be useful if you are using subqueries or UNIONs.
- select_type PRIMARY/SUBQUERY/DEPENDENT SUBQUERY/UNCACHEABLE SUBQUERY/DERIVED/UNION/UNION RESULT, etc. are listed.
DEPENDENT SUBQUERYBe careful if you see (correlated subquery).
- table Table name
- type
Table access methods
All/index/range/ref/eq_ref/const, system/null, etc.ALLis a full scan,indexis a full scan in index order, so be careful if this appears.
The preferred order is const-eq_ref-ref-range-index-ALL
- possible_keys The indexes that may be selected.
- key The index actually selected by the query. If NULL, no index is used.
- key_len The length (in bytes) of the key used.
This shows how much of the composite index is being used.
For example, if a composite index is created using int, datetime, and varchar(4),
if only the int column is used, key_len is 4.
If the int and datetime columns are used, key_len is 12 (= 4 + 8).
If all columns are used and Japanese is used in UTF-8, key_len is 24 (= 4 + 8 + (4 × 3)).
For details on the data capacity of each column, see below.
Generally speaking, it is better to have a shorter key length, but if slow queries are occurring even when indexes are being used, it is often because the targeted composite index is not being used.
- ref The ref column is used to look up the value of the index specified in the key column. If a constant is specified, const is displayed. In the case of a JOIN, the column used as a search condition in the other table is displayed.
- rows This shows the number of rows that MySQL estimates it needs to read to find the desired row. In general, the fewer rows, the better. Note that this number is a rough estimate and does not represent the actual rows fetched. Note that the search conditions in the WHERE clause may further narrow down the fetched rows, so the number of result rows returned may be fewer.
- Extra This shows the strategy selected by the optimizer. The following information is displayed:
- Using index - Using a covering index is the best.
- Using where - The MySQL server filters the rows using the where clause after retrieving them from the storage engine.
- Using temporary - A temporary table is created to sort the results. This can be unavoidable when using group by, but it is best to avoid this as much as possible.
- Using filesort - Use file sort instead of index order. File sorting can be fast if the number of records is small, but as the number of records increases, this part becomes a bottleneck, so it should be avoided.
Index tuning flow
- To identify slow queries, set up slow query capture and search for queries.
- Execute the slow query with EXPLAIN on the DB slave that does not have a service reference.
- Check the rows and Extra output to see if the appropriate indexes are applied.
- Set indexes on tables that were missing indexes as a result of step 3.
Tuning tips summary
- Create an index that reduces rows.At most, several tens of thousandsI would like to do this.
- Don't add indexes indiscriminately as they will affect update performance.
- Consider adding an index if it will narrow down the rows to around 20%.
- If the cardinality is too low (adding an index will only narrow it down to more than 30%), adding an index may be ineffective or may even slow down the database.
- Look at the key_len column in the explain command to see how effective the composite index is.
- In the Extra columnUsingfilesort Using TemporaryIf this appears, be careful.
- Make sure to use an index for sorting (order by).
- Write queries that make use of indexes.
- A composite index will end when a range condition (such as upd_datetime < 'YYYYMMDD') occurs.
- In other words,Even if there are multiple range conditions in a composite index, both cannot be used.The same is true for sorting.
Analysis Case Studies
Let's analyze a certain table design as an example.
Sample Table Name:i_user_test
Number of records: Approximately 1 million items
Case 1: No indexes are set
There are 1.06 million rows, so a full scan is performed to retrieve all of them.
prefecture_id
The number of rows has been narrowed down to about 30,000.
level
The number of rows has been narrowed down to about 4000. Looking at key_len, it is 8.
Since int is 4 bytes per column, we can see that a composite index on two columns is being used.
full_ragain
full_ragain
It appears to be in use, but the number of rows has not changed.
full_ragain
level
Case 2: Adding a sort and limit clause to the query in "1"
The number of rows has been narrowed down, but Using filesort appears in Extra.
In this query, 4,000 items are retrieved, sorted, and then only 10 items are extracted, so retrieving approximately 4,000 items can be said to be almost a waste.
deck_max_total_rating
deck_max_total_rating
In this case, the processing is completed when 10 items have been retrieved in sorted order, so the 4,000 rows will not be retrieved.
Case 3: When there is a range condition in addition to "2"
deck_max_total_rating
upd_datetime
deck_max_total_rating
deck_max_total_rating
Case 4: Multiple range conditions
level
level
This is because, as in Case 3, the index usage ends when the range condition is used.
prefecture_id
prefecture_id,level
You can only select one of the indexes.
prefecture_id,level
prefecture_id,upd_datetime
In this case, it looks like we can rewrite the query.
level
The range condition, which was level < 5, has been changed to level in (1,2,3,4,5).
The key_len is 16, all columns are indexed, and the number of rows has been reduced to 1486.
In this way, if the range condition can be rewritten using IN with a relatively small number of arguments, the index can be used more effectively.
However, the actual execution speed needs to be compared in a production environment.
Case 5: Creating a covering index
grade_id
You can return results by simply looking at the index.Covering Index"It is called.
Since "Using index" is listed in the Extra column, it is a covering index.
It is not good to have too many indexes as this will lead to data bloat and update load.
Use it when it seems effective.
Case 6: Covering Index No. 2
prefecture_id
deck_max_total_rating
Since "Using index" appears in the Extra column, it is a covering index.
deck_max_total_rating
deck_max_total_rating
Case 7: Composite index order
The indexes are evaluated from left to right
If the leftmost column of a composite index is not included in the conditions, the index will not be used.
This also applies to composite primary keys.
To use this composite index, rearrange the order as shown below.
Case 8: Index on a large column
prefecture_battle_id
key_len is 768 = ((255 + 1) × 3). The reason it is three times larger is because utf8 Unicode requires three bytes per character.
In this case, it seems that we can extract the necessary data with just the first character, so we will try attaching an index with just one character.
Since key_len is 6 = ((1 + 1) × 3), the index size can be reduced.
The rows are the same as before, so if you want to narrow down the search by just the first few characters,
Indexing like this is also useful.
Conclusion
SRG is looking for people to work with us.
If you're interested, please contact us here.