MySQL Index Tuning Basics
This is Oniumi (@fat47) from the Service Reliability Group (SRG) of the Technology Headquarters.
#SRG(Service Reliability Group) mainly provides cross-sectional support for the infrastructure of our media services, improving existing services, launching new ones, contributing to OSS, etc.
This article is located in SRGDBWGWe will publish materials regarding the database that the (DB Working Group) provides to the entire company.
I hope this helps in some way.
What is an index?Why do we need indexes?How to read EXPLAINMeaning of each item in EXPLAINIndex Tuning FlowAnalysis Case StudiesCase 1: No index is setCase 2: Adding a sort and limit clause to the query in "1"Case 3: When there is a range condition in addition to "2"Case 4: When there are multiple range conditionsCase 5: Creating a covering indexCase 6: Covering Index No. 2Case 7: Composite index orderCase 8: Large column indexConclusion
What is an index?
This is a data structure that allows MySQL to retrieve data efficiently, and its importance varies depending on the data size of the DB.
An index is like an index in a book.
By using the book index when searching the contents of a book for a certain topic,
You can quickly find out what page of the book the item appears on.
Why do we need indexes?
In books with few pages, you can find the item you are looking for by searching from the first page,
It is difficult to find what you are looking for quickly in books with many pages, such as dictionaries.
Similarly, in a DB, if the database is small, data can be searched even without an index.
However, as the DB size increases, performance may decrease dramatically.
Therefore, it is necessary to set up appropriate indexes so that the desired data can be found quickly.
How to read EXPLAIN
To create an index, you need to use the EXPLAIN command.
EXPLAIN gives you visibility into how the MySQL query optimizer executes your query.
EXPLAIN should be run in an environment that contains the same amount of data as the production environment. In development and staging environments with a small number of records, it may be determined that a full scan is faster even if an index exists.
Let's actually analyze the query.
Meaning of each item in EXPLAIN
- id The ID number identifying the select statement
This may be useful if you are using subqueries or UNIONs.
- select_type PRIMARY/SUBQUERY/DEPENDENT SUBQUERY/UNCACHEABLE SUBQUERY/DERIVED/UNION/UNION RESULT, etc. will be listed.
DEPENDENT SUBQUERYBe careful if you see (correlated subquery).
- table Table name
- type
Table access methods
ALL/index/range/ref/eq_ref/const,system/null, etc.
ALLis a full scan,indexis a full scan in index order, so be careful if you see this.
The preferred order is const-eq_ref-ref-range-index-ALL
- possible_keys The indexes that may be selected.
- key The index actually selected by the query. If NULL, no index is used.
- key_len The length (in bytes) of the key used.
This tells you how much of the composite index is being used.
For example, if a composite index is applied to int, datetime, and varchar(4),
if only the int column is used for the index, key_len is 4.
If the int column and datetime column are used, key_len is 12 (=4+8).
If all columns are used and Japanese is used in utf8, key_len is 24 (=4+8+(4×3)).
See below for the data capacity of each.
Generally speaking, it is better to have a shorter key length, but if slow queries are occurring even though indexes are being used, it is often the case that the targeted composite index is not being used.
- ref The ref column is used to check the value of the index specified in the key column. If a constant is specified, const is displayed, and in the case of JOIN, the column used as a search condition in the other table to be joined is displayed.
- rows This shows the number of rows that MySQL estimates it needs to read to find the desired row. In general, the fewer rows, the better. Note that this number is a rough estimate and does not represent the actual rows fetched, and that the search conditions in the WHERE clause may further filter the fetched rows, so the number of result rows returned may be fewer.
- Extra This shows what strategy the optimizer chose. The following information is displayed:
- Using index - Using a covering index is the best.
- Using where - The MySQL server filters the rows using the where clause after they are retrieved by the storage engine.
- Using temporary - A temporary table is created to sort the results. This can be unavoidable when using group by, but it is best to avoid this whenever possible.
- Using filesort - Use file sorting instead of index order. If the number of items is small, file sorting may be faster, but if the number of items increases, this part will become the bottleneck, so avoid it.
Index Tuning Flow
- To identify slow queries, set up slow query capture and search for queries.
- Run the slow query with EXPLAIN on a DB slave that does not have a service reference.
- Check the rows and Extra output to see if the appropriate indexes have been applied.
- Set indexes on any tables that were missing indexes as a result of step 3.
Tuning tips summary
- Create an index that results in fewer rows.At most, several tens of thousands.
- Don't apply indexes indiscriminately as they will affect update performance.
- Consider applying an index if it will narrow down the rows to around 20%.
- If the cardinality is too low (applying an index will only narrow down the rows to more than 30%) applying an index may be ineffective or may even slow down the database.
- Look at the key_len column in the explain command to see how effective the composite index is.
- In the Extra column,Usingfilesort Using Temporarybe careful if this is displayed.
- Be sure to use an index for sorting (order by).
- Write queries that make use of indexes.
- A composite index will end when a range condition (upd_datetime < 'YYYYMMDD', etc.) occurs.
- In other words,If there are multiple range conditions in a composite index, both cannot be used.The same is true for sorting.
Analysis Case Studies
Let's analyze a certain table design as a sample.
Sample Table Name:i_user_test
Number of records: Approximately 1 million items
Case 1: No index is set
Since there are 1.06 million rows, a full scan is performed to retrieve all rows.
prefecture_id
The number of rows has been narrowed down to about 30,000.
level
The number of rows has been narrowed down to about 4000. If you look at key_len, it is 8.
Since an int is 4 bytes per column, we can see that a composite index on two columns is being used.
full_ragain
full_ragain
It appears to be in use, but the number of rows has not changed.
full_ragain
level
Case 2: Adding a sort and limit clause to the query in "1"
The number of rows has been narrowed down, but Using filesort appeared in Extra.
In this query, 4,000 items are retrieved, sorted, and then only 10 items are extracted, so the retrieval of approximately 4,000 items is almost a waste.
deck_max_total_rating
deck_max_total_rating
In this case, the processing is completed when 10 items have been retrieved in sorted order, so the 4,000 rows will not need to be retrieved.
Case 3: When there is a range condition in addition to "2"
deck_max_total_rating
upd_datetime
deck_max_total_rating
deck_max_total_rating
Case 4: When there are multiple range conditions
level
level
This is similar to case 3 because the index usage stops when the range condition is used.
prefecture_id
prefecture_id,level
You can only select one of the indexes.
prefecture_id,level
prefecture_id,upd_datetime
It looks like we can rewrite the query in this case:
level
The range condition that was level <5 has been changed to level in (1,2,3,4,5).
The key_len is 16, all columns are indexed, and the number of rows has been reduced to 1486.
In this way, if the range condition can be rewritten using IN with fewer arguments, the index can be used more effectively.
However, it is necessary to compare the actual execution speed in a production environment.
Case 5: Creating a covering index
grade_id
You can return results just by looking at the index.Covering Index"It is called.
Since "Using index" is listed in the Extra column, it is a covering index.
Having too many indexes is not good either, as it can lead to data bloat and update load.
Use it when it seems effective.
Case 6: Covering Index No. 2
prefecture_id
deck_max_total_rating
Since "Using index" appears in the Extra column, it is a covering index.
deck_max_total_rating
deck_max_total_rating
Case 7: Composite index order
The indexes are evaluated from left to right.
If the leftmost column of a composite index is not included in the conditions, the index will not be used.
This also applies to composite primary keys.
To use this composite index, you can rearrange the order as shown below:
Case 8: Large column index
prefecture_battle_id
key_len is 768 = ((255 + 1) × 3). It is tripled because utf8 Unicode requires 3 bytes per character.
In this case, it seems that we can extract the necessary data with just the first character, so we will try to add an index of just one character.
Since key_len is 6 = ((1 + 1) × 3), the index size can be reduced.
The rows are the same as before, so if you want to narrow down the results by just the first few characters,
Indexing like this is also useful.
Conclusion
SRG is looking for people to work with us. If you are interested, please contact us here.