Recording slow queries and analyzing them using pt-query-digest
This is Oniumi (@fat47) from the Service Reliability Group (SRG) of the Technology Headquarters.
#SRG(Service Reliability Group) is a group that mainly provides cross-sectional support for the infrastructure of our media services, improving existing services, launching new ones, and contributing to OSS.
This article is located in SRGDBWGWe will publish materials regarding the database provided by the DB Working Group for the entire company.
I hope this helps in some way.
What is a slow query?Recommended settings for the slow query logEnable log outputRotation settings with logrotateHow to analyze the slow query logGet statistics on slow queriesUse the pt-query-digest options to investigateAnalyzing queries with EXPLAINConclusion
What is a slow query?
Anything that exceeds the long_query_time value specified in the MySQL configuration will be logged.
Generally, slow queries are caused by poorly indexed or large number of rows.
These queries are often expensive and put a strain on the database.
Recommended settings for the slow query log
This document describes the configuration procedure when running MySQL on-premise or on EC2.
Enable log output
Set in my.cnf
Or log in to MySQL and run the following:
Rotation settings with logrotate
With this setting, slow queries will continue to be written to the same mysql-slow.log file.
Since this makes analysis difficult, we will rotate it by date.
Running the tests
Once the settings are complete, the logs will be accumulated as shown below.
How to analyze the slow query log
If you check the contents of the recorded slow query log, you will see that queries that exceed the threshold are recorded in this manner.
In actual logs, there can be hundreds or even thousands of these items, so it is impossible to check them all.
Get statistics on slow queries
Let's use pt-query-digest from the Percona Toolkit to get statistics on the slow logs.
Installation example on CentOS7
Try running it without any options
Let's check the output
At the top, log statistics such as Exec Time and Lock Time are summarized.
The next section outputs a ranking of services with slow response times and high call volumes.
- Response time shows the total execution time of that query and the percentage of the total slow log that it takes up
- Calls is the number of times the query was executed
- R/Call Response Time / Call is the average response time per execution.
- V/M is the ratio of the variance to the mean of the response time (?)
- Item displays a summary of the query
The next section records aggregate information for each query.
- Query_time distribution A graph of average query execution times is displayed.
- Tables This is a query that allows you to check the table information related to this query.
- EXPLAIN Here is a sample query that can be checked with EXPLAIN:
In this log example, we can see that Query1 is an extremely slow query, with most response times ranging from 100ms to 10 seconds.
Use the pt-query-digest options to investigate
As mentioned earlier, analysis can be performed with the default settings,
Utilizing the following options will make analysis more convenient.
We will introduce some excerpts below.
I want to search logs by specifying a date or time range.
I want to sort by total query execution time
I want to sort by the number of lines read.
I want to display the tables with the most slow queries in order
Analyzing queries with EXPLAIN
After analyzing the slow query log and identifying queries that can be improved,
Now it's time to analyze the query itself.
We plan to write a separate article about this, so please wait a while.
Conclusion
SRG is looking for people to work with us.
If you're interested, please contact us here.