Recording slow queries and analyzing them using pt-query-digest

This is Oniumi (@fat47) from the Service Reliability Group (SRG) of the Technology Headquarters.
#SRG(Service Reliability Group) mainly provides cross-sectional support for the infrastructure of our media services, improving existing services, launching new ones, contributing to OSS, etc.
This article is located in SRGDBWGWe will publish materials regarding the database that the (DB Working Group) provides to the entire company.
I hope this helps in some way.
 

What is a slow query?


Anything that exceeds the long_query_time value specified in the MySQL settings will be logged.
Generally, slow queries are caused by poorly indexed or queries that call up a large number of rows.
These queries are often expensive and put a strain on the database.
 

Recommended settings for the slow query log


💡
This document describes the configuration procedure when you are running MySQL on-premises or on EC2.
 

Enable log output

Set in my.cnf
 
Or log in to MySQL and run the following:
 

Rotation setting with logrotate

With this setting, slow queries will continue to be written to the same mysql-slow.log file.
Since this will make analysis difficult, we will rotate the data by date.
 
Running the tests
 
Once the settings are complete, the logs will be accumulated as shown below.
 

How to analyze the slow query log


If you check the contents of the recorded slow query log, you will see that queries that exceed the threshold are recorded in the following format.
In actual logs, there can be hundreds or even thousands of these items, so it is impossible to check them all.
 

Get statistics on slow queries

Let's use pt-query-digest from the Percona Toolkit to gather statistics on slow logs.
 
Installation example for CentOS7
 
Try running it without any options:
 
Let's check the output
At the top, log statistics such as Exec Time and Lock Time are summarized.
The next section shows a ranking of the slowest response times and the highest number of calls.
  • Response time This shows the total execution time of that query and the percentage of the total slow log that it takes up.
  • Calls is the number of times the query was executed
  • R/Call Response Time / Call is the average response time per execution
  • V/M is the ratio of the response time variance to the mean (?)
  • Item displays a summary of the query
 
The next section records aggregate information for each query.
  • Query_time distribution A graph of average query execution time is displayed.
  • Tables This is a query that allows you to check the table information related to this query.
  • EXPLAIN Here is a sample of a query that can be checked with EXPLAIN:
In this log example, we can see that Query1 is an extremely slow query, with most response times ranging from 100 ms to 10 seconds.
 

Use the pt-query-digest options to investigate

As mentioned before, you can perform analysis with the default settings,
The following options can be used to make analysis more convenient.
We will introduce some excerpts.
 
I want to search the logs by specifying a date or time range.
 
I want to sort by total query execution time
 
I want to sort by the number of lines read
 
I want to display tables with the most slow queries
 

Analyzing Queries with EXPLAIN


After analyzing the slow query log and identifying queries that can be improved,
Now it's time to analyze the query itself.
We plan to create a separate article about this, so please wait a while.
 

Conclusion


SRG is looking for people to work with us. If you are interested, please contact us here.