We’ve recently added the ability to monitor MySQL query performance statistics from MySQL’s
PERFORMANCE_SCHEMA, and there were a number of lessons learned. There are definitely right and wrong ways to do it. If you are looking to the P_S tables for monitoring MySQL query performance, this blog post might save you some time and mistakes.
What Is The Performance Schema?
First, a quick introduction. The Performance Schema includes a set of tables that give information on how statements are performing. Most of the P_S tables follow a set of predictable conventions: there’s a set of tables with a limited set of full-granularity current and/or historical data, which is aggregated into tables that accumulate over time. In the case of statements, there’s a table of current statements, which feeds into a statement history, that accumulates into statement summary statistics. The tables are named as follows:
| events_statements_current |
| events_statements_history |
| events_statements_history_long |
| events_statements_summary_by_account_by_event_name |
| events_statements_summary_by_digest |
| events_statements_summary_by_host_by_event_name |
| events_statements_summary_by_thread_by_event_name |
| events_statements_summary_by_user_by_event_name |
| events_statements_summary_global_by_event_name |
The tables most people will care about are
events_statements_current, which is essentially a replacement for
SHOW FULL PROCESSLIST, and
events_statements_summary_by_digest, which is statistics about classes of queries over time. The rest of the tables are pretty much what they look like – summaries by user, etc.
These tables were introduced in MySQL 5.6, and not all of them will be enabled by default. There’s a performance overhead to enable them, but this should be small relative to the performance improvements you can gain from using them.
We prefer our technique of decoding network traffic server-side to measure query performance, for several reasons, but the statement digest table is the next-best thing in cases such as Amazon RDS where that’s not possible. It gives us enough data to present a view of Top Queries as shown below.
Now let’s dig into specifics about these tables and how to use them.
Monitoring MySQL Performance - An Overview
The general idea, for most MySQL performance monitoring tools, is to read from
events_statements_summary_by_digest at intervals and subtract each sample from the next, to get rates over time. As you can see in the below sample, there are a lot of columns with various statistics about each family of queries in the table:
mysql> select * from events_statements_summary_by_digest limit 1\G
*************************** 1. row ***************************
DIGEST_TEXT: SELECT @ @ max_allowed_packet
FIRST_SEEN: 2014-09-12 16:04:38
LAST_SEEN: 2014-10-31 08:26:07
These columns are mostly counters that accumulate over time. The
COUNT_STARcolumn, for example, shows the number of times the statement has been executed. The
SUM_ columns are just what they look like.
Enabling And Sizing The Table
The table needs to be enabled as usual with the Performance Schema by using its setup table,
setup_consumers. That table contains a row for each P_S consumer to be enabled. Other setup tables and server variables control some of the configuration as well, though the defaults work OK out of the box most of the time.
The table can also be sized, in number of rows. By default it is 10,000 rows (although I think somewhere I saw a documentation page that said 200).
Limitations Of The Table
There are a couple of limitations you should be aware of.
- The statement digest table does not record anything about statements that are prepared. It only captures statements that are executed by sending the full SQL to the server as text. If you use prepared statements, the table probably does not capture your server’s performance accurately. The drivers for many programming languages use prepared statements by default, so this could be a real issue. (If this is a problem for you, you might like to know that VividCortex captures prepared statements from network traffic, including samples).
- The table is fixed-size, and resizing it requires a server restart.
- Some things aren’t captured in full granularity. For example, when we’re capturing MySQL query performance data from network traffic, we can measure specific error codes. There’s a
SUM_ERRORS column in the table, but you can’t see what the error codes and messages were.
Resetting The Table (Or Not)
The table can be reset with a
TRUNCATE to start afresh, but generally shouldn’t be. Why would you want to do this? There might be a few reasons.
First, the table is fixed-size, and if the table isn’t large enough to hold all of the distinct types of queries your server runs, you’ll get a catch-all row with a NULL digest and schema. This represents statements that aren’t being tracked separately, and might be important for some reason. A
TRUNCATE will empty the table if this is the case.
Second, statistics accumulate over time, so columns such as first-seen and last-seen dates may eventually end up being useless to you. The min, max, and average timer waits will not be very helpful over long periods of time, either.
Finally, you might want to reduce the number of rows it contains, so that occasional queries that are never purged don’t introduce performance overhead when reading the table.
There are tools that do this completely wrong, though. Some of them empty out the table every time they read from it. This is the worst behavior because these tables are not session-specific. They are global, and a
TRUNCATE will affect everyone who’s looking at them. It might be kind of rude to constantly throw away the data your colleague (or another tool) is looking at.
The other problem with this is that a tool that reads from the table, then truncates it, is subject to race conditions. Statements that complete between these actions will be discarded and never seen. Of course, there’s no way to avoid this, except by just not doing it, or not doing it often.
I would suggest resetting this table only manually and only when needed, or perhaps at infrequent intervals such as once a day or once an hour, from a scheduled task.
Accumulating Statements Correctly
The table’s primary key isn’t defined in the schema, but there’s a unique set of columns. This is not, contrary to what I’ve seen some software assume, the
DIGESTcolumn. There is one row per digest, per schema. The combination of schema and digest is unique.
This means that if you’re looking for all information about a single class of queries regardless of schema, you need to aggregate together all of the rows with the same
One of the implications of the uniqueness being defined by schema and digest together is that servers that have a large number of schemas and a large number of digests will need a really huge number of rows to keep track of all of the statements. At VividCortex, we have customers whose servers have literally millions or tens of millions of distinct families of queries running on a regular basis. Multiply this by a large number of schemas, and you have no hope of keeping track of them with the P_S tables. This is not a problem for our default collection mechanism, though: by default we capture MySQL query performance statistics by decoding the server’s network traffic. This handles high-cardinality scenarios without trouble.
Don’t Run GROUP BY On The Table Frequently
There are several ways you can cause performance impact to the server by reading from the P_S tables.
One is by using complex queries on these tables. They are in-memory, but they’re not indexed. For example, if you run a
GROUP BY to aggregate rows together by digest, you’ll cause trouble. You probably shouldn’t do this, at least not frequently.
Recall that VividCortex measures everything at 1-second resolution, giving you highly detailed performance statistics about your entire system. The statement statistics are no different; we have per-second statement (query) statistics. Reading from the P_S table once per second with a
GROUP BY clause has too much performance impact on the server. It is less costly to read the entire table and accumulate the statistics in application code, in our tests.
Don’t Re-Fetch Data
Another way to cause problems is to fetch the
DIGEST_TEXT column with every query. This column isn’t enormous, because it’s limited to 1kb in length, but it’s still large enough that you should not repeatedly fetch it. Instead, when you see an unknown digest, you should query for it only then. This may introduce a lot of complexity into your application code, but this is what needs to be done.
Handle Results Smartly
We care a lot about performance at VividCortex, obviously. We’ve built our approach to performance management with minimal overhead in mind. This is why we don’t do things like enable your server’s slow query log or poll commands that can block the server. But in addition to avoiding overhead on the server, we have to make our agent’s performance good, too.
This is why we also do things you might consider to be extreme. For example, MySQL returns all of the numbers from the Performance Schema in textual format over the network, and these then have to be converted into numbers by the client. Even ascii-to-number conversion has a cost we can measure, so we don’t do it unless the
COUNT_STAR column has changed since the last time we saw a row. This makes a material difference in the agent’s CPU consumption.
Capturing Sample Statements
Capturing samples of queries is very helpful. Aggregate statistics about groups of queries aren’t revealing enough; you need to be able to look at specific instances of queries to EXPLAIN them and so on.
To get samples, you’ll need to look at the current or historical statement tables. Not all of these are enabled by default, though.
LEFT JOIN against the statement history tables to get samples of individual query executions. This obviously should not be done at 1-second frequency. Even if it’s done infrequently, it’s not really a great idea. When you have fine-detailed performance instrumentation you can really see small spikes in server performance, and an occasionally intrusive query can potentially starve high-frequency fast-running queries of resources.
VividCortex’s approach to this, by the way, is to collect samples probabilistically, which is different from the usual practice of trying to find a “worst” sample. Worst-sample is okay in some ways, but it is not representative, so it doesn’t help you find out much about a broad spectrum of queries and their execution. Here’s a screenshot of what our sampling approach yields, which is quite different from the picture you’ll get from worst-sample tactics:
Although we prefer to be able to capture and decode network traffic to see the full detail about what’s happening inside the server, in cases where that’s not possible, the Performance Schema in MySQL 5.6 and greater is a good alternative. There are just a few things one should take care to do, at least at 1-second resolution as we do at VividCortex. And there are a few common mistakes you can stumble over that will either be bad behavior or might make your results just plain wrong.
If you have suggestions, comments, or questions, please leave them below!