"High Performance MySQL" - MySQL Benchmark Test (Notes)

2. MySQL Benchmark Test

2.1 Why benchmarks are needed

Benchmarking is the only convenient and effective way to learn what happens to a system under a given workload.

Benchmarking lets you observe how the system behaves under different loads, assess the system's capacity, understand which changes matter, and see how the system handles different data.

Benchmarking can also construct hypothetical scenarios for testing, beyond the load the system actually experiences.

2.2 Strategies for Benchmarking

There are two main strategies for benchmarking:

  • One is to test the entire application as a whole, including the Web server, application code, network, and database (full-stack)
  • The other is to test MySQL in isolation (single-component)

The main reasons for doing integrated testing for the entire system instead of testing MySQL alone are as follows:

  • It is very useful to test the entire application system, including Web server, application code, network and database, because users are concerned not only with the performance of MySQL itself, but with the overall performance of the application.

  • MySQL is not always the bottleneck of the application, and overall testing can reveal this.

  • Only by testing the application as a whole can the impact of caching between parts be discovered.

  • Integrated testing of the overall application reveals the true behavior of the application more than testing of individual components can do.


On the other hand, overall benchmarking of an application can be difficult to establish, or even get right. If the benchmark is poorly designed, the results will not reflect the real situation, and decisions based on them may be wrong.

However, sometimes you don't need to understand the whole application; you may only care about MySQL's performance, at least in the early stages of a project.

You may choose to test only MySQL based on the following circumstances:

  • Need to compare the performance of different schemas or queries.

  • A test for a specific problem in your app.

  • To avoid lengthy benchmarks. Short benchmarks can be run in quick cycles to detect the effect of particular adjustments.

Also, benchmarking against MySQL is useful if you can perform repeated queries on a real data set, but both the data itself and the size of the data set should be realistic. If possible, take a data snapshot of the production environment.

Unfortunately, setting up a benchmark based on real data is complex and time-consuming. If you can get a copy of the production data set, you are lucky, but this is usually not possible. For example, you want to test a new application that has just been developed, and it has only a few users and data. If you want to test the performance of the application after the scale is expanded to a large scale, you can only do it by simulating a large amount of data and pressure.

2.2.1 Which metrics to test

Sometimes different metrics need to be measured in different ways; for example, latency and throughput require different testing methods.

Consider the following metrics to see how you can meet your testing needs.

  • throughput

Throughput refers to the number of transactions per unit time.

This type of benchmark mainly targets online transaction processing (OLTP) throughput, which is well suited to interactive multi-user applications.

The usual unit of measurement is transactions per second (TPS); some tests use transactions per minute (TPM).

  • response time or latency

This metric is used to test the overall time required for the task.

Depending on the specific application, the time unit of the test may be microseconds, milliseconds, seconds, or minutes. From these, the average, minimum, and maximum response times can be calculated, as well as percentiles.

The maximum response time is usually of little significance: the longer the test runs, the larger the maximum response time is likely to be. The result is also not repeatable; each run may produce a different maximum.

Therefore, a percentile response time can often be used instead of a maximum response time.

For example, if the 95th percentile of the response time is 5 milliseconds, then the task completes within 5 milliseconds 95% of the time.
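As a rough illustration, here is one way to compute the 95th percentile from a file containing one response time per line (a minimal sketch; the file name is made up):

# sort the times, then pick the value 95% of the way through the list
sort -n response-times.txt | awk '{ a[NR] = $1 } END { print "p95:", a[int(NR * 0.95)] }'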

  • concurrency

Concurrency is a very important metric that is often misunderstood and misused.

For example, it is often expressed as how many users are browsing a Web site at the same time, with the number of sessions as the commonly cited measure.

However, the HTTP protocol is stateless, and most users are simply reading pages displayed in their browsers, so this is not equivalent to web-server concurrency. Moreover, web-server concurrency is not equal to database concurrency; it only indicates how much data the session storage mechanism must be able to handle.

A more accurate measure of the concurrency of a web server should be how many simultaneous concurrent requests there are at any given time.

Corresponding concurrency can be measured in different parts of the application. The high concurrency of the web server will generally lead to high concurrency of the database, but the language and tool set used by the server will have an impact on this. Be careful not to confuse creating a database connection with concurrency.

A well-designed application can have hundreds or thousands of MySQL connections open at the same time, while only a few of them execute queries concurrently. So a Web site with "50,000 simultaneous users" might produce only 10 or 15 concurrent queries against the MySQL database.

In other words, what concurrency benchmarks need to focus on is the number of operations working concurrently, or the number of threads or connections doing work at the same time. When concurrency increases, measure whether throughput drops or response times lengthen; if so, the application may not be able to handle peak load.


Concurrency is measured quite differently from response time and throughput: it is less a result than a property of how the benchmark is set up.

Concurrency testing is usually not to test the concurrency that the application can achieve, but to test the performance of the application under different concurrency. Of course, the concurrency of the database still needs to be measured.

You can run tests at 32, 64, or 128 threads with sysbench, and record the MySQL Threads_running status variable during each test.
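For example, you could sample that counter once per second during the run (a minimal sketch, assuming a local server with credentials available to the mysql client; stop it with Ctrl-C):

# sample the number of threads actively running a query, once per second
while true; do
    mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_running'"
    sleep 1
done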

  • scalability

When the business pressure of the system may change, it is necessary to test the scalability.

Scalability, put simply, means that giving the system twice as much work ideally yields twice the results (i.e., double the throughput).

In other words, by doubling the resources of the system (such as twice the number of CPUs), you can get twice the throughput.

Of course, at the same time performance (response time) must also be within an acceptable range. Most systems cannot achieve such ideal linear scaling. As the pressure changes, both throughput and performance can get worse and worse.

Scalability metrics are very useful for capacity planning, and they provide information that other tests cannot, helping to identify application bottlenecks. For example, if the system is designed based on a single-user response-time test (a very poor strategy), the results may look good, yet performance may degrade badly as concurrency rises. A response-time test with an increasing number of user connections will reveal this problem.

Some tasks, such as batch jobs that create summary tables from granular data, simply need fast response time, periodically. The pure response time of these tasks can of course be tested too, but take care to consider how tasks interact: batch jobs can degrade the performance of interactive queries, and vice versa.

Test the metrics that matter most to your users .

2.3 Benchmarking method

Start by avoiding some common mistakes:

  • Using only a subset of the real data size. For example, the application handles hundreds of GB of data, but the test uses only 1GB; or testing with only today's data while hoping to simulate future business growth.

  • Using the wrong data distribution. For example, test with uniformly distributed data, while the real data of the system has many hot spots (randomly generated test data usually cannot simulate the real data distribution).

  • Use unrealistic distribution parameters, such as assuming that all users' personal information (profile) will be read evenly.

  • Doing single-user testing of a multi-user application.

  • Testing a distributed application on a single server.

  • Failing to match real user behavior, such as "think time" on a Web page: real users read a page for a while after requesting it, rather than clicking related links one after another without pausing.

  • Executing the same query repeatedly. Real queries are not all identical, so they produce lower cache hit ratios, whereas repeatedly executing one query means its results are, to some degree, fully or partially cached.

  • Failing to check for errors. If the test results can't be reasonably explained, for example if a query that should be slow suddenly runs fast, check for errors; otherwise you may only be measuring how quickly MySQL detects syntax errors. After a benchmark, always check the error logs; this should be a basic requirement.

  • Ignoring the system warm-up process, for example testing immediately after a restart. Sometimes you do need to know how long a system takes to reach normal capacity after a restart, in which case pay specific attention to the warm-up period. Conversely, if the goal is to analyze normal steady-state performance, note that a benchmark started right after a restart runs with cold, empty caches, which behaves very differently from a system whose caches are full of data.

  • Using the default server configuration.

  • Testing for too short a time. A benchmark needs to run for a sufficient duration.

2.3.1 Design and plan benchmarking

The first step in planning a benchmark is to ask questions and clarify goals. Then decide whether to use standard benchmarks or design ad-hoc tests.

If standard benchmarking is used, it should be confirmed that an appropriate test solution is selected. For example, don't use TPC-H to test e-commerce systems. In the definition of TPC, "TPC-H is a benchmark test for ad hoc query and decision support applications", so it is not suitable for testing OLTP systems.



Designing ad-hoc benchmarks is complex and often requires an iterative process. First, you need to get a snapshot of the production dataset, which can be easily restored for subsequent testing.

Then, run queries against the data. You can create a unit-test suite as a preliminary test and run it multiple times, but this is still different from the real database workload.

A better approach is to choose a representative time period, such as an hour during peak hours, or a full day, and log all queries on the production system.

If the chosen time period is relatively small, select several different periods. This helps cover the overall system activity, such as queries for weekly reports, or batch jobs that run during off-peak hours.

Queries can be logged at different levels. For example, for a full-stack benchmark you can record HTTP requests on the Web server, or you can turn on MySQL's query log (Query Log).
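One way to capture queries at the MySQL level is the general query log (a minimal sketch, assuming MySQL 5.1 or newer so these variables can be changed at runtime; the log file path is illustrative):

# send the general log to a file and enable it for the capture window
mysql -e "SET GLOBAL general_log_file = '/tmp/captured-queries.log'"
mysql -e "SET GLOBAL general_log = ON"
# ... capture the chosen time period, then turn it off ...
mysql -e "SET GLOBAL general_log = OFF"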


If you want to replay these queries, make sure to create multiple threads to execute in parallel, rather than a single thread executing linearly. A separate thread should be created for each connection in the log, rather than randomly assigning all queries to some threads. The query log records in which connection each query was executed.

Even if you don't need to create a dedicated benchmark, it is still necessary to write down the test plan in detail. Tests may be run repeatedly, so the process must be exactly reproducible. Also consider that the next round of testing may be run by someone else; and even if it's the same person, they may not remember exactly how the test was first run. The test plan should record the test data, the system configuration steps, how results will be measured and analyzed, the warm-up plan, and so on.

Specifications should be established for documenting parameters and results, and each round of testing must be documented in detail. A document specification can be as simple as a spreadsheet or notepad, or as complex as a custom database.

Remember that you will often have to write some scripts to analyze the test results, so it is of course better if you don't need to open spreadsheets or text files.

2.3.2 How long the benchmark should run

It is important that the benchmark should be run for a long enough time.

If you need to test the performance of the system in the steady state, then of course you need to test and observe in the steady state. And if the system has a lot of data and memory, it may take a very long time to reach a steady state.

Most systems have some margin for dealing with unexpected situations: they can absorb performance spikes and defer some work until after the peak. But when the machine is under pressure for long enough, these margins are used up, and the system can no longer sustain its short-term peak performance.

Sometimes it is not possible to determine how long a test needs to run is sufficient. If so, keep the test running and watch until you're sure the system is stable.

Below is an example of a test performed on a known system. Figure 2-1 shows a timing diagram of the system's disk read and write throughput.

[Figure 2-1: Disk read and write throughput over the course of the benchmark]

After the system warm-up completes, the read I/O curve stabilizes within three or four hours, but the write I/O still varies significantly for at least eight hours; after that it, too, is basically stable apart from a few outlying points.

A simple rule of thumb is to wait for the system to appear stable for at least as long as it takes for the system to warm up. The test in this example lasted 72 hours to ensure that the long-term behavior of the system was represented.

A common wrong way to test is to only perform a series of short-term tests, such as 60 seconds each time, and summarize the performance of the system based on this test.

If you don't take the time to run an accurate, complete benchmark, all the time you did spend is wasted.

2.3.3 Get system performance and status

When performing a benchmark, it is necessary to gather as much information as possible about the system under test. It's a good idea to have one directory for benchmarking, and for each round of testing to create a separate subdirectory where test results, configuration files, test metrics, scripts, and other relevant instructions are kept. Even if some results are not currently needed, they should be saved first.

It is better to have some extra data than not have important data, and the extra data may be needed later.

The data that needs to be recorded includes system status and performance indicators, such as CPU usage, disk I/O, network traffic statistics, SHOW GLOBAL STATUS counters, etc.

Here is a shell script that collects MySQL test data:

#!/bin/sh

INTERVAL=5
PREFIX=$INTERVAL-sec-status
RUNFILE=/home/benchmarks/running
mysql -e 'SHOW GLOBAL VARIABLES' >> mysql-variables
while test -e $RUNFILE; do
    # one set of output files per hour
    file=$(date +%F_%I)
    # sleep until the next INTERVAL-second boundary
    sleep=$(date +%s.%N | awk "{print $INTERVAL - (\$1 % $INTERVAL)}")
    sleep $sleep
    ts="$(date +"TS %s.%N %F %T")"
    loadavg="$(uptime)"
    echo "$ts $loadavg" >> $PREFIX-${file}-status
    mysql -e 'SHOW GLOBAL STATUS' >> $PREFIX-${file}-status &
    echo "$ts $loadavg" >> $PREFIX-${file}-innodbstatus
    mysql -e 'SHOW ENGINE INNODB STATUS\G' >> $PREFIX-${file}-innodbstatus &
    echo "$ts $loadavg" >> $PREFIX-${file}-processlist
    mysql -e 'SHOW FULL PROCESSLIST\G' >> $PREFIX-${file}-processlist &
    echo $ts
done
echo Exiting because $RUNFILE does not exist.
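To use it (a sketch; assume the script above is saved as gather-status.sh, a name chosen here for illustration), create the flag file the loop checks, run the script in the background for the duration of the benchmark, and remove the file to stop collection:

# the loop runs only while this flag file exists
touch /home/benchmarks/running
sh gather-status.sh &

# ... run the benchmark ...

# removing the flag file makes the loop exit at the end of the current interval
rm /home/benchmarks/running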

This is just a simple piece of code that does a good job of capturing performance and status data for a test.

As the code shows, it captures only part of MySQL's data; if more is needed, it is easy to extend the script to capture additional data.


2.3.4 Obtaining accurate test results

The best way to get accurate test results is to answer some basic questions about benchmarking:

  • Did you choose the correct benchmark?
  • Has relevant data been collected for the problem?
  • Are the wrong benchmarks being used? For example, are CPU-bound benchmarks being used to evaluate the performance of an I/O-bound application?
  • Confirm that test results are reproducible. Make sure the state of the system is consistent before each retest.
    • If it is a very important test, it may even be necessary to restart the system for each test. In general, what needs to be tested is a pre-warmed system, and you also need to ensure that the warm-up time is long enough (see the previous content on how long the benchmark needs to run) and whether it is repeatable.
    • If the warmup uses random queries, the test results may not be repeatable.

If the test process will modify the data or schema, then before each test, you need to use the snapshot to restore the data. Inserting 1000 records into the table and inserting 1 million records will definitely not have the same test results.
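A minimal sketch of that restore step (the database name and dump file are illustrative):

# recreate the database and reload the snapshot before each test run
mysql -e "DROP DATABASE IF EXISTS bench; CREATE DATABASE bench"
mysql bench < production-snapshot.sql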

Data fragmentation and distribution on disk can cause tests to be non-repeatable. One way to ensure that the distribution of physical disk data is as consistent as possible is to perform a quick format and copy the disk partitions each time.

Be aware that many factors, including external pressure, performance analysis and monitoring systems, detailed logging, periodic jobs, and others, can affect test results.

A typical case: a cron job kicks off during the test, or a RAID card starts its periodic patrol read cycle or a scheduled consistency check. Make sure the resources the benchmark needs are dedicated to it while it runs. If something else is consuming network bandwidth, or the test runs on SAN storage shared with other servers, the results are likely to be inaccurate.

Change as few parameters as possible in each test. If several must be changed at once, some information may be lost. Some parameters depend on others and cannot be modified independently; sometimes these dependencies go unnoticed, which adds complexity to the testing.

In general, benchmark parameters are changed iteratively, step by step, rather than in large batches between runs. For example, to tune a parameter toward a specific behavior, use a divide-and-conquer approach, halving the difference on each run, to home in on the right value.


Many benchmark tests are used to predict the performance of the system after migration, such as migrating from Oracle to MySQL.

This kind of testing is usually cumbersome because MySQL executes a completely different type of query than Oracle.

If you want to know how an application that runs well on Oracle will perform after migrating to MySQL, you usually need to redesign the schema and queries for MySQL. (In some cases, such as building a cross-platform application, you might want to know how the same queries run on both platforms, but that is rare.)

Also, tests based on MySQL's default configuration don't mean much, because the defaults are geared to tiny applications that use very little memory. Sometimes you see comparison tests between MySQL and other commercial database products with results that are embarrassing for MySQL, often because MySQL was run with its default configuration. Worse, such obviously flawed results easily become headlines.

Finally, if a test produces anomalous results, don't casually discard them as bad data points. Study them carefully and find the cause.

Investigating may yield a valuable result, uncover a critical bug, or expose a design flaw in the benchmark. If you can't explain your test results, don't publish them; experience shows that anomalous results often trace back to small mistakes that ultimately invalidate the test.

2.3.5 Run the benchmark and analyze the results

In general, automating the benchmark is a good idea. Automation produces more accurate results, because it prevents the tester from occasionally skipping steps or making mistakes, and it also helps document the entire testing process.

There are many ways to automate, it can be a Makefile or a set of scripts. The scripting language can be selected as required: shell, PHP, Perl, etc. are all available. Automate as much of the testing process as possible, including loading data, warming up the system, executing tests, recording results, etc.


Benchmarks often need to be run multiple times. How many times you need to run it depends on how the results are scored and how important the test is. To improve the accuracy of the test, you need to run it a few more times.

Generally, in testing practice, you can take the best result, the average of all results, or the average of the best three results out of five runs.

The results can be refined further as needed, and statistical methods can be applied to determine confidence intervals and the like. Usually, though, that level of certainty is not required.

As long as the test results meet current needs, simply run a few rounds of testing and see how the results change. If the results vary widely, a few more runs, or a longer run, can give more deterministic results.

The following is a very simple shell script that demonstrates how to extract time dimension information from the data collected by the previous data collection script. The input parameter of the script is the name of the collected data file.

#!/bin/sh
# This script converts SHOW GLOBAL STATUS into a tabulated format, one line
# per sample in the input, with the metrics divided by the time elapsed
# between samples.
awk '
BEGIN {
	printf "#ts date time load QPS";
	fmt = " %.2f";
}
/^TS/ { # The timestamp lines begin with TS.
	ts = substr($2, 1, index($2, ".") - 1);
	load = NF - 2;
	diff = ts - prev_ts;
	prev_ts = ts;
	printf "\n%s %s %s %s", ts, $3, $4, substr($load, 1, length($load) - 1);
}
/Queries/ {
	printf fmt, ($2 - Queries) / diff;
	Queries = $2;
}
' "$@"


2.3.6 The importance of plotting

The output of the scripts above can serve directly as a data source for plotting with gnuplot or R.

Assuming gnuplot is used and the output data file is named QPS-per-5-seconds:

gnuplot> plot "QPS-per-5-seconds" using 5 w lines title "QPS"

This gnuplot command plots the QPS data from the fifth column of the file, giving the graph the title "QPS". Figure 2-2 shows the resulting plot.

[Figure 2-2: QPS over time, as plotted by gnuplot]

2.4 Benchmarking tools

2.4.1 Integrated (full-stack) test tools

  • ab
  • http_load
  • JMeter

2.4.2 Single-component test tools

  • mysqlslap
  • MySQL Benchmark Suite
  • SuperSmack
  • Database Test Suite
  • Percona’s TPCC-MySQL Tool
  • sysbench

2.4.3 MySQL built-in function BENCHMARK()

BENCHMARK() is a MySQL built-in function for testing the execution speed of specific operations or expressions.

BENCHMARK(count, expression)

For example, to compare whether MD5() or SHA1() executes faster:
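A minimal sketch in the mysql client (the iteration count and input string are illustrative; BENCHMARK() always returns 0, so what you compare is the execution time the client reports for each statement):

mysql> SELECT BENCHMARK(1000000, MD5('test'));
mysql> SELECT BENCHMARK(1000000, SHA1('test'));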

2.5 Benchmark Test Cases

2.5.1 http_load

http_load is a tool for the Linux platform. Download it:

wget http://www.acme.com/software/http_load/http_load-12mar2006.tar.gz

When the download completes, unpack the archive:

tar xzvf http_load-12mar2006.tar.gz

Change into the unpacked directory and compile it (a C build environment is required):

cd http_load-12mar2006
make install

Next, create a file containing the URLs to be tested.

The URLs given in the book are as follows:

http://www.mysqlperformanceblog.com/
http://www.mysqlperformanceblog.com/page/2/
http://www.mysqlperformanceblog.com/mysql-patches/
http://www.mysqlperformanceblog.com/mysql-performance-presentations/
http://www.mysqlperformanceblog.com/2006/09/06/slow-query-log-analyzes-tools/

Then you can execute the following command:

./http_load -parallel <concurrency> -seconds <duration> <urlfile>
./http_load -rate <fetches per second> -seconds <duration> <urlfile>

-parallel: number of concurrent fetching processes
-fetches: total number of fetches
-rate: number of fetches per second
-seconds: total duration of the test

After the run completes, http_load prints a summary of the results; the book shows sample output.

2.5.2 MySQL Benchmark Suite

The MySQL Benchmark Suite consists of a set of Perl-based benchmark tools.

To run all the tests:

$ cd /usr/share/mysql/sql-bench/
sql-bench$ ./run-all-tests --server=mysql --user=root --log --fast

Running all the tests takes a long time, possibly more than an hour, depending on the test hardware and configuration. If the --log command-line option is specified, the progress of the test can be monitored.

The test results are stored in the output subdirectory, and each test result file contains a series of operation timing information.

The author of these notes recommends using MySQL Workbench for learning, because it is visual and easy to use:

sudo snap install mysql-workbench-community

After installation, a launcher icon appears. Workbench can display the official documentation as well as schema relationship diagrams.

2.5.3 Sysbench

sysbench can run many types of benchmarks; it was designed to test not only database performance but also the performance of the server running the database.

sudo apt install sysbench

In fact, Peter and Vadim originally designed this tool to perform MySQL performance tests (although not all MySQL benchmarks can be done).

It is strongly recommended that everyone become familiar with sysbench testing, which should be one of the most useful tools in a MySQL user's toolkit. Although there are many other testing tools that can replace some functions of sysbench, those tools are sometimes not reliable, and the results obtained are not necessarily related to MySQL performance.

For example, a series of tools such as iozone and bonnie++ can be used for I/O performance testing, but attention needs to be paid to designing scenarios so that the disk I/O mode of InnoDB can be simulated. The I/O test of sysbench is very similar to the I/O mode of InnoDB, so the fileio option is very useful.

  • CPU benchmark
sysbench --test=cpu --cpu-max-prime=20000 run

This test uses 64-bit integers and measures the time required to compute prime numbers up to the specified maximum value.


  • Sysbench's file I/O benchmark

The file I/O (fileio) benchmark can test the performance of the system under different I/O loads. This is helpful for comparing different hard drives, different RAID cards, and different RAID modes. The I/O subsystem can be tuned based on the test results. The file I/O benchmark simulates many of InnoDB's I/O characteristics.

The first step of the test is the prepare stage, which generates the data files used by the test; their total size must be larger than available memory. If the data fit entirely in memory, the operating system will cache most of it and the results won't reflect an I/O-intensive workload. First, create a data set with the following command (the book uses 150GB, which feels a bit large):

sysbench --test=fileio --file-total-size=150G prepare

This command will create test files in the current working directory, and the subsequent run phase will be tested by reading and writing these files.

The second step is the run phase, which has different test options for different I/O types:

seqwr    sequential write
seqrewr  sequential rewrite
seqrd    sequential read
rndrd    random read
rndwr    random write
rndrw    combined random read/write

These modes are combined with other options in the run command, for example:

sysbench --test=fileio --file-total-size=150G --file-test-mode=rndrw \
--init-rng=on --max-time=300 --max-requests=0 run

The book shows sample results for this test.
After the test is completed, run the cleanup operation to delete the test file generated in the first step:

sysbench --test=fileio --file-total-size=150G cleanup
  • sysbench's OLTP benchmark

The OLTP benchmark simulates the workload of a simple transaction processing system .

The following example uses a table with one million rows. The first step is to generate this table:

sysbench --test=oltp --oltp-table-size=1000000 --mysql-db=test \
--mysql-user=root prepare

Generating test data only requires the above simple command. Next, you can run the test. This example uses 8 concurrent threads, read-only mode, and the test duration is 60 seconds:

sysbench --test=oltp --oltp-table-size=1000000 --mysql-db=test --mysql-user=root \
--max-time=60 --oltp-read-only=on --max-requests=0 --num-threads=8 run
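When you are finished, the generated table can be dropped with the matching cleanup stage (a sketch, reusing the connection options from above):

sysbench --test=oltp --mysql-db=test --mysql-user=root cleanup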

In addition, sysbench offers other tests; for details, see the official GitHub repository.

Appendix

"High Performance MySQL", written by Baron Schwartz, Peter Zaitsev, and Vadim Tkachenko; translated by Ning Haiyuan, Zhou Zhenxing, Peng Lixun, Zhai Weixiang, and Liu Hui.
