SpringBoot performance optimization

1. Service monitoring

Before starting to optimize the performance of a SpringBoot service, we need to do some preparation so that the service exposes its internal data.

For example, if your service uses a cache, you need to collect data such as the cache hit rate; if you use a database connection pool, you need to expose the parameters of the connection pool.

The monitoring tool we use here is Prometheus, which is a time series database that can store our metrics. SpringBoot can be easily connected to Prometheus.

After creating a SpringBoot project, first add the Maven dependencies.

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
</dependency>

Then, we need to enable the relevant monitoring endpoints in the application.properties configuration file.

management.endpoint.metrics.enabled=true
management.endpoints.web.exposure.include=*
management.endpoint.prometheus.enabled=true
management.metrics.export.prometheus.enabled=true
After starting the application, we can access http://localhost:8080/actuator/prometheus to get the monitoring data.


It is also relatively simple to monitor business data. You need to inject a MeterRegistry instance. Here is some sample code:

@Autowired
MeterRegistry registry;

@GetMapping("/test")
@ResponseBody
public String test() {
    registry.counter("test",
            "from", "127.0.0.1",
            "method", "test"
    ).increment();

    return "ok";
}
From the monitoring endpoint, we can then find the metric we just added.

test_total{from="127.0.0.1",method="test",} 5.0
Here is a brief introduction to the popular Prometheus monitoring system. Prometheus pulls monitoring data from its targets. The job of exposing data can also be handed over to the more full-featured telegraf component.
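Since Prometheus pulls metrics, it needs a scrape job pointing at the actuator endpoint. Below is a minimal sketch of a prometheus.yml fragment; the job name and target address are placeholders to be adjusted for your environment:

# minimal scrape-job sketch; job_name and targets are placeholders
scrape_configs:
  - job_name: 'springboot-app'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['localhost:8080']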


We usually use Grafana to display the monitoring data and the AlertManager component for alerting. Setting this part up is not our focus; interested readers can study it on their own. A typical monitoring dashboard shows metrics such as the Redis cache hit rate.


2. Generating a Java flame graph

A flame graph is a tool for analyzing bottlenecks in program execution. The vertical direction represents the depth of the call stack, and the horizontal direction represents elapsed time, so the wider a frame is, the more likely it is to be a bottleneck.

Flame graphs can also be used to analyze Java applications. You can download the async-profiler archive from GitHub for this.

For example, let's extract it to the /root/ directory, and then start the Java application with the profiler attached via -agentpath. The command line is as follows:

java -agentpath:/root/build/libasyncProfiler.so=start,svg,file=profile.svg -jar spring-petclinic-2.3.1.BUILD-SNAPSHOT.jar
After running for a while, stop the process, and you will see a profile.svg file generated in the current directory. Open this file with a browser and you can drill down layer by layer to find the targets that need optimization.

3. Skywalking

For a typical web service, the slowest part is the database operations, so optimizing with local and distributed caches usually yields the biggest performance improvement.

For locating problems in a complex distributed environment, I would like to share another tool: Skywalking.

Skywalking is implemented using probe technology (JavaAgent). By adding its javaagent Jar package to the Java startup parameters, performance data and call-chain data can be collected and sent to the Skywalking server.

Download the corresponding installation package (if you use ES storage, you need to download a dedicated package). After configuring the storage, it can be started with one click.

Extract the compressed package of the agent to the corresponding directory.

tar xvf skywalking-agent.tar.gz -C /opt/
Add the agent package to the service startup parameters. For example, the original startup command is:

java -jar /opt/test-service/spring-boot-demo.jar --spring.profiles.active=dev
The modified startup command is:

java -javaagent:/opt/skywalking-agent/skywalking-agent.jar -Dskywalking.agent.service_name=the-demo-name -jar /opt/test-service/spring-boot-demo.jar --spring.profiles.active=dev
After accessing some of the service's endpoints, open the Skywalking UI and you can see the collected call data. From it, we can find the interfaces with relatively slow responses and high QPS and carry out targeted optimization.


4. Optimization ideas

For an ordinary web service, let's take a look at the main hops a request passes through to reach the actual data.

When a domain name is entered in the browser, it first needs to be resolved to a concrete IP address through DNS. To ensure high availability, our services are generally deployed as multiple replicas, with Nginx performing reverse proxying and load balancing.

Depending on the characteristics of the resources, Nginx also takes on part of the static/dynamic separation; the dynamic part of the traffic enters our SpringBoot service.


SpringBoot uses embedded Tomcat as the web container by default, and uses the typical MVC pattern to finally reach our data.

5. HTTP Optimization

Let's take an example to see which actions can speed up the retrieval of web pages. For the convenience of description, we only discuss the HTTP/1.1 protocol.

1. Use CDN to accelerate file acquisition

For larger files, try to distribute them via a CDN (Content Delivery Network). Even commonly used front-end scripts, styles, and images can be placed on the CDN, which speeds up their retrieval and makes web pages load faster.

2. Reasonably set the Cache-Control value

The browser inspects the Cache-Control HTTP header to decide whether to use the browser cache, which is very useful for managing static files. The Expires header has a similar effect: Cache-Control specifies a relative freshness duration, while Expires specifies an absolute expiration time.
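For illustration, the following pair of response headers expresses the same one-hour lifetime in both styles (the Expires date is an arbitrary example):

Cache-Control: max-age=3600
Expires: Wed, 21 Oct 2026 08:00:00 GMT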

The Cache-Control header can be set in the Nginx configuration file.

location ~* \.(ico|gif|jpg|jpeg|png)$ {
    # Cache for 1 year
    add_header Cache-Control "max-age=31536000";
}

3. Reduce the number of domain names requested on a single page



Reduce the number of domain names each page requests, trying to keep it within four. This is because every time the browser accesses a backend resource, it first needs to query DNS, find the IP address corresponding to the domain, and then make the real call.

DNS has multiple layers of caching, such as the browser cache, the local host cache, the ISP's cache, and so on. A DNS-to-IP resolution usually takes 20-120 ms. Reducing the number of domain names speeds up resource acquisition.

4. Enable gzip

With gzip turned on, content is compressed first and then decompressed by the browser. Since the transmitted size is reduced, bandwidth usage drops and transmission efficiency improves.

It can be easily enabled in Nginx. The configuration is as follows:

gzip on;
gzip_min_length 1k;
gzip_buffers 4 16k;
gzip_comp_level 6;
gzip_http_version 1.1;
gzip_types text/plain application/javascript text/css;
5. Compress resources

Compress JavaScript and CSS, and even HTML. The reasoning is similar. In the popular front-end/back-end separation mode, these resources are generally minified anyway.

6. Use keepalive

Creating and closing connections consumes resources. After a user accesses our service, there will be more interactions to follow, so maintaining a long-lived connection can significantly reduce network round-trips and improve performance.

Nginx enables keepalive support for clients by default. You can adjust its behavior with the following two parameters.

http {
    keepalive_timeout 120s 120s;
    keepalive_requests 10000;
}

The long connection between Nginx and the upstream needs to be enabled manually. A reference configuration is as follows:



upstream backend {
    server 127.0.0.1:8080;  # example address
    keepalive 16;           # required for upstream keepalive
}
location / {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
}

6. Tomcat optimization

The optimization of Tomcat itself is also a very important part.

Why has the old cat Tomcat lived so far beyond the age of ordinary cats? That has a lot to do with its lightweight design and its excellent performance. By now, Tomcat's version has soared all the way to 10!

Tomcat has many configuration parameters, but we don't need to pay attention to all of them to achieve an optimization effect. This article introduces the main configuration parameters in detail to make sure your old cat runs faster!

Generally, the most common change is to modify the server port, which lives in the Connector section of server.xml. A typical example is shown below:

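A minimal sketch of such a Connector element, using Tomcat's common defaults:

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />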

In fact, most optimizations also live within this Connector element: everything from ports and concurrency to threads can be configured here.

1. Three parameters that cover the concurrency configuration

As a web container that handles highly concurrent Internet requests, the first thing Tomcat has to withstand is the impact of massive request volume. Fortunately, Tomcat supports NIO, and we can adjust its thread and concurrency configuration to get the best performance out of it.

maxThreads – the maximum number of threads Tomcat uses to handle client requests, that is, the number of tasks processed at the same time; its default is 200. In general, for high-concurrency I/O-intensive applications, a value of around 1000 is more reasonable.
maxConnections – the maximum number of connections Tomcat accepts at the same time. For Java's blocking BIO mode, the default is the value of maxThreads; if a custom Executor is used in BIO mode, the default is that executor's maxThreads. For Java's newer NIO mode, the default maxConnections is 10000, so we generally leave this parameter unchanged.
acceptCount – the maximum queue length for incoming connections once the limits above are reached; requests beyond this value are rejected. I usually set it to the same size as maxThreads.
Briefly explain the relationship between the above three parameters:

The number of connections the system can hold: maxConnections + acceptCount. The difference is that connections within maxConnections can be scheduled for processing, while connections in acceptCount can only wait in the queue.

The number of requests the system can process: maxThreads, i.e. the number of threads actually able to work.

Happiness index: maxThreads > maxConnections > acceptCount.
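Putting the three parameters together, a sketch of a tuned NIO Connector might look like the following; the numbers are only illustrative, not recommendations:

<Connector port="8080"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="1000"
           maxConnections="10000"
           acceptCount="1000" />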

Some articles out there are still full of maxProcessors and minProcessors. But these two parameters were deprecated in Tomcat 5 and removed entirely in Tomcat 6.

It can only be said that some of the articles you see may really have been published by operators who don't understand the technology.

Taking Tomcat 8 as an example, the specific configuration parameters can be found at:

https://tomcat.apache.org/tomcat-8.0-doc/config/http.html

2. Thread configuration

In the concurrency configuration, you can see that we only have minSpareThreads and no maxSpareThreads. This is because Tomcat 6 introduced the Executor node, and that parameter no longer has any effect.

Since the threads are managed as a pool, the configuration has all the usual characteristics of a pool.

Reference:

https://tomcat.apache.org/tomcat-8.0-doc/config/executor.html
namePrefix – the name prefix of each newly created thread
maxThreads – the maximum number of threads in the thread pool
minSpareThreads – the number of threads that are always kept alive
maxIdleTime – the idle time after which excess threads are destroyed
threadPriority – the priority of threads in the pool, 5 by default
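As a sketch, an Executor with these attributes can be declared in server.xml and referenced from a Connector; the name and numbers are illustrative:

<Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
          maxThreads="500" minSpareThreads="20"
          maxIdleTime="60000" />

<Connector port="8080" protocol="HTTP/1.1"
           executor="tomcatThreadPool" />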

3. JVM configuration

Tomcat is a Java application, so the JVM configuration also affects its performance. The more important configuration parameters are as follows.

3.1. Memory area sizes

The first thing to adjust is the size of each memory region, though this also depends on the garbage collector in use. Here we only look at some global parameters.

-XX:+UseG1GC – first, specify the garbage collector the JVM uses. Try not to rely on the default; specify one explicitly.
-Xmx – sets the maximum heap size, generally about 2/3 of the operating system's memory.
-Xms – sets the initial heap size; it is usually set to the same value as Xmx to avoid dynamic resizing.
-Xmn – the young generation size, by default 1/3 of the heap. In high-concurrency scenarios where objects are created and die quickly, this area can be enlarged appropriately; half of the heap, or more, is fine. Under G1, however, there is no need to set this value; it is adjusted automatically.
-XX:MaxMetaspaceSize – limits the size of the metaspace; 256 MB is generally enough. It is usually set to the same value as the initial size -XX:MetaspaceSize.
-XX:MaxDirectMemorySize – sets the maximum direct memory, limiting memory requested through DirectByteBuffer.
-XX:ReservedCodeCacheSize – sets the size of the storage area for JIT-compiled code. If you observe that this limit is being hit, you can increase it appropriately; the default is generally enough.
-Xss – sets the stack size; the default of 1 MB is enough.
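Combining several of these flags, an example startup command might look like this; the sizes are placeholders and should be derived from your machine's actual memory:

java -XX:+UseG1GC -Xmx4g -Xms4g \
     -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=256m \
     -Xss1m \
     -jar spring-boot-demo.jar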
3.2. Memory tuning

-XX:+AlwaysPreTouch – touches all the memory specified by the heap parameters at startup. Startup becomes slower, but runtime speed increases.
-XX:SurvivorRatio – the default value is 8; the ratio of the Eden area to one survivor area.
-XX:MaxTenuringThreshold – defaults to 6 under CMS and 15 under G1. This value relates to the object promotion mentioned earlier, and changing it has a fairly noticeable effect. The age distribution of objects can be printed with -XX:+PrintTenuringDistribution; if the sizes of successive ages stay the same, objects past a certain age will always be promoted to the old generation anyway, so the promotion threshold can be set smaller.
-XX:PretenureSizeThreshold – objects exceeding this size are allocated directly in the old generation. This parameter is not used much, though.
3.3. Garbage collector optimization

G1 garbage collector:

-XX:MaxGCPauseMillis – sets the target pause time; G1 will try to meet it.
-XX:G1HeapRegionSize – sets the region size. The value is a power of 2, and should be neither too large nor too small. If you don't know how to set it, keep the default.
-XX:InitiatingHeapOccupancyPercent – when usage of the entire heap reaches this percentage (45% by default), the concurrent marking phase starts.
-XX:ConcGCThreads – the number of threads used by the concurrent garbage collector. The default varies with the platform the JVM runs on; modifying it is not recommended.

4. Other important configurations

Let's look at a few important parameters configured in Connector.

enableLookups – whether request.getRemoteHost() performs a DNS lookup to return the host name of the remote client; if set to false, the IP address is returned directly.
URIEncoding – the character encoding used to decode the URI; if not specified, the default is ISO-8859-1.
connectionTimeout – the connection timeout, in milliseconds.
redirectPort – the port to redirect to when the server receives an SSL request while processing a plain HTTP request.

5. Summary

Tomcat is the most commonly used web container and provides hundreds of configuration parameters. But in normal use we don't need to figure them all out; just focus on the most important ones.

7. Custom Web Container

If your project has high concurrency and you want to modify configuration such as the maximum number of threads and the maximum number of connections, you can customize the web container. The code is as follows.

@SpringBootApplication(proxyBeanMethods = false)
public class App implements WebServerFactoryCustomizer<ConfigurableServletWebServerFactory> {

    public static void main(String[] args) {
        SpringApplication.run(App.class, args);
    }

    @Override
    public void customize(ConfigurableServletWebServerFactory factory) {
        TomcatServletWebServerFactory f = (TomcatServletWebServerFactory) factory;
        f.setProtocol("org.apache.coyote.http11.Http11Nio2Protocol");

        f.addConnectorCustomizers(c -> {
            // The handler type matches the NIO2 protocol set above
            Http11Nio2Protocol protocol = (Http11Nio2Protocol) c.getProtocolHandler();
            protocol.setMaxConnections(200);
            protocol.setMaxThreads(200);
            protocol.setConnectionTimeout(3000);
        });
    }
}

Note that in the code above we set the protocol to org.apache.coyote.http11.Http11Nio2Protocol, which enables NIO2. This protocol is only available since Tomcat 8.0, and enabling it brings some performance improvement. The comparison is as follows.

Default:

[root@localhost wrk2-master]# ./wrk -t2 -c100 -d30s -R2000 http://172.16.1.57:8080/owners?lastName=
Running 30s test @ http://172.16.1.57:8080/owners?lastName=
2 threads and 100 connections
Thread calibration: mean lat.: 4588.131ms, rate sampling interval: 16277ms
Thread calibration: mean lat.: 4647.927ms, rate sampling interval: 16285ms
Thread Stats Avg Stdev Max +/- Stdev
Latency 16.49s 4.98s 27.34s 63.90%
Req/Sec 106.50 1.50 108.00 100.00%
6471 requests in 30.03s, 39.31MB read
Socket errors: connect 0, read 0, write 0, timeout 60
Requests/sec: 215.51
Transfer/sec: 1.31MB
NIO2:

[root@localhost wrk2-master]# ./wrk -t2 -c100 -d30s -R2000 http://172.16.1.57:8080/owners?lastName=
Running 30s test @ http://172.16.1.57:8080/owners?lastName=
  2 threads and 100 connections
  Thread calibration: mean lat.: 4358.805ms, rate sampling interval: 15835ms
  Thread calibration: mean lat.: 4622.087ms, rate sampling interval: 16293ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    17.47s     4.98s   26.90s    57.69%
    Req/Sec   125.50      2.50   128.00    100.00%
  7469 requests in 30.04s, 45.38MB read
  Socket errors: connect 0, read 0, write 0, timeout 4
Requests/sec:    248.64
Transfer/sec:      1.51MB
You can even replace Tomcat with Undertow. Undertow is also a web container; it is more lightweight, occupies less memory, and starts fewer daemon threads. The changes are as follows:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-tomcat</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-undertow</artifactId>
</dependency>

8. Optimization directions for each layer

Controller layer

The controller layer receives the front end's query parameters and constructs the query result. Many projects now use a front-end/back-end separated architecture, so controller methods generally carry the @ResponseBody annotation, serializing the query result to JSON before returning it (a balance of efficiency and readability).

Since the controller mostly plays a function-composition and routing role, its impact on performance is mainly reflected in the size of the result set. If the result set is very large, the JSON serialization component will spend more time on it.

Large result sets not only affect serialization time but also waste memory. If the result set occupies 10 MB of memory before being serialized to JSON, then during serialization 20 MB or more may be needed for the job. I've seen many cases where memory usage spiked because the returned objects were deeply nested and referenced objects that shouldn't be referenced (such as very large byte[] objects).

Therefore, for ordinary services, it is well worth keeping the result set compact, which is also the reason DTOs (data transfer objects) exist. If your project returns a complex result structure, it is necessary to convert the result set.
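As a sketch of such a conversion (the entity and field names are made up for illustration), copy only the fields the front end needs into a compact DTO:

// Hypothetical entity carrying a heavy field we do not want to return
public class UserEntity {
    public Long id;
    public String name;
    public byte[] avatarBytes; // large field that should not leak into the response
}

// Compact DTO that keeps the returned result set small
public class UserDTO {
    public Long id;
    public String name;

    public static UserDTO from(UserEntity e) {
        UserDTO dto = new UserDTO();
        dto.id = e.id;
        dto.name = e.name;
        return dto;
    }
}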

Additionally, the controller layer can be optimized with asynchronous servlets. The principle is as follows: after the servlet receives a request, it hands the request to an asynchronous thread for business processing, and the container thread itself returns immediately. After the asynchronous thread finishes the business processing, it can generate response data directly or forward the request to another servlet.
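A minimal sketch using Spring MVC's built-in asynchronous processing (the endpoint and the sleep are made up; returning a Callable frees the container thread while the business logic runs on a task executor):

@GetMapping("/async-demo")
@ResponseBody
public Callable<String> asyncDemo() {
    // The container thread returns immediately; Spring completes the
    // response when the Callable finishes on another thread.
    return () -> {
        Thread.sleep(100); // stand-in for slow business logic
        return "ok";
    };
}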

Service layer
The service layer handles the specific business logic; most functional requirements are implemented here. The service layer generally uses the singleton pattern (rather than prototype), rarely holds state, and can be reused by controllers.

The code organization of the service layer has a great impact on the readability and performance of the code. Most of the design patterns we often talk about are aimed at the service layer.

The point to be highlighted here is distributed transactions.


Consider a case where four operations are spread across three different resources. Achieving consistency requires coordinating the three different resources in unison. Their underlying protocols and implementations are all different, so this cannot be solved by Spring's @Transactional annotation; it has to be done with the help of external components.

Many people have experienced this: after adding some code to ensure consistency, a stress test shows a jaw-dropping performance drop. Distributed transactions are performance killers because they require extra steps to ensure consistency. Common methods include: the two-phase commit scheme, TCC, local message tables, MQ transactional messages, and distributed transaction middleware.


When choosing a distributed transaction scheme, transformation cost, performance, and effectiveness should all be weighed. Between distributed transactions and no transactions there is a middle ground called flexible transactions. The idea of flexible transactions is to move business logic and mutually exclusive operations from the resource layer up to the business layer.

Regarding traditional transactions and flexible transactions, let's briefly compare them.

ACID

The biggest feature of relational databases is transaction processing, that is, to meet ACID.

Atomicity: Either all or none of the operations in a transaction are performed.
Consistency: The system must always be in a strongly consistent state.
Isolation: The execution of a transaction cannot be interfered with by other transactions.
Durability: A committed transaction makes permanent changes to the data in the database.
BASE

The BASE approach improves availability and system performance by sacrificing consistency and isolation.

BASE is the abbreviation of Basically Available, Soft state, and Eventually consistent:

Basically Available: the system is able to operate and provide service essentially all the time.
Soft state: the system is not required to stay strongly consistent at all times.
Eventual consistency: the system needs to become consistent after a certain period of time.
For Internet services, it is recommended to use compensating transactions to achieve eventual consistency, for example completing data repair through a series of scheduled tasks. For details, see the following article.

What are the commonly used distributed transactions? Which should I use?

Dao layer
With reasonable data caching in place, we try to prevent requests from penetrating to the Dao layer. Unless you are particularly familiar with the caching features the ORM itself provides, it is recommended to cache data in a more general way.
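One such general approach is Spring's cache abstraction. A minimal sketch, assuming @EnableCaching is configured, a cache implementation such as Caffeine is on the classpath, and OwnerService/OwnerDao are hypothetical names:

@Service
public class OwnerService {

    @Autowired
    private OwnerDao ownerDao; // hypothetical Dao interface

    // Results are cached under the "owners" cache keyed by id;
    // repeated calls are served from the cache and skip the Dao layer.
    @Cacheable(cacheNames = "owners", key = "#id")
    public Owner findOwner(Long id) {
        return ownerDao.findById(id);
    }
}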

The Dao layer's performance mostly comes down to how the ORM framework is used. For example, in JPA, if a one-to-many or many-to-many mapping is added without lazy loading enabled, cascading queries can easily trigger deep retrieval, causing high memory overhead and slow execution.
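As a sketch (the entity names are illustrative), the mapping below keeps the collection lazy; note that LAZY is already the JPA default for @OneToMany, so the point is to avoid switching such collections to EAGER:

@Entity
public class Owner {
    @Id
    private Long id;

    // The pets collection is only fetched when actually accessed,
    // avoiding deep cascading retrieval on every Owner query.
    @OneToMany(mappedBy = "owner", fetch = FetchType.LAZY)
    private List<Pet> pets;
}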

In businesses with a relatively large amount of data, database and table sharding is often used. In these sharding components, many simple query statements are re-parsed and distributed to each node for execution, and the results are finally merged.

For example, the simple count statement select count(*) from a may be routed to more than a dozen tables for calculation, with the statistics finally aggregated at the coordinating node; the execution efficiency can be imagined. Currently the most representative sharding middleware are ShardingJdbc at the driver layer and MyCat at the proxy layer. Both have this problem: the view these components present to users is consistent, but we must pay attention to these differences when coding.

End
Let's summarize below.

We briefly went through the common optimization ideas for SpringBoot and introduced three performance analysis tools. One is the monitoring system Prometheus, which shows concrete indicators; one is the flame graph, which shows concrete code hotspots; the other is Skywalking, which analyzes call chains in a distributed environment. When we have doubts about performance, we proceed a bit like Shennong tasting a hundred herbs: analyzing the results from all of these evaluation tools.

SpringBoot's default web container is Tomcat, so we can gain performance by tuning Tomcat. Of course, we also provided a series of optimization ideas for Nginx, the load balancer that sits in front of the service.

Finally, we looked at some optimization directions of Controller, Service, and Dao under the classic MVC architecture, and focused on the distributed transaction problem of the Service layer.

As a widely used service framework, SpringBoot has done a lot of work on performance and selected many fast components. For example, the database connection pool uses HikariCP by default, the Redis client defaults to Lettuce, and Caffeine is provided for local caching. For a common web service that interacts with a database, caching is the most important means of optimization.
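For example, the default HikariCP pool can be tuned through standard Spring Boot properties; the numbers below are illustrative, not recommendations:

spring.datasource.hikari.maximum-pool-size=20
spring.datasource.hikari.minimum-idle=5
spring.datasource.hikari.connection-timeout=30000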
