SpringBoot performance optimization: no detours for programmers

Original: Miss Sister Taste (WeChat public account ID: xjjdog). Feel free to share; please keep the source when reprinting.

SpringBoot has become the No.1 framework in Java, used by millions of programmers every day. When the load on a service rises, optimizing the SpringBoot service gets put on the agenda.

This article explains the general approach to SpringBoot service optimization in detail, with several companion articles attached as appetizers.

This article is long and well worth bookmarking.

1. Monitoring shows the direction

Before optimizing the performance of a SpringBoot service, we need some preparation: the service has to expose its internal data.

For example, if your service uses a cache, you need to collect data such as the cache hit rate; if it uses a database connection pool, you need to expose the pool's parameters.

The monitoring tool used here is Prometheus, a time-series database that stores our metrics. SpringBoot connects to Prometheus easily.

After creating a SpringBoot project, first add the Maven dependencies.

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
</dependency>


Then, we need to enable the relevant monitoring endpoints in the application.properties configuration file.

management.endpoint.metrics.enabled=true
management.endpoints.web.exposure.include=*
management.endpoint.prometheus.enabled=true
management.metrics.export.prometheus.enabled=true


After startup, we can get monitoring data by visiting http://localhost:8080/actuator/prometheus.

Monitoring business data is also simple: just inject a MeterRegistry instance. Here is some sample code:

@Autowired
MeterRegistry registry;

@GetMapping("/test")
@ResponseBody
public String test() {
    registry.counter("test",
            "from", "127.0.0.1",
            "method", "test"
    ).increment();

    return "ok";
}


From the monitoring endpoint, we can find the metric we just added.

test_total{from="127.0.0.1",method="test",} 5.0


That is a brief introduction to the popular Prometheus monitoring system. Prometheus pulls monitoring data from targets; the job of exposing data can also be handed over to the more fully featured telegraf component.

As shown in the figure, we usually use Grafana to display the monitoring data and the AlertManager component for alerting. Setting this part up is not our focus; interested readers can study it on their own. The picture below is a typical monitoring dashboard, where you can see the Redis cache hit rate and more.

2. Generating flame graphs for Java

A flame graph is a tool for analyzing bottlenecks in program execution. The vertical axis shows the depth of the call stack; the horizontal axis shows elapsed time. The wider a frame, the more likely it is to be a bottleneck.

Flame graphs can also be used to analyze Java applications. You can download the async-profiler archive from GitHub for this.

For example, extract it to the /root/ directory, then start the Java application with the native agent attached. The command line is as follows:

java -agentpath:/root/build/libasyncProfiler.so=start,svg,file=profile.svg -jar spring-petclinic-2.3.1.BUILD-SNAPSHOT.jar


After running for a while, stop the process, and you will see a profile.svg file in the current directory. Open it in a browser and drill down layer by layer to find the code that needs optimizing.

3. Skywalking

For a web service, the slowest part is usually database operations. Therefore, optimizing with a local cache and a distributed cache yields the biggest performance gains.

As for how to locate problems in a complex distributed environment, I would like to share another tool: Skywalking.

Skywalking is implemented with probe technology (a JavaAgent). Adding its javaagent jar to the Java startup parameters lets it capture performance data and call-chain data and send them to the Skywalking server.

Download the corresponding installation package (if you use Elasticsearch storage, you need the dedicated package), configure the storage, and it starts with one click.

Extract the compressed package of the agent to the corresponding directory.

tar xvf skywalking-agent.tar.gz  -C /opt/


Add the agent jar to the service startup parameters. For example, if the original startup command is:

java  -jar /opt/test-service/spring-boot-demo.jar  --spring.profiles.active=dev


The modified startup command is:

java -javaagent:/opt/skywalking-agent/skywalking-agent.jar -Dskywalking.agent.service_name=the-demo-name -jar /opt/test-service/spring-boot-demo.jar --spring.profiles.active=dev


Hit a few service endpoints, open the Skywalking UI, and you will see an interface like the one below. From it we can find the endpoints with slow responses and high QPS and optimize them specifically.

4. Optimization ideas

For an ordinary web service, let's look at the main links a request passes through on its way to the actual data.

As shown in the figure below, the domain name entered in the browser must first be resolved to a concrete IP address via DNS. To ensure high availability, our services are generally deployed in multiple copies, with Nginx providing reverse proxying and load balancing.

Based on resource characteristics, Nginx takes on part of the dynamic/static separation. The dynamic requests then enter our SpringBoot service.

SpringBoot uses embedded Tomcat as the web container by default and follows the typical MVC pattern to finally reach our data.

5. HTTP Optimization

Let's look at which actions can speed up fetching a web page. For ease of explanation, we only discuss the HTTP/1.1 protocol.

1. Use a CDN to speed up file delivery

For larger files, try to distribute them via a CDN (Content Delivery Network). Even commonly used front-end scripts, styles, and images can be placed on the CDN. A CDN speeds up the retrieval of these files, and pages load faster.

2. Reasonably set the Cache-Control value

The browser inspects the Cache-Control HTTP header to decide whether to use its local cache, which is very useful for static files. The header with the same effect is Expires: Cache-Control specifies how long until the content expires, while Expires specifies the exact time it expires.

These headers can be set in the Nginx configuration file.

location ~* ^.+\.(ico|gif|jpg|jpeg|png)$ {
            # cache for 1 year (the original "no-cache" would defeat the purpose,
            # and add_header takes the name and value as two arguments, no colon)
            add_header Cache-Control "public, max-age=31536000";
}

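To make the relative/absolute distinction concrete, here is a stdlib-only Java sketch (class and method names are mine) that builds both header values for the same one-year lifetime used in the Nginx example:

```java
import java.time.Duration;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class CacheHeaders {
    // Cache-Control is relative: "valid for this many seconds from now".
    static String cacheControl(Duration maxAge) {
        return "public, max-age=" + maxAge.getSeconds();
    }

    // Expires is absolute: "valid until this exact instant", in RFC 1123 format.
    static String expires(ZonedDateTime now, Duration maxAge) {
        return DateTimeFormatter.RFC_1123_DATE_TIME.format(now.plus(maxAge));
    }

    public static void main(String[] args) {
        Duration oneYear = Duration.ofSeconds(31536000); // 365 days, as in the Nginx config
        ZonedDateTime now = ZonedDateTime.of(2024, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC);
        System.out.println("Cache-Control: " + cacheControl(oneYear));
        System.out.println("Expires: " + expires(now, oneYear));
    }
}
```

Because Expires names a wall-clock instant, it goes stale if the client's clock is wrong; Cache-Control wins when both are present, which is why it is the preferred header today.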

3. Reduce the number of domains requested per page

Reduce the number of domains each page requests; try to keep it within 4. Every time the browser accesses a resource on a new domain, it must first resolve that domain via DNS to find the corresponding IP address before making the real call.

DNS has multiple layers of caching, such as the browser cache, the local host cache, and the ISP's cache. A DNS resolution usually takes 20-120 ms. Reducing the number of domains speeds up resource fetching.

4. Enable gzip

With gzip on, content is compressed before transfer and decompressed by the browser. Since the transferred size shrinks, bandwidth usage drops and transfer efficiency improves.

It can be easily enabled in nginx. The configuration is as follows:

gzip on;
gzip_min_length 1k;
gzip_buffers 4 16k;
gzip_comp_level 6;
gzip_http_version 1.1;
gzip_types text/plain application/javascript text/css;

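The bandwidth saving comes entirely from the compression ratio. A stdlib-only sketch of what Nginx does on the wire (names are illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {
    // Compress a byte array with gzip, as Nginx does for responses
    // larger than gzip_min_length.
    static byte[] gzip(byte[] input) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(input);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // in-memory streams do not really fail
        }
    }

    public static void main(String[] args) {
        // Repetitive text (like HTML/CSS/JS) compresses very well.
        String page = "<div class=\"row\">hello</div>".repeat(1000);
        byte[] raw = page.getBytes(StandardCharsets.UTF_8);
        byte[] zipped = gzip(raw);
        System.out.println("raw=" + raw.length + " bytes, gzipped=" + zipped.length + " bytes");
    }
}
```

This is also why gzip_comp_level 6 is a common choice: higher levels burn more CPU for diminishing size returns.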

5. Compress resources

Minify JavaScript and CSS, and even HTML, for the same reason. The popular front-end/back-end separation model generally minifies these resources as part of the build.

6. Use keepalive

Creating and closing connections costs resources. After a user reaches our service, there will usually be many more interactions, so keeping the connection alive significantly reduces network round trips and improves performance.

Nginx enables keepalive support for clients by default. You can adjust its behavior with the following two parameters.

http {
    keepalive_timeout  120s 120s;
    keepalive_requests 10000;
}


The keepalive connection between Nginx and the upstream must be enabled manually. A reference configuration:

location / {
       proxy_pass http://backend;
       proxy_http_version 1.1;
       proxy_set_header Connection "";
}
# the upstream block also needs a keepalive directive, e.g.:
# upstream backend { server 127.0.0.1:8080; keepalive 64; }


6. Tomcat optimization

Optimizing Tomcat itself is also a very important topic. You can refer directly to the article below.

Get tomcat important parameter tuning!

7. Custom Web Container

If your project has high concurrency and you want to modify configuration such as the maximum number of threads and the maximum number of connections, you can customize the web container. The code is as follows.

@SpringBootApplication(proxyBeanMethods = false)
public class App implements WebServerFactoryCustomizer<ConfigurableServletWebServerFactory> {
    public static void main(String[] args) {
        SpringApplication.run(App.class, args);
    }

    @Override
    public void customize(ConfigurableServletWebServerFactory factory) {
        TomcatServletWebServerFactory f = (TomcatServletWebServerFactory) factory;
        f.setProtocol("org.apache.coyote.http11.Http11Nio2Protocol");

        f.addConnectorCustomizers(c -> {
            // The protocol handler is now Http11Nio2Protocol, so cast to it;
            // casting to Http11NioProtocol here would throw ClassCastException.
            Http11Nio2Protocol protocol = (Http11Nio2Protocol) c.getProtocolHandler();
            protocol.setMaxConnections(200);
            protocol.setMaxThreads(200);
            protocol.setConnectionTimeout(3000);
            // Note: selector-related settings belong to the NIO connector
            // and do not apply to NIO2.
        });
    }
}


Note that in the code above we set the protocol to org.apache.coyote.http11.Http11Nio2Protocol, which enables NIO2. This protocol is only available since Tomcat 8.0 and brings some performance improvement when enabled. A comparison:

Default NIO:

[root@localhost wrk2-master]# ./wrk -t2 -c100 -d30s -R2000 http://172.16.1.57:8080/owners?lastName=
Running 30s test @ http://172.16.1.57:8080/owners?lastName=
  2 threads and 100 connections
  Thread calibration: mean lat.: 4588.131ms, rate sampling interval: 16277ms
  Thread calibration: mean lat.: 4647.927ms, rate sampling interval: 16285ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    16.49s     4.98s   27.34s    63.90%
    Req/Sec   106.50      1.50   108.00    100.00%
  6471 requests in 30.03s, 39.31MB read
  Socket errors: connect 0, read 0, write 0, timeout 60
Requests/sec:    215.51
Transfer/sec:      1.31MB


NIO2:

[root@localhost wrk2-master]# ./wrk -t2 -c100 -d30s -R2000 http://172.16.1.57:8080/owners?lastName=
Running 30s test @ http://172.16.1.57:8080/owners?lastName=
  2 threads and 100 connections
  Thread calibration: mean lat.: 4358.805ms, rate sampling interval: 15835ms
  Thread calibration: mean lat.: 4622.087ms, rate sampling interval: 16293ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    17.47s     4.98s   26.90s    57.69%
    Req/Sec   125.50      2.50   128.00    100.00%
  7469 requests in 30.04s, 45.38MB read
  Socket errors: connect 0, read 0, write 0, timeout 4
Requests/sec:    248.64
Transfer/sec:      1.51MB


You can even replace Tomcat with Undertow. Undertow is also a web container; it is more lightweight, uses less memory, and starts fewer daemon threads. The changes are as follows:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-tomcat</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-undertow</artifactId>
</dependency>


8. Optimization direction at each level

Controller layer

The controller layer receives the front end's query parameters and then constructs the query result. Many projects now use a front-end/back-end separated architecture, so controller methods generally use the @ResponseBody annotation to serialize the result into JSON and return it (balancing efficiency and readability).

Since the controller merely composes functions and routes requests, its impact on performance mainly shows in the size of the result set. If the result set is very large, the JSON serialization component will spend considerable time on it.

Large result sets not only increase serialization time but also waste memory. If the result set occupies 10MB of memory before serialization, then 20MB or more may be used during the process. I have seen many cases where memory usage spiked because the returned objects were deeply nested or referenced objects that should not have been included (such as very large byte[] fields).

Therefore, for ordinary services it is essential to keep the result set compact; this is exactly why DTOs (data transfer objects) exist. If your project returns a complex result structure, converting the result set is necessary.
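A minimal sketch of such a conversion, with made-up `Owner`/`OwnerDto` types (any real project would have its own): the entity drags along fields the client never needs, while the DTO carries only what the page renders.

```java
import java.util.List;
import java.util.stream.Collectors;

public class DtoDemo {
    // A fat "entity" as it might come out of the ORM: blobs, internal fields.
    record Owner(long id, String name, String passwordHash, byte[] avatar, List<String> auditLog) {}

    // The compact DTO actually serialized to JSON for the client.
    record OwnerDto(long id, String name) {}

    static OwnerDto toDto(Owner o) {
        return new OwnerDto(o.id(), o.name());
    }

    static List<OwnerDto> toDtos(List<Owner> owners) {
        return owners.stream().map(DtoDemo::toDto).collect(Collectors.toList());
    }
}
```

The mapping is trivial code, but it caps both the serialization cost and the memory footprint of every response.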

Additionally, the controller layer can be optimized with asynchronous servlets. The principle: after the servlet receives a request, it hands the request off to an asynchronous thread for business processing, and the container thread returns immediately. Once the asynchronous thread finishes, it can generate the response data directly or forward the request to another servlet.
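The Servlet async API itself is omitted here, but the hand-off principle can be sketched with plain JDK classes (all names below are illustrative): the caller's thread only starts the work and is immediately free, while a worker thread completes the response later.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncHandoffDemo {
    // A small worker pool standing in for the business thread pool
    // an asynchronous servlet would dispatch to.
    static final ExecutorService workers = Executors.newFixedThreadPool(4);

    // The "container thread" only starts the work and returns immediately;
    // the response is completed later by a worker thread.
    static CompletableFuture<String> handle(String request) {
        return CompletableFuture.supplyAsync(() -> {
            // Simulated slow business processing.
            return "processed:" + request;
        }, workers);
    }

    public static void main(String[] args) {
        CompletableFuture<String> response = handle("order-42");
        // The calling thread is free here; it only joins for the demo.
        System.out.println(response.join());
        workers.shutdown();
    }
}
```

Async processing does not make a single request faster; it frees container threads so the service degrades gracefully under load.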

Service layer

The service layer handles the concrete business logic; most functional requirements are implemented here. Service beans generally use the singleton scope, rarely hold state, and can be reused by controllers.

The code organization of the service layer has a great impact on both readability and performance. Most of the design patterns we talk about target the service layer.

The point to be highlighted here is distributed transactions.

As shown above, the four operations are spread across three different resources. Bringing them to consistency requires the three resources to coordinate in unison, yet their underlying protocols and implementations all differ. This cannot be solved by Spring's @Transactional annotation; it requires external components.

Many people have experienced this: add some code to ensure consistency, run a stress test, and watch the performance drop jaw-droppingly. Distributed transactions are performance killers because they need extra steps to guarantee consistency. Common approaches include two-phase commit, TCC, local message tables, MQ transactional messages, and distributed transaction middleware.

As shown in the figure above, choosing a distributed transaction scheme requires weighing transformation cost, performance, and effectiveness. Between full distributed transactions and no transactions sits a middle ground called flexible transactions. The idea of flexible transactions is to move business logic and mutually exclusive operations from the resource layer up to the business layer.

Regarding traditional transactions and flexible transactions, let's briefly compare them.

ACID

The defining feature of relational databases is transaction processing, that is, satisfying ACID.

  • Atomicity: Either all or none of the operations in a transaction are performed.
  • Consistency: The system must always be in a strongly consistent state.
  • Isolation: The execution of a transaction cannot be interfered with by other transactions.
  • Durability: A committed transaction makes permanent changes to the data in the database.

BASE

The BASE approach trades consistency and isolation for availability and system performance.

BASE is short for Basically Available, Soft state, and Eventually consistent:

  • Basically Available: the system remains operational and provides service essentially all the time.
  • Soft state: the system is not required to stay strongly consistent at all times.
  • Eventually consistent: the system must reach consistency after some period of time.

For Internet services, it is recommended to use compensating transactions to achieve eventual consistency, for example completing data repair through a series of timed tasks. For details, see the following article.

What are the commonly used distributed transactions? Which should I use?
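As a rough illustration of the compensation idea, here is a stdlib-only sketch (the order/payment names are invented): a timed task would periodically call something like `reconcile`, comparing the local view with the remote system's view and emitting fix-up actions until both sides converge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ReconcileDemo {
    // A compensating "timed task" body: compare the local view with the
    // remote system's view and emit fix-up actions until the two converge.
    static List<String> reconcile(Map<String, String> localOrders,
                                  Map<String, String> remotePayments) {
        List<String> actions = new ArrayList<>();
        for (Map.Entry<String, String> e : localOrders.entrySet()) {
            String remote = remotePayments.get(e.getKey());
            if ("PAID".equals(remote) && !"PAID".equals(e.getValue())) {
                actions.add("markPaid:" + e.getKey());   // local lags behind remote
            } else if (remote == null && "PAID".equals(e.getValue())) {
                actions.add("refund:" + e.getKey());     // local is ahead of remote
            }
        }
        return actions;
    }
}
```

The transient mismatch between runs is exactly the "soft state" of BASE; the scheduled reconciliation is what delivers eventual consistency.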

Dao layer

With reasonable data caching in place, we try our best to keep requests from penetrating to the Dao layer. Unless you are especially familiar with the caching features of your ORM, it is recommended to cache data in a more general way.

Work at the Dao layer is mainly about using the ORM framework well. For example, in JPA, if a one-to-many or many-to-many mapping is added without enabling lazy loading, cascading queries can easily trigger deep retrieval, causing high memory overhead and slow execution.
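The cost of missing lazy loading can be illustrated with a small simulation (no real JPA involved; the "SQL" here is just a counter): eager cascading turns one list query into the classic N+1 pattern.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class FetchDemo {
    static final AtomicInteger queries = new AtomicInteger();

    // Simulated "SQL": each call stands for one statement sent to the database.
    static List<String> query(String sql) {
        queries.incrementAndGet();
        return List.of("row");
    }

    // Eager style: loading N owners immediately cascades into one pets
    // query per owner (the classic N+1 pattern).
    static void loadOwnersEager(int n) {
        query("select * from owners");
        for (int i = 0; i < n; i++) {
            query("select * from pets where owner_id=" + i);
        }
    }

    // Lazy style: only the owners query runs; pets are fetched on demand.
    static void loadOwnersLazy(int n) {
        query("select * from owners");
    }
}
```

With 10 owners the eager path issues 11 statements while the lazy path issues 1; with real entities the gap grows with every extra mapping level.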

For businesses with large data volumes, sharding (splitting databases and tables) is often used. In these sharding components, many simple queries are re-parsed and distributed to every node for execution, and the results are merged at the end.

For example, a simple select count(*) from a may be routed to more than a dozen tables, with the final tally done at the coordinating node; the execution efficiency is easy to imagine. The most representative sharding middleware today is ShardingJdbc at the driver layer and MyCat at the proxy layer, and both have this problem. These components present users with a consistent view, but we must keep such differences in mind when coding.
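What the middleware does for that count can be sketched in a few lines (shard names and row counts are invented): fan the query out to every shard, then sum the partial counts at the coordinator.

```java
import java.util.List;

public class ShardCountDemo {
    // Each shard holds part of table "a"; a global count must visit every one.
    record Shard(String name, long rows) {}

    // What the sharding middleware does for "select count(*) from a":
    // rewrite, fan out to all shards, then sum at the coordinating node.
    static long scatterGatherCount(List<Shard> shards) {
        return shards.stream().mapToLong(Shard::rows).sum();
    }

    public static void main(String[] args) {
        List<Shard> shards = List.of(
                new Shard("a_0", 120), new Shard("a_1", 97), new Shard("a_2", 203));
        System.out.println("count(*) = " + scatterGatherCount(shards));
    }
}
```

One innocent statement thus costs as many round trips as there are shards, which is why such queries should carry the sharding key whenever possible.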

End

Let's summarize.

We briefly reviewed the common optimization ideas for SpringBoot and introduced three performance analysis tools: the monitoring system Prometheus, which shows the values of specific metrics; flame graphs, which reveal specific code hotspots; and Skywalking, which analyzes call chains in a distributed environment. When in doubt about performance, we cross-check the results of these various tools, much like Shennong tasting the hundred herbs.

SpringBoot's default web container is Tomcat, so tuning Tomcat can bring performance gains. We also provided a series of optimization ideas for Nginx, the load balancer in front of the service.

Finally, we looked at optimization directions for the Controller, Service, and Dao layers of the classic MVC architecture, with a focus on distributed transactions at the service layer.

Here is a concrete optimization example.

5 seconds to 1 second, count a "very" significant performance optimization

As a widely used service framework, SpringBoot has done a lot of performance work and selected many fast components. For example, the default database connection pool is HikariCP, the default Redis client is Lettuce, and Caffeine is provided for local caching. For a typical web service interacting with a database, the cache is the most important optimization. But details decide success or failure; to push the system to the extreme, also refer to the following article.

Finish!

About the author: Miss Sister Taste (xjjdog), a public account that keeps programmers off detours. Focused on infrastructure and Linux. Ten years of architecture, tens of billions of daily requests, exploring the high-concurrency world with you and giving you a different taste. My personal WeChat is xjjdog0; feel free to add me for further discussion.

Origin blog.csdn.net/wdjnb/article/details/124427520