[Optimization] Performance optimization of the front and back ends of the website

Front end

1. Minimize the amount of data to be transferred

First, remove everything that is unused: unreachable functions in JavaScript, styles whose selectors never match any element, and HTML tags that are permanently hidden by CSS. Second, remove all duplicates. Then I recommend setting up an automatic minification step. For example, it should strip all comments from the build served to browsers (but not from the source code) and every character that carries no information (such as whitespace characters in JS). What we are left with after that is still plain text, which means we can safely apply a compression algorithm such as GZIP (understood by most browsers). Finally, there is caching. It won't help much when the browser first renders the page, but it saves a lot on subsequent visits. The key is to remember two things:

If using a CDN, make sure caching is supported and properly set up there.

Rather than waiting for a resource to expire, you probably want a way to invalidate it earlier on your side. Embedding a "fingerprint" of the file's contents in its URL invalidates local caches whenever the file changes.

Of course, a caching strategy should be defined per resource. Some resources change rarely or not at all; others change frequently. Some contain sensitive information, while others can be considered public. Use the "private" directive to prevent CDNs from caching private data. Images can also be optimized, although image requests do not block parsing or rendering.
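As a sketch of what such per-resource policies might look like (the paths and max-age values here are illustrative assumptions, not recommendations from the original article):

```http
# Fingerprinted static asset: cache "forever"; the URL changes when the file does
GET /assets/app.3f9ab1.js
Cache-Control: public, max-age=31536000, immutable

# HTML entry point: always revalidate so users pick up new fingerprints
GET /index.html
Cache-Control: no-cache

# Personalized API response: the browser may cache briefly, CDNs must not
GET /api/account
Cache-Control: private, max-age=60
```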

2. Reduce the total number of key resources

"Critical" simply refers to the resources a web page needs in order to render correctly. Therefore, we can exclude from the critical path all styles that are not directly involved in that first render, as well as all such scripts.
Style sheets

To tell the browser that a specific CSS file is not required immediately, we should set the media attribute on every link that references a stylesheet. With this approach, the browser processes at full priority only the resources matching the current media (device type, screen size), while lowering the priority of all other stylesheets (they are still processed, just not as part of the critical rendering path). For example, if you add a media="print" attribute to a link tag that references print styles, those styles will not interfere with the critical rendering path when the media is not print (that is, when the page is displayed in a browser).

To further improve the process, some styles can also be inlined. This saves us at least one round trip to the server that would otherwise be required to fetch the stylesheet.
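A minimal markup sketch of both techniques (the file names are hypothetical):

```html
<head>
  <!-- Inlined critical styles: no extra round trip for the first render -->
  <style>
    body { margin: 0; font-family: sans-serif; }
  </style>
  <!-- Matches the current media: part of the critical rendering path -->
  <link rel="stylesheet" href="main.css">
  <!-- Only applies when printing: fetched at low priority, non-blocking -->
  <link rel="stylesheet" href="print.css" media="print">
  <!-- Only applies on narrow screens: non-blocking on desktop -->
  <link rel="stylesheet" href="mobile.css" media="(max-width: 600px)">
</head>
```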
Scripts

As mentioned above, scripts are parser-blocking because they can alter the DOM and CSSOM. So scripts that don't change them shouldn't block parsing, saving us time. For this to work, such script tags must be marked with the async or defer attribute.

Scripts marked async block neither DOM construction nor the CSSOM, since they may execute before the CSSOM is constructed. Keep in mind, though, that inline scripts will wait for the CSSOM anyway, unless you put them above the CSS. Scripts marked defer, by contrast, are evaluated only after the document has been fully parsed, so they should not modify the document during parsing (otherwise a re-render would be triggered).

In other words, with defer the script does not execute until document parsing is complete (just before DOMContentLoaded fires), whereas async lets the script download in the background while the document is parsed and run as soon as it arrives.
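In markup form (analytics.js and ui.js are placeholder names):

```html
<!-- Downloads in parallel, runs as soon as it arrives (order not guaranteed);
     good for independent scripts such as analytics -->
<script src="analytics.js" async></script>

<!-- Downloads in parallel, runs after parsing finishes, in document order;
     good for scripts that read or build on the DOM -->
<script src="ui.js" defer></script>
```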

3. Shorten the critical rendering path length

Finally, the CRP length should be shortened as much as possible. To some extent, the techniques above already do this.

Media queries as attributes of style tags will reduce the total number of resources that must be downloaded. The script tag attributes defer and async will prevent the corresponding script from blocking parsing. Minifying, compressing and archiving resources using GZIP will reduce the size of the transferred data (and thus reduce the data transfer time). Inlining certain styles and scripts reduces the number of round trips between the browser and the server.

What we haven't discussed yet is the option to rearrange code between files. According to current performance best practice, the first thing a website should do is display ATF content (ATF stands for "above the fold": the area immediately visible without scrolling). So it's better to rearrange things so that the styles and scripts required for that area load first, while everything else blocks neither parsing nor rendering. And always remember to measure before and after making changes.

4. Network transmission optimization

Here we focus on three time indicators:

  1. Total Connection Time: the overall time to establish the connection
  2. TTFB (Time to First Byte): the time until the first byte of the response arrives
  3. Content Download: the time to transfer the content

Total Connection Time

There may be many factors that cause the connection to take a long time:

The physical distance between the server and the client is too long (e.g., USA to China)

Connections are established repeatedly: the page uses several different domains, and a new connection must be set up for each one

Problems in the client's network environment

So what can we do to solve these problems:

Use a CDN to dynamically accelerate the main domain, cache resources on dedicated domains, and exploit edge nodes to shorten the distance a user's request travels

Use preconnect to establish connections to domains in advance, and consolidate resources onto fewer domains so that, with HTTP/2, connection-setup time is reduced

Make full use of the HTTP cache and service worker request interception to cache cacheable resources locally and reduce the number of network requests
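The preconnect hint mentioned above looks like this in markup (cdn.example.com is a placeholder):

```html
<!-- Resolve DNS, open the TCP connection, and finish the TLS handshake
     before the first resource on this origin is actually requested -->
<link rel="preconnect" href="https://cdn.example.com">
<!-- Cheaper fallback hint: DNS resolution only -->
<link rel="dns-prefetch" href="https://cdn.example.com">
```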

TTFB + Content Download

TTFB is the time from initiating the request to receiving the first byte of the server's response. Generally speaking, if the TTFB of the first-screen HTML request is within 100ms, the experience is already good; if it exceeds 500ms, the user will clearly perceive a white screen. To be precise, TTFB is the time between sending the HTTP request message and receiving the first byte of the response, after the DNS query, TCP handshake, and SSL handshake have completed. It is approximately equal to one RTT (Round-Trip Time, the round-trip delay) plus the server processing time (ServerRT).

So how to optimize when TTFB takes a long time? You can refer to the following ways:

Reduce the size of the request and avoid sending useless information

Reduce server-side processing time (add caching, deal with slow SQL, etc.)

Stream the first-screen HTML content: the browser does not need the complete HTML before it starts parsing — it parses and renders piece by piece — so the server can stream back the parts that are ready instead of waiting until everything is prepared

Lazy-load: return the necessary content first. For a very long page, for example, return the first-screen content immediately and render the rest via asynchronous loading, requested through separate interfaces

So, is the shorter the TTFB the better?

In fact, not always. We need to strike a balance between TTFB and Content Download. For example, when we enable gzip/br compression, TTFB inevitably trends upward, but the resources become smaller, which speeds up transfer and reduces Content Download time. So pay attention to the real user experience rather than blindly optimizing a single metric.

5. Preload

Preload means fetching resources ahead of time. There are many ways to preload, with different solutions inside and outside the client app. The more common ones are:

the preload tag

service worker preload: flasher, workbox-preload, etc.

zcache: preload in the client app through offline resource packages
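The preload tag looks like this (file names are hypothetical):

```html
<!-- Fetch the font early, at high priority, without executing or applying it;
     `as` tells the browser the resource type so it can prioritize correctly -->
<link rel="preload" href="fonts/brand.woff2" as="font" type="font/woff2" crossorigin>
<link rel="preload" href="hero.jpg" as="image">
```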

Back end


1. Batch thinking: batch operation database

Before optimization:

// insert records one at a time in a for loop
for (TransDetail detail : transDetailList) {
    insert(detail);
}

Optimized:

batchInsert(transDetailList);

To make an analogy: suppose you need to move 10,000 bricks to the top of a building, and the elevator can hold a reasonable number of bricks per trip (up to 500). You could move one brick per trip, or 500 per trip. Which way do you think takes less time?
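As a sketch, assuming a batchInsert that accepts at most 500 rows per call (as in the elevator analogy — this helper is hypothetical, not from the original code):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchUtil {
    // Split a list into consecutive chunks of at most batchSize elements
    public static <T> List<List<T>> partition(List<T> list, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < list.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                    list.subList(i, Math.min(i + batchSize, list.size()))));
        }
        return batches;
    }
}
```

Each chunk then goes to batchInsert(chunk), turning 10,000 single-row statements into 20 batched ones.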

2. Asynchronous thinking: time-consuming operations, consider putting them into asynchronous execution

For time-consuming operations, consider handling them asynchronously, which shortens the interface's response time.

Suppose a transfer interface matches the bank routing number synchronously, and that operation takes a long time. The process before optimization:

To shorten the interface's response time and return faster, the bank-number matching can be moved to asynchronous processing. After optimization:
Besides the transfer example, there are many cases like this in daily work. For example, after a user registers successfully, the SMS and email notifications can also be processed asynchronously.
As for implementation, you can use a thread pool or a message queue.
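A minimal thread-pool sketch of the registration example (the notification bodies are simulated stand-ins):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RegisterService {
    private static final ExecutorService NOTIFY_POOL = Executors.newFixedThreadPool(2);
    static final AtomicInteger sent = new AtomicInteger();

    public static void register(String user) {
        // Critical path: persist the user synchronously
        // ... save to DB (omitted) ...
        // Off the critical path: fire-and-forget notifications
        NOTIFY_POOL.submit(sent::incrementAndGet); // simulated SMS send
        NOTIFY_POOL.submit(sent::incrementAndGet); // simulated email send
    }

    public static void shutdown() {
        NOTIFY_POOL.shutdown();
        try {
            NOTIFY_POOL.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The interface returns as soon as the DB write finishes; the notifications run on pool threads.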

3. The idea of trading space for time: use caching properly

In appropriate business scenarios, proper use of caching can greatly improve interface performance. Caching is essentially trading space for time: you put the data you expect to need into the cache in advance, and when it's needed you read the cache directly instead of querying the database or redoing the computation.

The cache here includes Redis, a JVM-local cache, memcached, or even a plain Map. Let me share a cache-based optimization from my own work. It is quite simple, but the idea is instructive.

It was an optimization of a transfer interface. In the old code, every transfer queried the database and computed the matching bank number from the customer's account number.

Because querying the database and recomputing the match on every transfer was time-consuming, a cache was introduced: look up the cache first, and fall back to the database (then populate the cache) only on a miss.

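A minimal cache-aside sketch using a plain Map (the lookup itself is simulated; a real implementation would also need expiry and an invalidation strategy):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class BankNoCache {
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();
    static final AtomicInteger dbHits = new AtomicInteger();

    // Cache-aside: read the cache first; on a miss, do the expensive
    // lookup once and remember the result
    public static String matchBankNo(String account) {
        return CACHE.computeIfAbsent(account, acc -> {
            dbHits.incrementAndGet();          // simulated slow DB query
            return "BANK-" + acc.hashCode();   // simulated match result
        });
    }
}
```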

4. Prefetch idea: initialize to cache in advance

The idea of prefetching is easy to understand: initialize the data that will be queried or computed into the cache ahead of time. If some complexly computed data is needed at a future moment and computed in real time, it may take a long time. With prefetching, we compute the data that will probably be needed in advance, put it in the cache, and simply fetch it from the cache when the time comes. This greatly improves interface performance.

I remember that when I worked on live video streaming at my first company, our live-broadcast list used this optimization: a task initialized information such as live users and their points into the cache ahead of time.
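A sketch of such a warm-up task (the data source and the point calculation are made up for illustration; in practice a scheduled job would refresh the cache periodically):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LiveListWarmer {
    static final Map<String, Integer> POINTS_CACHE = new ConcurrentHashMap<>();

    // Expensive computation, run ahead of time rather than on the request path
    static void warmUp() {
        for (String user : new String[]{"alice", "bob"}) { // simulated user list
            POINTS_CACHE.put(user, user.length() * 100);   // simulated point calc
        }
    }

    // The interface itself only reads the cache
    static int points(String user) {
        return POINTS_CACHE.getOrDefault(user, 0);
    }
}
```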

5. Pooling idea: pre-allocation and recycling

Everyone should remember, why do we need to use the thread pool?

The thread pool can help us manage threads and avoid increasing the resource consumption of creating threads and destroying threads.

If you create a new thread every time you need one, that creation takes time; a thread pool reuses threads and avoids that cost. Pooling is not limited to thread pools — many scenarios embody the pooling idea. Its essence is pre-allocation and recycling.

For example, everyone is familiar with the TCP three-way handshake. To reduce its performance cost, HTTP introduces Keep-Alive long connections to avoid frequently creating and destroying connections. There are many similar examples: database connection pools, HttpClient connection pools, and so on.

When writing code, the most direct application of the pooling idea is to use a thread pool instead of creating new threads.
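The essence — pre-allocate, hand out, recycle — can be sketched as a tiny object pool (a toy illustration, not production code; real pools also handle waiting, validation, and sizing):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

public class SimplePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();

    // Pre-allocate all objects up front
    public SimplePool(int size, Supplier<T> factory) {
        for (int i = 0; i < size; i++) idle.push(factory.get());
    }

    // Hand out an existing object instead of creating a new one
    public synchronized T acquire() {
        if (idle.isEmpty()) throw new IllegalStateException("pool exhausted");
        return idle.pop();
    }

    // Recycle the object for the next caller
    public synchronized void release(T obj) {
        idle.push(obj);
    }
}
```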

6. Event callback idea: refuse to block waiting.

Suppose you call an interface of system B whose business logic takes 10s or more to process. Should you block and wait until system B's interface returns before continuing with your next operation? That is obviously unreasonable.

We can borrow from the IO multiplexing model: instead of blocking on system B's interface, we do other work first. When system B finishes processing, it notifies us through an event callback, and on receiving the notification our interface performs the corresponding business operation.
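In Java this pattern can be sketched with CompletableFuture: register a callback and keep working instead of blocking (system B's slow processing is simulated by a sleep):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

public class CallbackDemo {
    static final List<String> events = new CopyOnWriteArrayList<>();

    static CompletableFuture<Void> callSystemB() {
        return CompletableFuture
                .supplyAsync(() -> {
                    sleep(200);                  // system B's slow processing
                    return "B-result";
                })
                // Callback fires when B finishes; our thread never blocks on it
                .thenAccept(result -> events.add("handled " + result));
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> pending = callSystemB();
        events.add("doing other work");          // continues immediately
        pending.join();                          // only so the demo can exit
    }
}
```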

7. The remote call is changed from serial to parallel

Suppose we design a home-page interface for an app that needs to look up user information, banner information, pop-up information, and so on. If these are queried serially one by one — say 200ms for user info, 100ms for banners, and 50ms for pop-ups — the total is 350ms. Querying even more information takes even longer.
In fact, we can switch to parallel calls: the queries for user information, banner information, and pop-up information can all be initiated at the same time.
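A sketch with CompletableFuture (the three lookups are simulated with 200/100/50 ms sleeps, so the parallel total is close to the slowest call, about 200 ms, rather than the 350 ms sum):

```java
import java.util.concurrent.CompletableFuture;

public class HomePageAggregator {
    static String slowCall(String name, long ms) {
        try { Thread.sleep(ms); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return name;
    }

    public static String load() {
        CompletableFuture<String> user   = CompletableFuture.supplyAsync(() -> slowCall("user", 200));
        CompletableFuture<String> banner = CompletableFuture.supplyAsync(() -> slowCall("banner", 100));
        CompletableFuture<String> popup  = CompletableFuture.supplyAsync(() -> slowCall("popup", 50));
        // Wait for all three at once instead of one after another
        CompletableFuture.allOf(user, banner, popup).join();
        return user.join() + "," + banner.join() + "," + popup.join();
    }
}
```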

8. Avoid too coarse lock granularity

In high concurrency scenarios, in order to prevent overselling and other situations, we often need to lock to protect shared resources. However, if the locking granularity is too coarse, it will greatly affect the performance of the interface.

What is lock granularity?

It is simply how large a scope you lock. For example, when you use the bathroom at home, you only need to lock the bathroom door — you don't need to lock up the whole house to keep family members out. The bathroom is your lock granularity.

Whether you use synchronized or a Redis distributed lock, lock only the shared critical resource. If no shared resource is involved, no lock is needed — just as you lock the bathroom door, not the whole house.

For example, in some business code there is an ArrayList that must be locked because multiple threads operate on it, and there happens to be another time-consuming operation (the slowNotShare method below) that involves no thread-safety issues at all. The anti-pattern locks everything indiscriminately:

// the shared resource; an slf4j logger field `log` is also assumed
private final List<Integer> data = new ArrayList<>();

// a slow method that does not touch shared resources
private void slowNotShare() {
    try {
        TimeUnit.MILLISECONDS.sleep(100);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}

// wrong way to lock
public int wrong() {
    long beginTime = System.currentTimeMillis();
    IntStream.rangeClosed(1, 10000).parallel().forEach(i -> {
        // lock granularity is too coarse: slowNotShare touches no
        // shared resource, yet it runs inside the lock
        synchronized (this) {
            slowNotShare();
            data.add(i);
        }
    });
    log.info("consume time:{}", System.currentTimeMillis() - beginTime);
    return data.size();
}

Positive example:

public int right() {
    long beginTime = System.currentTimeMillis();
    IntStream.rangeClosed(1, 10000).parallel().forEach(i -> {
        slowNotShare(); // no lock needed here
        // lock only the operation on the shared List
        synchronized (data) {
            data.add(i);
        }
    });
    log.info("consume time:{}", System.currentTimeMillis() - beginTime);
    return data.size();
}

9. Switch storage: use files for temporary data storage

If the data volume is too large and writing it straight to the database is genuinely slow, consider temporarily storing it in a file first: save the file, then download it asynchronously and persist it to the database gradually.

This may sound abstract, so let me share a real optimization case of mine.

I once developed a transfer interface. With concurrency enabled — 10 concurrent requests, each batch containing 1,000 transfer details — the database inserts took very long, about 6 seconds. This was related to our company's database synchronization mechanism: under concurrency, because synchronization takes priority, parallel inserts degrade to serial ones, which is very time-consuming.

Before the optimization, the 1,000 transfer details were first written to the database, the response was returned to the user during processing, and the transfers were then executed asynchronously.
I remember that during the stress test, under high concurrency, inserting those 1,000 details took a long time. So I changed the approach: save the batch of transfer details to a file server and record only a single summary transfer record in the database, then asynchronously download the details file and process and persist the details. After this optimization, performance improved more than tenfold.

If your interface's bottleneck is the database insert, and batching and similar measures are still not satisfactory, consider using files or MQ for temporary storage. Sometimes writing batch data to a file is faster than inserting it into the database.

10. Index

When it comes to interface optimization, many friends will think of adding indexes. Yes, adding an index is the least costly optimization, and it generally works well.

When it comes to index optimization, we generally think about it from these dimensions:

  • Have you indexed your SQL yet?
  • Does your index actually work?
  • Is your index design reasonable?

10.1 SQL is not indexed

During development it is easy to forget to add indexes for our SQL. So after writing the SQL, take a look at its explain execution plan.

explain select * from user_info where userId like '%123';

You can also use the command show create table to see the index status of the entire table.

show create table user_info;

If a table is missing an index, you can add one with the alter table ... add index command:

alter table user_info add index idx_name (name);

As a rule of thumb: the fields in a SQL statement's where condition, and the fields after order by and group by, need indexes.

10.2 Index does not take effect

Sometimes, even if you add an index, the index will become invalid.

10.3 Index design is unreasonable

More indexes are not always better — they need to be designed reasonably. For example:

  • Remove redundant and duplicate indexes
  • A single table should generally have no more than five indexes
  • Indexes are not suitable for low-cardinality fields with lots of repeated values, such as a gender field
  • Make proper use of covering indexes
  • If you need force index to make a query use an index, reconsider whether the index design is really reasonable

11. Optimize SQL

With indexes in place, SQL itself still has plenty of other room for optimization.

12. Avoid big business problems

To ensure consistency of database data, we often need transactions when multiple database modifications are involved. Using Spring's declarative transactions is very simple — a single @Transactional annotation, as in the following example:

@Transactional
public int createUser(User user) {
    // save the user's information
    userDao.save(user);
    passCertDao.updateFlag(user.getPassId());
    return user.getUserId();
}

The main logic of this code is to create a user and then update a pass flag. Now suppose a new requirement: after creating a user, call a remote interface to send an email notification. Many developers would write it like this:

@Transactional
public int createUser(User user) {
    // save the user's information
    userDao.save(user);
    passCertDao.updateFlag(user.getPassId());
    sendEmailRpc(user.getEmail());
    return user.getUserId();
}

This implementation has a pitfall: an RPC remote call is nested inside the transaction — that is, a non-DB operation is nested in the transaction. If such non-DB operations take a long time, a large-transaction problem may occur.

A "large transaction" is simply a long-running transaction. Because the transaction stays uncommitted for a long time, the database connection remains occupied; under concurrency the connection pool can fill up, preventing other requests from reaching the database and hurting the performance of other interfaces.

The problems caused by large transactions mainly include interface timeouts, deadlocks, master-slave replication lag, and so on. Therefore, to optimize the interface, we must avoid large transactions. We can avoid them with these measures:

  • Do not put RPC remote calls into transactions
  • Some query-related operations should be placed outside the transaction as much as possible
  • Avoid processing too much data in a transaction
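One way to apply the first rule is to keep only the DB operations inside the transactional method and let the caller fire the RPC after the transaction commits. A simulated sketch (the DAO and RPC calls are stand-ins that just record events; begin/commit are simulated rather than a real transaction manager):

```java
import java.util.ArrayList;
import java.util.List;

public class CreateUserFlow {
    static final List<String> events = new ArrayList<>();

    // Transaction boundary holds DB work only (begin/commit simulated)
    static void createUserTx() {
        events.add("begin");
        events.add("userDao.save");
        events.add("passCertDao.updateFlag");
        events.add("commit");
    }

    // The slow remote call happens only after the commit,
    // so it never keeps a DB connection occupied
    static void createUser() {
        createUserTx();
        events.add("sendEmailRpc");
    }
}
```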

13. Deep pagination problem

At a previous company we analyzed several cases of slow interfaces, and the final conclusion was the deep pagination problem.

Why is deep pagination slow? Look at this SQL:

select id,name,balance from account where create_time> '2020-09-19' limit 100000,10;

limit 100000,10 means the engine scans 100,010 rows, discards the first 100,000, and returns the final 10. And if create_time is only a secondary index, each scanned row may additionally require a lookup back to the clustered index.

We can optimize the deep pagination problem through the label record method and the delayed association method.

13.1 Label recording method

The idea is to record which row the last query ended at, and start scanning from that row on the next query. It's like reading a book: fold the corner or leave a bookmark where you stopped, and next time just open to it.

Assuming the last record of the previous page had id 100000, the SQL can be modified to:

select id,name,balance FROM account where id > 100000 limit 10;

In this case, no matter how many pages you turn, performance stays good because the query hits the id primary-key index. But the method has a limitation: it requires a field resembling a continuous auto-increment.

13.2 Delayed association method

The delayed association method moves the condition onto the primary-key index tree and reduces lookups back to the table. The optimized SQL is as follows:

select acct1.id, acct1.name, acct1.balance
FROM account acct1
INNER JOIN (SELECT a.id FROM account a WHERE a.create_time > '2020-09-19' limit 100000, 10) AS acct2
ON acct1.id = acct2.id;

The idea is to first obtain the qualifying primary-key IDs through the idx_create_time secondary index tree, then join back to the original table by primary key. The rest of the query uses the primary-key index directly, which also reduces table lookups.

14. Optimize program structure

Optimizing program logic and code saves time. For example, the program may create unnecessary objects, have chaotic logic that queries the database repeatedly, or implement an algorithm inefficiently, and so on.

Let me give a simple example: with complex logical conditions, sometimes adjusting their order makes the program more efficient.

Suppose the business requirement is: if the user is a member, send a thank-you SMS on their first login. Without thinking about it, one might write:

if (isUserVip && isFirstLogin) {
    sendSmsMsg();
}

Suppose there are 5 requests; isUserVip passes for 3 of them and isFirstLogin for only 1. With the code above, isUserVip is evaluated 5 times and isFirstLogin 3 times (thanks to short-circuit evaluation, it only runs when isUserVip is true):
If you adjust the order of isUserVip and isFirstLogin:

if (isFirstLogin && isUserVip) {
    sendSmsMsg();
}

isFirstLogin is executed 5 times, and isUserVip is executed 1 time:
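The counting above can be verified with short-circuit && (the per-request data is made up to match the 3-of-5 and 1-of-5 pass rates in the example):

```java
public class ConditionOrderDemo {
    static int vipChecks, firstChecks;

    static boolean isUserVip(boolean v)    { vipChecks++;   return v; }
    static boolean isFirstLogin(boolean f) { firstChecks++; return f; }

    // Evaluate the condition for 5 simulated requests and count the checks
    static void run(boolean vipFirst) {
        vipChecks = 0; firstChecks = 0;
        boolean[] vip   = {true, true, true, false, false};   // 3 of 5 pass
        boolean[] first = {true, false, false, false, false}; // 1 of 5 passes
        for (int i = 0; i < 5; i++) {
            if (vipFirst) {
                if (isUserVip(vip[i]) && isFirstLogin(first[i])) { /* sendSmsMsg() */ }
            } else {
                if (isFirstLogin(first[i]) && isUserVip(vip[i])) { /* sendSmsMsg() */ }
            }
        }
    }
}
```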

15. Compress transmission content

If the transmitted content is compressed, the message becomes smaller and therefore transfers faster. On the same 10M bandwidth, a 10KB payload generally arrives much faster than a 1MB one.

To use a metaphor, can a thousand-mile horse run faster with a load of 100 catties, or with a load of 10 catties?

Another example is video sites: without any compression encoding, limited bandwidth means transferring the enormous raw data over the network would take many times longer than transferring it after encoding and compression.
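A quick demonstration with Java's built-in GZIP (the sample payload is synthetic; real-world text such as HTML or JSON typically compresses very well too):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {
    public static byte[] gzip(byte[] input) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        // Repetitive payload, similar in spirit to markup or JSON
        byte[] raw = "{\"status\":\"ok\",\"data\":1}".repeat(1000)
                .getBytes(StandardCharsets.UTF_8);
        byte[] zipped = gzip(raw);
        System.out.println(raw.length + " -> " + zipped.length);
    }
}
```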

16. Massive data processing, consider NoSQL

I have seen several slow SQL statements before, all related to deep paging, where the label recording and delayed association methods had little effect. The reason was that the queries required statistics and fuzzy search, and the data volume was genuinely large. In the end, after aligning on the plan with the team leader, we synchronized the data to Elasticsearch and served those fuzzy-search requirements from Elasticsearch.

What I mean is: if the data volume is huge and must live in a relational database, it can be sharded across multiple databases and tables. But sometimes we can also use NoSQL, such as Elasticsearch or HBase.

17. Thread pool design should be reasonable

We use thread pools so tasks are processed in parallel and finished more efficiently. But if the thread pool is configured unreasonably, interface execution will not be as efficient as hoped.

Generally, we need to pay attention to these parameters of the thread pool: core threads, maximum number of threads, and blocking queues.

  • If the core thread count is too small, you get little parallelism.
  • If the blocking queue is configured unreasonably, the problem is not just blocking — it can even cause an OOM.
  • If thread pools are not isolated per business, core business may be dragged down by edge business.
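A sketch of an explicitly configured pool (the numbers are illustrative, not tuning advice; a bounded queue plus CallerRunsPolicy applies back-pressure instead of risking the OOM that an unbounded queue can cause):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolConfigDemo {
    public static int runTasks(int n) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2,                                  // core threads
                4,                                  // maximum threads
                60, TimeUnit.SECONDS,               // idle keep-alive
                new ArrayBlockingQueue<>(10),       // bounded queue: no OOM
                new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            pool.execute(done::incrementAndGet);    // excess tasks run on the caller
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }
}
```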

18. Machine problems (fullGC, full threads, too many IO resources not closed, etc.).

Sometimes a slow interface is a machine-level problem: mainly full GC, exhausted threads, too many unclosed IO resources, and so on.

I once investigated a full GC problem: a colleague in operations exported an Excel file with more than 600,000 rows and reported that the page froze, and then we received a monitoring alert. It turned out our old code generated Excel with Apache POI; with that much export data, JVM memory got tight and full GC was triggered.

If threads are exhausted, the interface will also end up waiting. So in high-concurrency scenarios we need rate limiting to reject excess requests.

Unclosed IO resources likewise increase latency — think of how sluggish your computer gets when too many files are left open.

Sources

Combat summary! Summary of 18 interface optimization schemes
Three strategies for website performance optimization
Front-end performance optimization in practice


Origin blog.csdn.net/weixin_44231544/article/details/122248694