High-performance Nginx HTTPS tuning: how to speed up HTTPS by 30%

Why optimize Nginx HTTPS latency
Nginx is one of the most common servers, often used as a load balancer (Load Balancer), reverse proxy (Reverse Proxy), gateway (Gateway), and so on. A properly configured single Nginx server should be able to **withstand 50K to 80K requests per second[1]** while keeping CPU load within a controllable range.

But in many cases, throughput is not the primary target of optimization. For example, at Kara Search we want users to experience instant search with every keystroke: each search request must return to the user end to end within 100ms-200ms, so that searching never feels "stuck" or shows "loading". For us, therefore, optimizing request latency is the most important direction.

In this article, we first cover how the TLS settings in Nginx relate to request latency and how to tune them for maximum speedup. Then, using the tuned Kara Search[2] Nginx server as an example, we share how adjusting the Nginx TLS/SSL settings sped up first-time searches by about 30%. We discuss in detail what we optimized at each step, along with the motivation and the effect, hoping it helps others who run into similar problems.

As usual, the Nginx configuration file for this article is on GitHub, and you are welcome to use it directly: High-performance Nginx HTTPS tuning[3]

TLS handshake and delay

Many developers think: unless you really care about performance, there is no need to understand low-level details and optimizations. That is fair in many cases, because complex low-level logic must be encapsulated to keep the complexity of higher-level application development manageable. For example, if you only need to build an app or website, you probably don't need to care about assembly details or how the compiler optimizes your code: after all, many optimizations on Apple or Android platforms are done at the lower layers.

So, what is the relationship between understanding the underlying TLS and Nginx latency optimization at the application layer?

The answer is that, in most cases, optimizing network latency is really about reducing the number of round trips between the user and the server, the so-called roundtrips. Due to physical limits, a round trip between Beijing and Yunnan at the speed of light already takes about 20 milliseconds. If you inadvertently make the data travel between Beijing and Yunnan several times, the latency inevitably goes up.

Therefore, when optimizing request latency, knowing a little about the underlying network helps; sometimes it is even the key to whether you can understand an optimization at all. In this article we will not dig too deep into the details of TCP or TLS; if you are interested, see the book High Performance Browser Networking[4], which can be read for free.

For example, the following figure shows the round trips that occur before any data is transmitted when your service has HTTPS enabled.

[Figure: TCP and TLS handshake round trips before the first byte of data]

As you can see, before your user gets the data they need, the underlying packets have already made 3 round trips between the user and your server.

Assuming each round trip takes 28 milliseconds, the user has already waited 224 milliseconds before starting to receive data.

At the same time, 28 milliseconds is actually a very optimistic assumption. With China Telecom, China Unicom, China Mobile, and all sorts of complicated domestic network conditions, the latency between user and server is even less controllable. On the other hand, a web page usually requires dozens of requests, and they may not all run in parallel: with dozens of requests each costing on the order of 224 milliseconds, the page can easily take several seconds to open.

Therefore, in principle, if possible, we need to minimize the roundtrip between the user and the server. In the following settings, for each setting, we will discuss why this setting may help reduce the roundtrip.
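To see how directly round trips translate into waiting time, here is a back-of-the-envelope sketch in Python. The round-trip counts per scenario are illustrative assumptions for a cold connection, not measurements:

```python
# Back-of-the-envelope: round trips dominate latency on high-RTT links.
# The round-trip counts below are illustrative assumptions.
RTT_MS = 28  # assumed round-trip time between user and server

# round trips before the first byte of application data arrives
scenarios = {
    "HTTP (TCP handshake + request)": 2,
    "HTTPS, full TLS 1.2 handshake": 4,   # TCP + 2x TLS + request
    "HTTPS, resumed TLS session": 3,      # TCP + 1x TLS + request
}

for name, roundtrips in scenarios.items():
    print(f"{name}: {roundtrips * RTT_MS} ms before first byte")
```

Each round trip saved is a full RTT off the time to first byte, which is why the settings below all aim at cutting round trips.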

TLS settings in Nginx

So in the Nginx settings, how to adjust the parameters to reduce the delay?
Enabling HTTP/2
The HTTP/2 standard grew out of Google's SPDY. Compared with HTTP 1.1, it improves performance considerably, especially when many requests need to run in parallel, which can significantly reduce latency. On today's web, a page needs dozens of requests on average. In the HTTP 1.1 era, the best a browser could do was open a few more connections (usually 6) and request in parallel; HTTP/2 allows parallel requests within a single connection. Since HTTP/2 natively supports multiplexed parallel requests, it greatly reduces the round trips of sequentially executed requests, so consider enabling it first.

If you want to see the speed difference between HTTP 1.1 and HTTP/2 for yourself, try: https://www.httpvshttps.com/. In my network test, HTTP/2 was 66% faster than HTTP 1.1.
Enabling HTTP/2 in Nginx is very simple; just add the http2 flag:

listen 443 ssl;
# change to
listen 443 ssl http2;

If you are worried that some of your users are on an old client that does not yet support HTTP/2, such as Python's requests, don't worry: if the client does not support HTTP/2, the connection automatically falls back to HTTP 1.1, preserving backward compatibility. All users on old clients remain unaffected, while new clients can enjoy the new features of HTTP/2.

How to confirm your website or API has HTTP/2 enabled

Open the developer tools in Chrome and click the Protocol column to see the protocol used by every request. If the value in the Protocol column is h2, then HTTP/2 is in use.
Of course, another way is to use curl directly: if the returned status line starts with HTTP/2, then HTTP/2 is enabled.

➜  ~ curl --http2 -I https://kalasearch.cn
HTTP/2 403
server: Tengine
content-type: application/xml
content-length: 264
date: Tue, 22 Dec 2020 18:38:46 GMT
x-oss-request-id: 5FE23D363ADDB93430197043
x-oss-cdn-auth: success
x-oss-server-time: 0
x-alicdn-da-ups-status: endOs,0,403
via: cache13.l2et2[148,0], cache10.l2ot7[291,0], cache4.us13[360,0]
timing-allow-origin: *
eagleid: 2ff6169816086623266688093e

Adjust cipher priority
Try to prefer newer, faster ciphers, which helps reduce latency[5]:

# manually enable the cipher list
ssl_prefer_server_ciphers on;  # prefer a list of ciphers to prevent old and slow ciphers
ssl_ciphers 'EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH';
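If you want to see which cipher suites that OpenSSL-format string actually selects on your machine, you can expand it with Python's standard ssl module. This is a quick local sanity check, not part of the Nginx config, and the exact list depends on your local OpenSSL build:

```python
import ssl

# Expand the cipher string from the Nginx config against the local OpenSSL build
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.set_ciphers('EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH')

# Each entry is a dict with the suite name and the protocol it applies to
for cipher in ctx.get_ciphers():
    print(cipher['name'], cipher['protocol'])
```

On recent OpenSSL builds you will also see the TLS 1.3 suites listed, since TLS 1.3 ciphers are configured separately and are unaffected by this string.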

Enabling OCSP Stapling
For services or websites in China that use Let's Encrypt certificates, this may be the latency optimization with the greatest impact. If OCSP Stapling is not enabled, then when users connect to your server they sometimes need to verify the certificate, and because (for reasons we won't go into) Let's Encrypt's validation server is not very reachable[6], this can cause delays of several seconds or even over ten seconds[7]. The problem is especially serious on iOS devices.

There are two ways to solve this problem:

  1. Do not use Let's Encrypt, you can try to replace it with the free DV certificate provided by Alibaba Cloud
  2. Turn on OCSP Stapling. With OCSP Stapling enabled, the certificate verification step can be skipped. Eliminating a round trip, especially one whose network conditions are uncontrollable, may greatly reduce your latency.

Enabling OCSP Stapling in Nginx is also very simple, just need to set:

ssl_stapling on;
ssl_stapling_verify on;
ssl_trusted_certificate /path/to/full_chain.pem;

How to check if OCSP Stapling is turned on?
You can use the following command

openssl s_client -connect test.kalasearch.cn:443 -servername kalasearch.cn -status -tlsextdebug < /dev/null 2>&1 | grep -i "OCSP response"

If the result is

OCSP response:
OCSP Response Data:
    OCSP Response Status: successful (0x0)
    Response Type: Basic OCSP Response

then OCSP Stapling is enabled. See also this article on the problem of HTTPS being slow on the iPhone[8].
Adjust ssl_buffer_size
ssl_buffer_size controls the buffer size used when sending data; the default is 16k. The smaller the value, the lower the latency, but headers and other per-record overhead become relatively larger; the larger the value, the higher the latency but the smaller the overhead.

Therefore, if your service is a REST API[9] or a website, lowering this value can reduce latency and TTFB; but if your server mainly transfers large files, keeping it at 16k is fine. For discussion of this value and the more general TLS record size question, see: Best value for nginx's ssl buffer size option[10]

For a website or REST API, the recommended value is 4k, but the optimal value obviously varies with your data, so try different values between 2k and 16k. Adjusting it in Nginx is also very easy:

ssl_buffer_size 4k;
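As a rough sketch of the trade-off, the following estimates how many TLS records a response is split into and how much per-record overhead that adds. The 29-byte figure is an approximation for a TLS 1.2 AES-GCM record (5-byte header, 8-byte explicit nonce, 16-byte tag) and varies with cipher and TLS version:

```python
import math

RECORD_OVERHEAD = 29  # approx. TLS 1.2 AES-GCM: 5B header + 8B nonce + 16B tag

def tls_overhead(payload_bytes, record_size):
    """Return (number of records, total overhead bytes) for a payload."""
    records = math.ceil(payload_bytes / record_size)
    return records, records * RECORD_OVERHEAD

# Compare a 1 MB response at 4k vs the default 16k record size
for size in (4 * 1024, 16 * 1024):
    records, overhead = tls_overhead(1_000_000, size)
    print(f"record size {size // 1024}k: {records} records, {overhead} bytes overhead")
```

Smaller records cost more overhead in total, but the browser can decrypt each record as soon as it arrives, so the first bytes of a small response reach the application sooner, which is exactly the TTFB effect described above.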

Enabling SSL Session Cache
Enabling an SSL session cache greatly reduces repeated TLS handshakes and cuts the round trips of the TLS handshake for returning visitors. Although the session cache uses some memory, 1M of memory can cache about 4000 connections, which is extremely cost-effective. At the same time, for most websites and services, reaching 4000 simultaneous connections requires a very large user base, so you can enable it with confidence.

Here ssl_session_cache is set to use 50M of memory, and ssl_session_timeout sets a 4-hour session expiry:

# Enable SSL cache to speed up for return visitors
ssl_session_cache   shared:SSL:50m; # speed up first time. 1m ~= 4000 connections
ssl_session_timeout 4h;
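Using the rule of thumb from the comment above (1M of shared cache holds roughly 4000 sessions), a quick sanity check of the sizing:

```python
# Rule of thumb from the config comment: 1M of shared cache ~ 4000 sessions.
# This is an estimate, not an exact Nginx accounting.
SESSIONS_PER_MB = 4000

cache_mb = 50  # matches ssl_session_cache shared:SSL:50m above
print(f"{cache_mb}M cache ~ {cache_mb * SESSIONS_PER_MB} cached TLS sessions")
```

So 50M comfortably covers hundreds of thousands of returning visitors within the 4-hour window.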

How Kara Search reduced request latency by 30%
Kara Search is a domestic Algolia[11], dedicated to helping developers quickly build instant search and to being the fastest, easiest-to-use search-as-a-service in China.

Once a developer integrates, all search requests return directly to the end user through the Kara API. For users to have an instant search experience, we need to return results within a very short time (usually 100ms to 200ms) after each keystroke. Therefore, each search must take under 50 milliseconds of engine processing time and under 200 milliseconds end to end.

We built a movie search demo with data from Douban Movies. If you are interested, try the instant search, searching for "Infernal Affairs" or "Westward Journey" to experience the speed and relevance: https://movies-demo.kalasearch.cn/

For each request, there is only a delay budget of 100 to 200 milliseconds, and we must take the delay of each step into account.
To simplify, the total latency for each search request is:

total latency = user request reaching the server (T1) + reverse proxy processing (Nginx, T2) + data center latency (T3) + server processing (Kara engine, T4) + response returning to the user (T3 + T1)

Among these, T1 depends only on the physical distance between the user and the server, while T3 is tiny (see Jeff Dean's numbers[12]) and can be ignored.

So roughly speaking, we can only control T2 and T4: the processing time of the Nginx server and the processing time of Kara's engine.
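Plugging illustrative numbers into the formula above shows how quickly the budget is spent. All values here are assumptions for illustration, not our measurements:

```python
# Illustrative latency budget per the formula above; all numbers are assumptions
T1 = 30   # ms, user -> server one way (depends on physical distance)
T2 = 20   # ms, reverse proxy processing (Nginx, incl. TLS on warm connections)
T3 = 1    # ms, data center latency (negligible)
T4 = 20   # ms, engine processing (target: under 50 ms)

total = T1 + T2 + T3 + T4 + (T3 + T1)
print(f"end-to-end: {total} ms (budget: 200 ms)")
```

With these numbers the request fits the budget, but note that a single extra TLS round trip at T1-scale RTT would add 60 ms, nearly a third of the whole budget.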

Nginx serves here as a reverse proxy, handling security, rate limiting, and TLS logic, while Kara's engine is an inverted-index engine based on Lucene.

The first possibility we considered was: does the delay come from the Kara engine?

In the **Grafana dashboard[13]** shown below, we saw that, aside from a few occasional slow queries, 95% of search requests were processed by the servers in under 20 milliseconds. Compared with a baseline Elasticsearch on the same dataset, whose P95 search latency was about 200 milliseconds, this ruled out the possibility that the engine was slow.
In Alibaba Cloud monitoring, we set up search requests to be sent to the Kara server from all over the country. We eventually found that SSL processing time often exceeded 300 milliseconds. That is to say, in step T2, just handling things like the TLS handshake had already used up our entire request time budget.

Checking further, we found that searches on Apple devices were extremely slow, especially on devices visiting for the first time. So we roughly judged that the cause was the Let's Encrypt certificate we were using.

We adjusted the Nginx settings following the steps above and summarized them into this article. After tuning the Nginx TLS settings, SSL time dropped from an average of 140ms to around 110ms (across China Unicom and China Mobile test points in all provinces), and the problem of slow first visits on Apple devices disappeared.
After the adjustment, search latency measured nationwide dropped to about 150 milliseconds.

Summary
Tuning the TLS settings in Nginx has a very large impact on the latency of services and websites that use HTTPS. This article summarized the TLS-related settings in Nginx, discussed in detail how each setting may affect latency, and gave tuning suggestions. Later, we will discuss the concrete improvements of HTTP/2 over HTTP 1.x, as well as the pros and cons of using HTTP/2 for REST APIs, so stay tuned.


Origin blog.csdn.net/KH_FC/article/details/113355084