Nginx, Elasticsearch, and RocketMQ production environment configuration (the most complete in history)

Nginx achieves 100,000+ concurrency

There are many things that can be tuned in the kernel, but we usually adjust them according to the characteristics of the business. The kernel parameters to tune are different when Nginx is used as a static web content server, as a reverse proxy, or as a server that provides compression.

Overview:

The default Linux kernel parameters target the most common scenarios, which clearly does not suit a web server that must support highly concurrent access, so the kernel parameters need to be tuned to let Nginx reach higher performance.


Refer to key Linux kernel optimization parameters

/etc/sysctl.conf

Modify /etc/sysctl.conf to change kernel parameters.

After modifying the configuration file, run the sysctl -p command to make the changes take effect immediately.

fs.file-max = 2024000
fs.nr_open = 1024000

net.ipv4.tcp_tw_reuse = 1

net.ipv4.tcp_keepalive_time = 600

net.ipv4.tcp_fin_timeout = 30

net.ipv4.tcp_max_tw_buckets = 5000

net.ipv4.ip_local_port_range = 1024 65000

net.ipv4.tcp_rmem = 10240 87380 12582912

net.ipv4.tcp_wmem = 10240 87380 12582912

net.core.netdev_max_backlog = 8096

net.core.rmem_default = 6291456

net.core.wmem_default = 6291456

net.core.rmem_max = 12582912

net.core.wmem_max = 12582912

net.ipv4.tcp_syncookies = 1

net.ipv4.tcp_max_syn_backlog = 8192

net.ipv4.tcp_tw_recycle = 1

net.core.somaxconn=262114

net.ipv4.tcp_max_orphans=262114
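After writing these values into /etc/sysctl.conf, they can be applied and spot-checked; a minimal sketch:

$ sudo sysctl -p                  # apply every setting from /etc/sysctl.conf
$ sysctl fs.file-max net.core.somaxconn net.ipv4.tcp_max_syn_backlog   # spot-check a few values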

For Nginx to support ultra-high throughput, the main things to optimize are the number of file handles and the TCP network parameters:

The maximum number of handles that the system can open

fs.file-max = 2024000
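As a quick reference, the current system-wide limit and actual usage can be read from procfs; a small sketch, assuming the standard /proc layout:

$ cat /proc/sys/fs/file-max   # configured system-wide maximum
$ cat /proc/sys/fs/file-nr    # allocated handles, free handles, maximum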

Reuse sockets in TIME_WAIT state for new TCP connections

net.ipv4.tcp_tw_reuse = 1

# Setting this to 1 allows sockets in the TIME_WAIT state to be reused for new TCP connections. This matters a great deal for servers, because there are always large numbers of connections in TIME_WAIT.

How often TCP sends keepalive messages

net.ipv4.tcp_keepalive_time = 600
# How often TCP sends keepalive probes when keepalive is enabled.
# The default is 2 hours; setting it to 10 minutes clears dead connections faster.

The maximum time the socket remains in the FIN_WAIT_2 state

net.ipv4.tcp_fin_timeout = 30 
# Maximum time a socket stays in the FIN_WAIT_2 state when the server actively closes the connection.

Maximum number of TIME_WAIT sockets allowed

net.ipv4.tcp_max_tw_buckets = 5000
# Maximum number of TIME_WAIT sockets the operating system allows.
# If this number is exceeded, TIME_WAIT sockets are cleared immediately and a warning is printed.
# The default is 180000; too many TIME_WAIT sockets slow a web server down.
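To judge whether 5000 is enough for a given workload, the number of sockets currently in TIME_WAIT can be counted; a sketch assuming the ss tool from iproute2 is installed:

$ ss -tan state time-wait | wc -l   # current number of TIME_WAIT sockets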

The value range of the local port

net.ipv4.ip_local_port_range = 1024 65000 
# Defines the range of local ports used for outgoing UDP and TCP connections.

In Linux, each socket is mapped to a file and is associated with two kernel buffers: a read buffer and a write buffer. In other words, every socket has two kernel buffers, configured through the following four options:

  • rmem_default: When a Socket is created, the default read buffer size, in bytes;
  • wmem_default: When a Socket is created, the default write buffer size, in bytes;
  • rmem_max: the maximum value of a Socket's read buffer that can be set by the program, in bytes;
  • wmem_max: the maximum value of a Socket write buffer that can be set by the program, in bytes;
net.core.rmem_default = 6291456 
# Default size of the kernel socket receive buffer.
net.core.wmem_default = 6291456 
# Default size of the kernel socket send buffer.
net.core.rmem_max = 12582912 
# Maximum size of the kernel socket receive buffer.
net.core.wmem_max = 12582912 
# Maximum size of the kernel socket send buffer.

Note: the above four parameters need to be weighed against the business workload and the actual hardware cost.
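The values currently in effect can be inspected before changing them, for example:

$ sysctl net.core.rmem_default net.core.wmem_default net.core.rmem_max net.core.wmem_max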

There are two more parameters, tcp_rmem and tcp_wmem: the read and write buffer memory allocated to each TCP connection, in bytes.

tcp_rmem: the minimum, default, and maximum values of the receive buffer

net.ipv4.tcp_rmem = 10240 87380 12582912 
# Defines the minimum, default, and maximum sizes of the TCP receive buffer.

The first number is the buffer minimum: the least memory allocated to a TCP connection, by default one page (4 KB), and the lower limit of each socket's receive window;

The second number is the buffer default: the memory allocated to a TCP connection by default, which also serves as the receive window size. The window size is only a limit; the actual buffer memory is allocated and managed by the protocol stack;

The third number is the buffer maximum: the most memory allocated to a TCP connection, the upper limit of each socket's receive window, and the ceiling used when the TCP stack auto-tunes the window size.

Note: the actual buffer memory is allocated and managed by the protocol stack.

Generally memory is allocated according to the default value; in the example above the minimum is 10 KB and the default is about 85 KB (87380 bytes).

tcp_wmem: the minimum, default, and maximum values of the send buffer

net.ipv4.tcp_wmem = 10240 87380 12582912 
# Defines the minimum, default, and maximum sizes of the TCP send buffer.

The above are the settings of the read and write buffer of the TCP socket, each of which has three values:

  • The first value is the buffer minimum
  • The intermediate value is the default value of the buffer
  • The last one is the maximum value of the buffer

The difference between the net.core and net.ipv4 prefixes

Parameters prefixed with net.core are generic settings for the network stack, while parameters prefixed with net.ipv4 apply specifically to IPv4.

The TCP buffer sizes auto-tuned through net.ipv4.tcp_rmem/tcp_wmem are not capped by the net.core default values, but buffer sizes set explicitly by an application (via setsockopt) are still limited by net.core.rmem_max and net.core.wmem_max.

net.core.netdev_max_backlog = 8096 
# When the NIC receives packets faster than the kernel can process them, a queue holds the pending packets.
# This parameter sets the maximum length of that queue.
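Whether this queue ever overflows can be checked in /proc/net/softnet_stat: the second column of each per-CPU row counts packets dropped because the input queue was full (a rough diagnostic; the exact column layout varies slightly across kernel versions):

$ cat /proc/net/softnet_stat   # column 2 = packets dropped because netdev_max_backlog was exceeded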

A TCP socket's buffer sizes are governed by the TCP-specific tcp_rmem/tcp_wmem settings rather than by the generic net.core buffer settings.

The model of the TCP send buffer and receive buffer is illustrated by the following scenario (the original figure is omitted here):

The client creates a TCP socket and sets its send buffer to 4096 bytes via the SO_SNDBUF option. After connecting to the server, it sends a message with a TCP segment length of 1024 bytes once per second, while the server never calls recv(). The expected behavior falls into the following phases:

Phase 1: the server-side socket receive buffer is not yet full, so even though the server never calls recv(), it can still ACK the messages sent by the client;

Phase 2: the server-side socket receive buffer fills up and a zero window (Zero Window) is advertised to the client; the data the client wants to send starts to accumulate in its socket send buffer;

Phase 3: the client-side socket send buffer fills up and the user process blocks in send().
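If you want to watch this scenario on a live system, per-socket buffer occupancy can be observed with ss; a sketch assuming iproute2 is installed:

$ ss -tmn   # -m adds skmem:(...) fields showing receive/send buffer usage and their limits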

Mitigating SYN attacks on TCP

net.ipv4.tcp_syncookies = 1
# Not performance-related; used to mitigate TCP SYN flood attacks.

Maximum length of the received SYN request queue

net.ipv4.tcp_max_syn_backlog = 8192
# Maximum length of the queue of received SYN requests during the TCP three-way handshake; the default is 1024.
# Setting it higher means that when Nginx is too busy to accept new connections in time,
# Linux does not drop connection requests initiated by clients.
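Whether the SYN queue is actually overflowing can be checked from the kernel's SNMP counters, for example:

$ netstat -s | grep -i "SYNs to LISTEN"   # SYN requests dropped on listening sockets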

Enable fast recycling of TIME_WAIT sockets

net.ipv4.tcp_tw_recycle = 1 
# Enables fast recycling of sockets in the TIME_WAIT state.
# (Note: this option is known to break clients behind NAT and was removed in Linux 4.12.)

The maximum length of the listen (accept) queue

net.core.somaxconn=262114 
# The default value is 128.
# This parameter sets the upper limit on the length of each socket's listen (accept) queue.
# Under high concurrency the default can lead to connection timeouts or retransmissions,
# so it should be raised in line with the expected number of concurrent requests.
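For listening sockets, the accept-queue limit actually in effect (the smaller of the listen() backlog and somaxconn) can be seen with ss: Send-Q shows the limit and Recv-Q the current queue depth:

$ ss -lnt   # for LISTEN sockets: Recv-Q = current accept-queue depth, Send-Q = configured limit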

Protection against simple DOS attacks

net.ipv4.tcp_max_orphans=262114 
# Maximum number of TCP sockets in the system that are not attached to any user file handle.
# If this number is exceeded, orphaned connections are reset immediately and a warning is printed.
# This limit exists only to guard against simple DoS attacks.
# Do not rely on it or artificially lower the value; if anything, it should be increased.

Configuration of the number of file handles for a single process

/etc/security/limits.conf

* soft nofile 1024000
* hard nofile 1024000
* soft nproc 655360
* hard nproc 655360
* soft stack unlimited
* hard stack unlimited
* soft   memlock    unlimited
* hard   memlock    unlimited

Maximum number of open file descriptors for a process

* soft nofile 1000000
* hard nofile 1000000
* soft nproc 655360
* hard nproc 655360

# * applies the limit to all users
# nproc is the maximum number of processes
# nofile is the maximum number of open files

The difference between hard and soft: the soft limit is normally set lower than the hard limit.

For example, if soft is set to 80 and hard to 100, you can exceed 80 up to at most 100, but whenever usage is between 80 and 100 the system warns you.

In short (each of these limits can be verified with the commands shown after this list):

a. The total number of file descriptors opened by all processes cannot exceed /proc/sys/fs/file-max

b. The number of file descriptors opened by a single process cannot exceed the soft nofile limit in the user limits

c. The soft nofile limit cannot exceed its hard limit

d. The hard nofile limit cannot exceed /proc/sys/fs/nr_open
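A quick way to verify each of the limits above on a running system (run the ulimit commands as the user in question):

$ cat /proc/sys/fs/file-max   # a. system-wide maximum
$ ulimit -Sn                  # b./c. per-process soft nofile limit
$ ulimit -Hn                  # c./d. per-process hard nofile limit
$ cat /proc/sys/fs/nr_open    # d. ceiling for the hard limit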

RocketMQ production environment configuration

Reference broker configuration

The cluster architecture uses asynchronous disk flushing and synchronous replication (brokerRole=SYNC_MASTER, flushDiskType=ASYNC_FLUSH).

# Please modify
brokerClusterName=XXXCluster
brokerName=broker-a
brokerId=0
listenPort=10911
# Please modify
namesrvAddr=x.x.x.x:9876;x.x.x.x:9876
defaultTopicQueueNums=4
autoCreateTopicEnable=false
autoCreateSubscriptionGroup=false
deleteWhen=04
fileReservedTime=48
mapedFileSizeCommitLog=1073741824
mapedFileSizeConsumeQueue=50000000
destroyMapedFileIntervalForcibly=120000
redeleteHangedFileInterval=120000
diskMaxUsedSpaceRatio=88
# Storage root path
storePathRootDir=/data/rocketmq/store
# CommitLog storage path
storePathCommitLog=/data/rocketmq/store/commitlog
# Consume queue storage path
storePathConsumeQueue=/data/rocketmq/store/consumequeue
# Message index storage path
storePathIndex=/data/rocketmq/store/index
# Checkpoint file storage path
storeCheckpoint=/data/rocketmq/store/checkpoint
# Abort file storage path
abortFile=/data/rocketmq/store/abort
maxMessageSize=65536
flushCommitLogLeastPages=4
flushConsumeQueueLeastPages=2
flushCommitLogThoroughInterval=10000
flushConsumeQueueThoroughInterval=60000
brokerRole=SYNC_MASTER
flushDiskType=ASYNC_FLUSH
checkTransactionMessageEnable=false
maxTransferCountOnMessageInMemory=1000
transientStorePoolEnable=true
warmMapedFileEnable=true
pullMessageThreadPoolNums=128
slaveReadEnable=true
transferMsgByHeap=false
waitTimeMillsInSendQueue=1000
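For reference, a broker can then be started with this file via the mqbroker command; the configuration path conf/broker-a.properties and the log location below are assumptions based on RocketMQ's usual layout:

$ nohup sh bin/mqbroker -c conf/broker-a.properties > /dev/null 2>&1 &
$ tail -f ~/logs/rocketmqlogs/broker.log   # confirm the broker registered with the name servers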

ElasticSearch production environment configuration

Reference configuration file

/etc/sysctl.conf

fs.file-max = 2024000
fs.nr_open = 1024000

net.ipv4.tcp_tw_reuse = 1

net.ipv4.tcp_keepalive_time = 600

net.ipv4.tcp_fin_timeout = 30

net.ipv4.tcp_max_tw_buckets = 5000

net.ipv4.ip_local_port_range = 1024 65000

net.ipv4.tcp_rmem = 10240 87380 12582912

net.ipv4.tcp_wmem = 10240 87380 12582912

net.core.netdev_max_backlog = 8096

net.core.rmem_default = 6291456

net.core.wmem_default = 6291456

net.core.rmem_max = 12582912

net.core.wmem_max = 12582912

net.ipv4.tcp_syncookies = 1

net.ipv4.tcp_max_syn_backlog = 8192

net.ipv4.tcp_tw_recycle = 1

net.core.somaxconn=262114

net.ipv4.tcp_max_orphans=262114
net.ipv4.tcp_retries2 = 5
vm.max_map_count = 262144

Operating system

Larger file descriptor limits

Lucene uses a very large number of files, and Elasticsearch uses many sockets to communicate between nodes and with HTTP clients.

All of these require available file descriptors.

Sadly, many modern Linux distributions ship with a meager default of 1024 file descriptors per process.

This is too low for a small Elasticsearch node, let alone one handling hundreds of indexes.

Set MMap

Elasticsearch also uses a mix of NioFS and MMapFS for various files.

Make sure to configure the maximum map count so that enough virtual memory is available for mmapped files.

Elasticsearch uses an mmapfs directory by default to store its indices. The operating system's default limit on mmap counts may be too low, which can result in out-of-memory exceptions.

Set it temporarily: sysctl -w vm.max_map_count=262144

Set it permanently:

$ vim /etc/sysctl.conf

# Set the operating system's mmap limit. Elasticsearch and Lucene use mmap to map parts of the index into Elasticsearch's address space.
# For mmap to work, Elasticsearch also needs to be able to create the required memory-mapped areas.
# The max map count check ensures the kernel allows at least 262144 memory-mapped areas.
vm.max_map_count = 262144
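After editing the file, reload it and confirm the value took effect:

$ sudo sysctl -p
$ sysctl vm.max_map_count   # should print vm.max_map_count = 262144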

The JVM

Unless otherwise stated on the Elasticsearch website, you should always be running the latest version of the Java Virtual Machine (JVM).

Both Elasticsearch and Lucene are relatively demanding software.

Lucene's unit and integration tests often expose bugs in the JVM itself.

Number of threads (or processes)

Elasticsearch uses many thread pools for different types of operations. It is important that it can create new threads whenever needed. Make sure the number of threads the elasticsearch user can create is at least 4096.

$ vim /etc/security/limits.conf

elasticsearch soft nproc 4096
elasticsearch hard nproc 4096
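To confirm the limit as seen by the Elasticsearch process, it can be checked as that user; a sketch assuming the service runs as a user named elasticsearch:

$ sudo -u elasticsearch bash -c 'ulimit -u'   # max user processes/threads for that user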

Leave half of the memory for Lucene

A common problem is configuring a heap that is too large. You have a 64GB machine and you want to give Elasticsearch all 64GB of memory.

More is better, right? In reality, the heap is absolutely critical to Elasticsearch: it is used by many in-memory data structures to provide fast operations.

But there is another major memory user: Lucene. Lucene is designed to leverage the underlying operating system to cache data structures in memory.

Lucene's segments are stored in separate files, and because segments are immutable, these files never change. This makes them very cache-friendly, and the underlying operating system will happily keep hot segments resident in memory for faster access.

These segments include the inverted index (for full-text search) and doc values (for aggregations).

Lucene's performance depends on this interaction with the operating system. But if you give Elasticsearch's heap all available memory, Lucene won't have any memory left.

This can seriously affect performance.

The standard recommendation is to give the Elasticsearch heap 50% of the available memory and leave the other 50% free.

It won't go unused; Lucene will happily gobble up whatever is left.

If you are not aggregating on analyzed string fields (e.g. you don't need fielddata), you might consider lowering the heap even more. The smaller you can make the heap, the better performance you can expect from both Elasticsearch (faster GC) and Lucene (more memory for the filesystem cache).

Do not exceed 32G

It turns out that the HotSpot JVM uses a trick to compress object pointers when the heap size is less than 32GB.

This can be checked with -XX:+PrintFlagsFinal. From Elasticsearch 2.2.0 onward nothing needs to be set manually; at startup the node logs whether compressed pointers are in use, e.g. "compressed ordinary object pointers [true]".
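One way to check this directly for a given heap size using standard HotSpot flags (the heap sizes below are only illustrative):

$ java -Xms31g -Xmx31g -XX:+PrintFlagsFinal -version 2>/dev/null | grep UseCompressedOops   # true: below the boundary
$ java -Xms34g -Xmx34g -XX:+PrintFlagsFinal -version 2>/dev/null | grep UseCompressedOops   # false: compressed oops disabled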

In Java, all objects are allocated on the heap and referenced by pointers.

Ordinary object pointers (OOP) point to these objects, and are usually the CPU-native word size: 32-bit or 64-bit, depending on the processor.

A pointer references the exact byte location of a value, so on 32-bit systems the maximum heap size is 4 GB. On 64-bit systems the heap can grow much larger, but the overhead of 64-bit pointers means more wasted space simply because the pointers are bigger. Worse than the wasted space, the larger pointers also consume more bandwidth when moving values between main memory and the various caches (LLC, L1, etc.).

Java uses a trick called compressed oops to get around this problem. Instead of pointing at exact byte locations in memory, the pointers reference object offsets. This means a 32-bit pointer can reference four billion objects rather than four billion bytes.

Ultimately, this means the heap can grow to about 32 GB of physical size while still using 32-bit pointers.

Once you cross the 32 GB boundary, the pointers switch back to ordinary object pointers.

The size of each pointer grows, more CPU-to-memory bandwidth is used, and you effectively lose memory.

In fact, it takes a heap of roughly 40 to 50 GB before you have the same effective memory as a heap just under 32 GB using compressed oops. So even if you have the RAM, try to avoid crossing the 32 GB heap boundary: it wastes memory, reduces CPU performance, and makes the GC struggle with large heaps.

Swapping is the Achilles' heel of performance

It should be obvious, but it is worth spelling out clearly: swapping main memory to disk will crush server performance. In-memory operations need to execute quickly; an operation that takes 100 microseconds in memory takes 10 milliseconds once the memory has been swapped to disk. Now repeat that increase in latency for every other operation, and it is not hard to see why swapping is terrible for performance.

1. Your best bet is to completely disable swap on your system. This can be done temporarily:

sudo swapoff -a

To disable permanently you need to edit /etc/fstab. Consult your operating system's documentation.

2. If disabling swap completely is not an option, you can try lowering swappiness:

sysctl -w vm.swappiness=1 (check the current value with cat /proc/sys/vm/swappiness)

This setting controls how aggressively the operating system tries to swap memory. A value of 1 aims to prevent swapping under normal circumstances while still allowing the OS to swap in emergency situations. A swappiness of 1 is better than 0, because on some kernel versions a swappiness of 0 can invoke the OOM killer.
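To make the setting survive a reboot, it can also be written to the sysctl configuration; a sketch assuming /etc/sysctl.conf is used rather than a drop-in file under /etc/sysctl.d:

$ sudo sysctl -w vm.swappiness=1
$ echo 'vm.swappiness = 1' | sudo tee -a /etc/sysctl.conf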

3. Finally, if neither method is possible, mlockall should be enabled. This allows the JVM to lock its memory and prevent it from being swapped by the OS. It can be set in elasticsearch.yml:

bootstrap.mlockall: true
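After restarting the node, whether the memory lock actually took effect can be verified through the nodes info API (on newer Elasticsearch versions the setting is named bootstrap.memory_lock, but the check is the same):

$ curl -s 'http://localhost:9200/_nodes?filter_path=**.mlockall'   # should report "mlockall": true on every node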

TCP retransmission timeout

Each pair of nodes in the cluster communicates over many TCP connections that remain open until one of the nodes goes down or communication between the nodes is disrupted due to a failure in the underlying infrastructure.

TCP provides reliable communication over occasionally unreliable networks by hiding temporary network interruptions from communicating applications. Your operating system will retransmit any lost messages several times before notifying the sender of any problems. Most Linux distributions retransmit any lost packets by default 15 times. Retransmissions are exponentially slower, so those 15 retransmissions take 900 seconds to complete. This means that Linux can take many minutes to detect network partitions or failed nodes using this method. Windows only has 5 retransmissions by default, which is equivalent to a timeout of about 6 seconds.

Linux defaults to allowing communication on networks that may experience packet loss for a long time, but this default is too large for a production network within a single data center, like most Elasticsearch clusters. Highly available clusters must be able to detect node failures quickly so that they can react quickly by reallocating lost shards, rerouting searches, and possibly electing a new master node. Therefore, Linux users should reduce the maximum number of TCP retransmissions.

$ vim /etc/sysctl.conf

net.ipv4.tcp_retries2 = 5

config/jvm.options

  • It is very important that Elasticsearch has enough heap available.
  • Set the heap minimum (Xms) and the heap maximum (Xmx) to the same value.
  • The more heap Elasticsearch has available, the more data it can cache in memory; but note that a larger heap also means longer garbage-collection pauses.
  • Set Xmx to no more than 50% of physical memory, so that enough memory is left for the operating system cache.
  • Do not run Elasticsearch with the serial collector (-XX:+UseSerialGC); the default JVM configuration shipped with Elasticsearch uses the CMS collector.
-Xms32g
-Xmx32g

Hardware

Memory

The first and most important resource is memory. Both sorting and aggregation can lead to memory starvation, so it is important to have enough heap space to accommodate these.

Even with a small heap, provide additional memory for the operating system cache, because many data structures used by Lucene are in a disk-based format, and Elasticsearch's use of the operating system cache has a large impact.

A machine with 64GB of RAM is ideal, but 32GB and 16GB machines are also common.

Less than 8GB is often counterproductive (you end up needing many, many small machines), and more than 64GB can be problematic, as we'll discuss in Heap: Size and Swap.

CPU

Most Elasticsearch deployments tend to be fairly light on CPU, so the exact processor setup matters less than the other resources; just choose a modern processor with multiple cores. Typical clusters use machines with 2 to 8 cores.

If you need to choose between a faster CPU and more cores, choose more cores. The additional concurrency that multiple cores provide will far outweigh a slightly faster clock speed.

Hard disk

Disks are important for all clusters, and especially for index-heavy clusters (such as those that ingest log data). Disks are the slowest subsystem in a server, which means a write-heavy cluster can easily saturate its disks, which in turn become the bottleneck of the cluster.

If you can afford SSDs, they are far superior to any spinning disks. SSD-enabled nodes see improvements in query and indexing performance.

If using spinning disks, try to get the fastest possible disks (high performance server disks, 15k RPM drives).

Using RAID 0 is an effective way to increase disk speed and works with both spinning disks and SSDs. There is no need to use the mirrored or parity variants of RAID, as high availability is built into Elasticsearch through replicas.

Finally, avoid network-attached storage (NAS). NAS is usually slower, shows larger latencies, has a larger deviation in average latencies, and is a single point of failure.

Network

Fast and reliable networking is obviously important to performance in a distributed system. Low latency helps ensure nodes can communicate easily, while high bandwidth helps shard movement and recovery. Modern data center networking (1 GbE, 10 GbE) is sufficient for the vast majority of clusters.

Avoid clusters that span multiple data centers, even if the data centers are located in close proximity. Definitely avoid clusters that span large geographic distances.

Elasticsearch clusters assume all nodes are equal, not half the nodes are 150ms away from another data center. Larger latencies tend to exacerbate problems in distributed systems and make debugging and resolution more difficult.

Similar to the NAS argument, everyone claims the pipe between their data centers is robust and low latency (usually an exaggeration). From experience, the hassle of managing cross-data-center clusters is simply not worth the cost.

Other configuration

Really huge machines with hundreds of gigabytes of RAM and dozens of CPU cores are now available. Alternatively, thousands of small virtual machines can be launched in cloud platforms such as EC2. Which method is best?

In general, it is best to choose a medium to large box. Avoid small machines because you don't want to manage a cluster with a thousand nodes, and the overhead of simply running Elasticsearch is more noticeable on such small machines.

Also, avoid really huge machines. They often lead to unbalanced resource usage (eg, all memory is being used, but no CPU), and can add operational complexity later on if you have to run multiple nodes per machine.
