Linux system network tuning

UDP is used to transmit data between two servers. After the transmission volume was increased, a lot of packet loss showed up in the logs. nload reported an average rate of only 200 Mbps, well below the theoretical limit. Was the network card faulty?

After careful thought, I found the reason. nload reports the average rate, but network transmission is bursty rather than steady: the instantaneous rate can momentarily exceed the limit of the gigabit network card, and packets get dropped. So although the average rate looks low, the network is still the bottleneck; the average rate alone is a poor indicator of the problem.

Later, the servers were moved to a new data center with a 10-gigabit network, and the packet loss improved greatly. So the transmission volume was increased further, and the packet loss appeared again. Had we hit the limit of the 10-gigabit network so soon?

After thinking it over again, I found the reason: the socket buffer was too small. To use a visual analogy, a large water pipe was feeding a small pool, and the pool quickly overflowed, so packets were dropped again. Following the documentation, the buffer size was increased, and the packet loss disappeared again.
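For reference, here is a minimal sketch of what "increasing the buffer" can look like on the receiving side, assuming a plain UDP socket; the 4 MB figure is only an illustrative value, and on Linux the size actually granted is still capped by the net.core.rmem_max kernel limit.

#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
    /* Plain UDP socket. */
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock == -1) {
        perror("socket");
        return 1;
    }

    /* Request a 4 MB receive buffer (illustrative value only);
       the kernel caps the result at net.core.rmem_max. */
    int buf_size = 4 * 1024 * 1024;
    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                   &buf_size, sizeof(buf_size)) == -1) {
        perror("setsockopt(SO_RCVBUF)");
        return 1;
    }
    return 0;
}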

Appendix (Network Tuning):

Tip 1. Minimize the Delay of Packet Transmission

When communicating over a TCP socket, data is split into chunks so that it can be encapsulated in the TCP payload of a given connection. The size of the TCP payload depends on several factors (such as the maximum packet length and the path), but these factors are known when the connection is initiated. For best performance, the goal is to fill each packet with as much data as possible. When there is not enough data to fill the payload (also known as the maximum segment size, or MSS), TCP uses the Nagle algorithm to automatically concatenate small buffers into a single segment. This can improve the efficiency of the application by minimizing the number of packets sent, and it alleviates overall network congestion.
Although John Nagle's algorithm minimizes the number of packets sent by coalescing data into larger packets, sometimes you want small packets sent immediately. A simple example is the telnet program, which lets a user interact with a remote system, usually through a shell. If the user had to fill a segment with typed characters before the message was sent, this behavior would certainly not meet our needs.
Another example is the HTTP protocol. Typically, a client browser makes a small request (an HTTP request message), and the Web server returns a larger response (a Web page).
Solution
The first thing you should consider is that Nagle's algorithm fulfills a need. Since the algorithm coalesces data in an attempt to form a complete TCP segment, it introduces some delay. But it minimizes the number of packets sent on the wire, and thus minimizes network congestion.
But in situations where transmission delays need to be minimized, the Sockets API can provide a solution. To disable Nagle's algorithm, you can set the TCP_NODELAY socket option, as shown in Listing 1.
Listing 1. Disable the Nagle algorithm for TCP socket
/* Headers needed for socket(), setsockopt(), and TCP_NODELAY */
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

int sock, flag, ret;
/* Create new stream socket */
sock = socket( AF_INET, SOCK_STREAM, 0 );
/* Disable the Nagle (TCP No Delay) algorithm */
flag = 1;
ret = setsockopt( sock, IPPROTO_TCP, TCP_NODELAY, (char *)&flag, sizeof(flag) );
if (ret == -1) {
  printf("Couldn't setsockopt(TCP_NODELAY)\n");
  exit(-1);
}



Tip 2. Minimize the Load of System Calls

Solution
When writing data to a socket, write all of the data in one call instead of performing multiple write operations. For read operations, pass in the largest buffer you can support, because the kernel will try to fill the entire buffer with the data that is available (and also keep TCP's advertised window open). This minimizes the number of system calls and achieves better overall performance.
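As an illustration of the write side, here is a hedged sketch of a helper that hands the whole buffer to write() in as few calls as possible, looping only when the kernel accepts a partial write; the name write_all is just an example and not from the original article.

#include <errno.h>
#include <unistd.h>

/* Write the whole buffer, looping only when write() returns a
   partial count; returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted, try again */
            return -1;             /* real error */
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

Handing one large buffer to a helper like this, instead of issuing many small writes, keeps per-call overhead down and gives TCP larger chunks to segment.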

Tip 3. Tuning the TCP Window for the Bandwidth Delay Product

TCP performance depends on several factors. The two most important are the link bandwidth (the rate at which data can be transmitted over the network) and the round-trip time, or RTT (the delay between sending a message and receiving the response from the other end). These two values determine what is called the Bandwidth Delay Product (BDP).
Given the link bandwidth and RTT, you can calculate the value of the BDP, but what does this mean? The BDP gives an easy way to calculate the theoretically optimal TCP socket buffer size (which holds both the data queued for transmission and the data waiting to be read by the application). If the buffer is too small, the TCP window cannot fully open, which limits performance. If the buffer is too large, precious memory resources are wasted. If you set the buffer size just right, you can fully utilize the available bandwidth. Let's look at an example:
BDP = link_bandwidth * RTT
If the application communicates over a 100 Mbps LAN and its RTT is 50 ms, then the BDP is:
100 Mbps * 0.050 sec / 8 = 0.625 MB = 625 KB
Note: the division by 8 converts bits into the bytes used for communication.
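To make the arithmetic reusable, here is a small illustrative helper (the name bdp_bytes is not from the original article) that turns a link bandwidth in Mbps and an RTT in milliseconds into a buffer size in bytes.

#include <stdio.h>

/* Bandwidth Delay Product in bytes:
   (bits per second * seconds of delay) / 8 bits per byte. */
static long bdp_bytes(double bandwidth_mbps, double rtt_ms)
{
    double bits = bandwidth_mbps * 1e6 * (rtt_ms / 1000.0);
    return (long)(bits / 8.0);
}

int main(void)
{
    /* The example from the text: 100 Mbps link, 50 ms RTT. */
    printf("BDP = %ld bytes\n", bdp_bytes(100.0, 50.0));  /* 625000 */
    return 0;
}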
Therefore, we can set the TCP window to the BDP, or 1.25 MB (socket buffers are commonly set to about twice the BDP, since the kernel reserves part of the buffer for its own bookkeeping). But on Linux 2.6 the default TCP window size is 110 KB, which limits the bandwidth of the connection to 2.2 MBps, calculated as:
throughput = window_size / RTT

110 KB / 0.050 sec = 2.2 MBps
If instead we use the window size calculated above, we get 12.5 MBps:
625 KB / 0.050 sec = 12.5 MBps
That is indeed a big difference, and it provides much greater throughput for the socket. So now you know how to calculate the optimal buffer size for your socket. But how do you change it?
Solution
The Sockets API provides several socket options, two of which can be used to modify the size of the socket's send and receive buffers. Listing 2 shows how to use the SO_SNDBUF and SO_RCVBUF options to adjust the size of the send and receive buffers.
Note: Although the size of the socket buffer determines the size of the advertised TCP window, TCP also maintains a congestion window within the advertised window. Therefore, because of this congestion window, a given socket may never be able to use the maximum advertised window.
Listing 2. Manually setting send and receive socket buffer sizes
/* Headers needed for socket() and setsockopt() */
#include <sys/socket.h>
#include <netinet/in.h>

int ret, sock, sock_buf_size;
sock = socket( AF_INET, SOCK_STREAM, 0 );
/* BDP is the bandwidth delay product in bytes, computed beforehand. */
sock_buf_size = BDP;
ret = setsockopt( sock, SOL_SOCKET, SO_SNDBUF,
                   (char *)&sock_buf_size, sizeof(sock_buf_size) );
ret = setsockopt( sock, SOL_SOCKET, SO_RCVBUF,
                   (char *)&sock_buf_size, sizeof(sock_buf_size) );
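One follow-up worth knowing for Listing 2: on Linux the kernel doubles the value passed to SO_SNDBUF/SO_RCVBUF to leave room for bookkeeping overhead (see socket(7)), and explicit requests are capped by wmem_max/rmem_max, so it can be useful to read back the size actually granted with getsockopt(). A small illustrative helper (print_granted_buffers is not part of the original listing):

#include <stdio.h>
#include <sys/socket.h>

/* Print the send and receive buffer sizes the kernel actually granted. */
static void print_granted_buffers(int sock)
{
    int sndbuf = 0, rcvbuf = 0;
    socklen_t len = sizeof(int);

    getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len);
    len = sizeof(int);
    getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len);

    /* On Linux these are roughly double the requested values. */
    printf("SO_SNDBUF = %d bytes, SO_RCVBUF = %d bytes\n", sndbuf, rcvbuf);
}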


Tip 4. Dynamically Optimizing the GNU/Linux TCP/IP Stack

Standard GNU/Linux distributions attempt to optimize for every deployment scenario. This means that a standard distribution may not be specially optimized for your environment.
Solution
GNU/Linux provides a number of adjustable kernel parameters that you can use to dynamically configure the operating system for your own purposes. Let's take a look at some of the more important options that affect socket performance.
There are some tunable kernel parameters in the /proc virtual filesystem. Each file in this filesystem represents one or more parameters, which can be read with the cat tool or modified with the echo command. Listing 3 shows how to query and enable a tunable parameter (in this case, enabling IP forwarding in the TCP/IP stack).
Listing 3. Tuning: Enable IP forwarding in the TCP/IP stack
[root@camus]# cat /proc/sys/net/ipv4/ip_forward
0
[root@camus]# echo "1" > /proc/sys/net/ipv4/ip_forward
[root@camus]# cat /proc/sys/net/ipv4/ip_forward
1
[root@camus]#
Table 1 lists several tunable parameters that can help you improve the performance of the Linux TCP/IP stack.
Table 1. Tunable kernel parameters for TCP/IP stack performance (tunable parameter, default value, description)

/proc/sys/net/core/rmem_default (default "110592"): Defines the default receive window size; for a larger BDP, this size should also be larger.
/proc/sys/net/core/rmem_max (default "110592"): Defines the maximum size of the receive window; for a larger BDP, this size should also be larger.
/proc/sys/net/core/wmem_default (default "110592"): Defines the default send window size; for a larger BDP, this size should also be larger.
/proc/sys/net/core/wmem_max (default "110592"): Defines the maximum size of the send window; for a larger BDP, this size should also be larger.
/proc/sys/net/ipv4/tcp_wmem (default "4096 16384 131072"): Defines the per-socket memory used for autotuning. The first value is the minimum number of bytes allocated for the socket's send buffer. The second value is the default (for TCP sockets it takes the place of wmem_default), up to which the buffer can grow when the system is not under heavy load. The third value is the maximum number of bytes of send buffer space used for autotuning (explicit setsockopt() requests are still limited by wmem_max).
/proc/sys/net/ipv4/tcp_rmem (default "4096 87380 174760"): Similar to tcp_wmem, except that it holds the values used for autotuning the receive buffer.

Reference:

http://www.ibm.com/developerworks/cn/linux/l-hisock.html
