Notes from an interesting Nginx production troubleshooting session

Phenomenon

  • One nginx instance is configured in front of two servers for load balancing and high availability. However, when requesting data through nginx, every call that lands on the first server returns an error, while calls that land on the second succeed (which backend was at fault was determined by reverse-proxying each one individually).
  • Yet when each server's interface is called directly, without nginx, both return data normally.

Investigation process

  • [ Step 1: Rule out a network problem ]
    • From the nginx host, ping each server and telnet to each service port to check connectivity, and confirm that both services are running and that the firewalls on the servers and the nginx host allow the traffic.
    • Result: everything was reachable.
  • [ Step 2: Rule out a machine-specific problem ]
    • Changed the load-balancing setup to a plain reverse proxy pointing at a single backend. The first server still failed, and the second still worked fine.
    • So the problem is very likely in some configuration on the first server.
  • [ Step 3: Capture packets to confirm the requests arrive, and compare them ]
    • Installed Wireshark on the servers to capture packets. The requests forwarded by nginx were identical except for the Postman token, but the responses differed: the first server always returned an error, the second always returned success. The first server's error message pointed at permission/authentication.
    • So the guess was that the failure is related to authorization on the first server.
  • [ Step 4: Check nginx's error.log ]
    • Nothing useful there.
  • [ Step 5: Ask a senior colleague for help ]
    • He suspected a session-persistence issue, so the nginx configuration was modified to use the ip_hash upstream algorithm.
    • The earlier packet capture had also shown that requests forwarded by nginx carry Connection: close, while direct calls carry Connection: keep-alive, so there was reason to suspect that authentication failed because the connection was not kept alive.
    • The problem seemed solved: every test request returned data.
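The ip_hash change from Step 5 would look roughly like the following. This is a sketch, assuming the same two Tomcat backends that appear in the config shown later; with ip_hash the client IP alone decides which backend is chosen:

```nginx
upstream tomcat_pool {
    ip_hash;  # map each client IP to the same backend (session persistence)
    server 192.168.80.22:8080 max_fails=2 fail_timeout=30s;
    server 192.168.80.22:8081 max_fails=2 fail_timeout=30s;
}
```

As the postscript explains, this only looked like a fix: a single test machine hashes to a single backend, so every request happened to land on the healthy one.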

Postscript

  • Five minutes later, on the way home, I realized I was wrong: the problem had not actually been solved. The ip_hash algorithm maps the same client IP to the same backend to achieve session persistence, and since I tested from a single machine with one IP, I was lucky enough to be mapped to the second server, which is why every request succeeded.

  • When I continued verifying today and sent the request from another machine, it really did fail. With the benefit of hindsight, there were three pieces of hard evidence:

    • I had earlier reverse-proxied the first server on its own and that also failed, so session persistence was unlikely to be the cause.
    • When forwarding through nginx, the requests to both machines carried Connection: close, so if session persistence were the cause, it would be impossible for one backend to work while the other failed.
    • After switching the forwarding algorithm to ip_hash, packet capture showed the forwarded requests still carried Connection: close. Nginx talks to upstreams over HTTP/1.0 and does not add Keep-Alive, so ip_hash could not have changed that. For the session-persistence mechanisms that nginx itself is not responsible for, see Nginx session retention.
  • After going over the error message again, my senior colleague suggested switching nginx from HTTP proxying to TCP proxying, and that did it. After repeated tests, I finally confirmed that this time it was genuinely fixed.

http {
    upstream tomcat_pool {
        # server <tomcat address>:<port>; weight sets the relative weight:
        # the larger the weight, the more likely this backend is selected
        server 192.168.80.22:8080 weight=4 max_fails=2 fail_timeout=30s;
        server 192.168.80.22:8081 weight=4 max_fails=2 fail_timeout=30s;
    }
    server {
        listen       80;
        server_name  tomcat_pool;
        location / {
            #root   html;
            #index  index.html index.htm;
            proxy_pass http://tomcat_pool;    # hand the request over to tomcat
            proxy_set_header   Host             $host;
            proxy_set_header   X-Real-IP        $remote_addr;
            proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
        }
    }
}

Replace with

stream {
    upstream rtmp {
        server 127.0.0.1:8089; # put the backend address to reach here
        server 127.0.0.2:1935;
        server 127.0.0.3:1935; # the port to proxy; here I proxy an RTMP module's port, 1935
    }
    server {
        listen 1935;  # the port to listen on
        proxy_timeout 20s;
        proxy_pass rtmp;
    }
}
  • Note: nginx needs to be recompiled with the --with-stream configure option, then make && make install.
  • A tentative explanation: the HTTP service presumably requires session persistence plus an authentication token. When nginx forwards at the TCP layer, the backend machine sees the connection directly and can establish its authentication state; nginx merely forwards TCP and no longer touches application-layer information.
  • The troubleshooting idea was to shrink nginx's role to pure forwarding, i.e. to downgrade from the HTTP layer to the TCP layer.
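As a side note on the Connection: close observation above: nginx speaks HTTP/1.0 to upstreams by default, but it can be configured to use HTTP/1.1 with persistent upstream connections. This is a sketch of that documented alternative, not the fix used here, reusing the backend addresses from the config above:

```nginx
upstream tomcat_pool {
    server 192.168.80.22:8080;
    server 192.168.80.22:8081;
    keepalive 16;               # pool of idle upstream connections per worker
}
server {
    listen 80;
    location / {
        proxy_pass http://tomcat_pool;
        proxy_http_version 1.1;          # upstream side defaults to HTTP/1.0
        proxy_set_header Connection "";  # drop the Connection: close header
    }
}
```

Note that keepalive only reuses TCP connections to the upstreams; it does not pin a given client to one backend, so on its own it would not provide session affinity.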

Phenomenon 2

  • Two servers are load-balanced by nginx. All of their other services are reachable through nginx, but one particular service is not, even though it can be accessed directly without nginx.
  • Solution:
    • After trying the approach above, it turned out that the port nginx forwarded to was occupied. The occupying service had started earlier, so when the target service was restarted later, nobody remembered that port 8888 was already taken. Avoid using this kind of special port.
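A quick way to catch this kind of port conflict is to check what is already listening on the forwarded port before (re)starting a service. A minimal sketch using ss on Linux, with port 8888 as in the story (netstat -lntp works on older systems):

```shell
# Show any listener on TCP port 8888; report when the port is free.
ss -lntp 2>/dev/null | grep ':8888 ' || echo "port 8888 is free"
```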


Origin blog.csdn.net/ljfirst/article/details/108248724