Notes on troubleshooting and fixing a slow hospital production system

The story begins when a junior implementation engineer migrated the hospital's database on his own, without approval, and it crashed. We had to repair the database and then go to the hospital to apologize; after all, a large outstanding balance still had not been paid. After that, the hospital kept calling to say the system was very slow. What follows is how the slowness was resolved this time.

I opened the system log and found that database logging was not being printed. Because the hospital's deployment is load-balanced across several nodes, we added the database logging parameter to each node, one by one:

  logging.level.***.mysqlmapper=debug

The logs then showed that a large number of requests were hitting the same interface at the same time, which raised the load on the database and affected data for other business functions. That function was therefore temporarily disabled as an emergency measure. Continuing through the logs, I found that two nodes were not receiving any requests. I checked nginx.conf and found nothing wrong, and a ping to the receiving hosts also succeeded. Because this is the hospital's intranet, telnet could not be installed on the server, so the cause could not be found for a while.

Later, I asked the implementation engineer to go on site, connect to the hospital's intranet, and run telnet against each node's IP and port. The connection failed. On asking around, it turned out that the hospital had run an upgrade the night before, which restarted the server and left the firewall turned on. After the firewall on the server was turned off, I kept watching the logs and traffic was back to normal.

Summing up today's experience, the main cause was that I do not understand operations work very well. The reasoning goes as follows:

1. When nginx load-balances across back ends and a dedicated health check module is in place, first see whether the health check reports a problem. With plain load balancing that uses neither cookie-based stickiness nor ip_hash, requests are most likely distributed by round-robin polling.

2. If requests are distributed unevenly, some machines get no requests at all, and there is no health check, keep in mind that when the service on a port answers with an error such as 502 or 404, nginx keeps forwarding to it regardless. Only when nginx cannot reach the port at all does it stop forwarding to that node.
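The two points above can be illustrated with a toy model. This is only a sketch of the behaviour as described here, not nginx's actual implementation; the function name `dispatch` and the three state labels are made up for the example. It assumes plain round-robin and assumes that only an unreachable port causes a node to be skipped.

```python
from itertools import cycle


def dispatch(backends, n_requests):
    """Toy model of round-robin that skips a node only on connection failure.

    backends maps a node name to one of these hypothetical states:
      "ok"          - port reachable, service healthy
      "http_error"  - port reachable, service answers with 502, 404, etc.
      "unreachable" - port blocked or closed, e.g. by a firewall
    Returns a dict counting how many requests each node actually received.
    """
    names = list(backends)
    received = {name: 0 for name in names}
    ring = cycle(names)
    for _ in range(n_requests):
        # Try each node at most once per request, in round-robin order.
        for _ in range(len(names)):
            node = next(ring)
            if backends[node] == "unreachable":
                continue  # cannot connect: move on to the next node
            received[node] += 1  # reachable: forwarded, even if it errors
            break
    return received


if __name__ == "__main__":
    nodes = {
        "node1": "ok",
        "node2": "unreachable",
        "node3": "http_error",
        "node4": "unreachable",
    }
    print(dispatch(nodes, 8))
```

In this model the two unreachable nodes end up with zero requests while the others absorb all the traffic, which matches the symptom seen in the logs: some nodes silent, the rest busier than usual.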

Following this reasoning, I first confirmed that no health check was configured, and then went on to check port connectivity. Because the intranet environment had no reliable tool for testing the connection between nginx and the back-end node ports, all I could check was whether the server was up and whether the firewall was on, so the problem could not be located quickly.
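When telnet cannot be installed, a few lines of standard-library Python can do the same TCP connect test, provided a Python interpreter is available on the nginx host (an assumption; the original environment may not have had one). The sketch below is not what was used during the incident, and the function names, addresses and ports are placeholders.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_backends(backends, timeout=3.0):
    """Check a list of (host, port) pairs and return {"host:port": bool}."""
    return {
        f"{host}:{port}": port_is_open(host, port, timeout)
        for host, port in backends
    }


if __name__ == "__main__":
    # Placeholder addresses; replace with the upstream servers in nginx.conf.
    upstreams = [("192.168.0.11", 8080), ("192.168.0.12", 8080)]
    for target, reachable in check_backends(upstreams).items():
        print(target, "open" if reachable else "NOT reachable")
```

The check is only meaningful when run from the machine where nginx runs, because a firewall may allow the port from one host and block it from another.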


Origin blog.csdn.net/A___B___C/article/details/107693948