The epoll thundering herd problem and its solutions

【The problem】

    I had a single-process Linux epoll server program on hand and recently wanted to rewrite it as a multi-process version, for two main reasons:

  1. At peak times the volume of concurrent network requests is very large, and the current single-process version struggles: a single loop handles the events returned by epoll_wait() one at a time, so events on some unlucky socket fds sit in the queue and are not processed promptly (I worried that impatient clients would time out);
  2. I wanted to take advantage of the server's multiple CPUs.
 
    But as the rewrite progressed, the first thing I ran into was the "thundering herd" problem. My initial design was as follows:
  1. The main process first listens on the port: listen_fd = socket(...);
  2. Create the epoll instance: epoll_fd = epoll_create(...);
  3. Then fork(); each child process enters a loop that waits for new connections with epoll_wait(...) and handles events.
 
    And then I hit the "thundering herd" phenomenon: when a new connection arrives on listen_fd, the operating system wakes all the child processes (because they are all waiting in epoll_wait() on the same listen_fd, the operating system cannot tell which one should call accept(), so it simply wakes them all). In the end only one process succeeds in accept(); the others fail. Because all the children are startled awake at once, English-speaking developers call this the Thundering Herd.
    As an analogy, picture a McDonald's on the street with four service windows, each staffed by one attendant. When a new customer walks in the front door, the automatic doorbell rings "Welcome!" (the equivalent of the operating system detecting a network event), and all four attendants look up (the equivalent of the operating system waking all the worker processes), each hoping the customer will come to their window. The outcome is easy to guess: the customer goes to one window, and the other three attendants can only give a disappointed sigh (the equivalent of accept() returning EAGAIN) and go back to what they were doing.
    A thundering herd like this inevitably wastes resources. Is there a good solution?
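
    To make the "disappointed sigh" concrete, here is a minimal sketch in C of what a woken worker does on the accept side. It assumes listen_fd is non-blocking; drain_accept and close_conn are hypothetical names used for illustration, not code from the original program.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical placeholder handler: a real worker would register conn_fd
 * with its own epoll instance instead of closing it. */
void close_conn(int conn_fd)
{
    close(conn_fd);
}

/* Accept every connection currently pending on a non-blocking listen_fd.
 * Returns the number of connections accepted, or -1 on an unexpected error.
 * A return value of 0 means this worker was woken for nothing: another
 * worker already took the connection and accept() reported EAGAIN. */
int drain_accept(int listen_fd, void (*on_conn)(int conn_fd))
{
    assert(on_conn != NULL);
    int accepted = 0;
    for (;;) {
        int conn_fd = accept(listen_fd, NULL, NULL);
        if (conn_fd >= 0) {
            accepted++;
            on_conn(conn_fd);
            continue;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return accepted;          /* nothing left: the "sigh" case */
        if (errno == EINTR || errno == ECONNABORTED)
            continue;                 /* transient, try again */
        return -1;                    /* real error, let the caller decide */
    }
}
```

    A worker would call drain_accept(listen_fd, handler) whenever epoll_wait() reports listen_fd as readable, and treat a result of 0 as a harmless wasted wake-up rather than an error.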
 
【Looking for solutions】
    After reading many posts and pages online, studying the source code of various excellent open source programs, and running my own experiments, I summarize the findings as follows:
  1. In practice, when the thundering herd occurs, not all child processes are woken, only some of them. But of the processes that wake up, still only one succeeds in accept(); the others fail.
  2. Every multi-process server program built on Linux epoll is affected by the thundering herd problem, including lighttpd and nginx, and each program handles it differently.
  3. lighttpd's solution: ignore the thundering herd. It uses the Watcher/Workers model. The specific measures are to arrange the positions of fork() and epoll_create() (so that each child process calls epoll_create() and epoll_wait() itself) and to catch and ignore the errors returned by accept(). With this approach, more than one lighttpd child process is still woken when a new connection arrives.
  4. nginx's solution: avoid the thundering herd. The specific measures are a global mutex, which each child process tries to acquire before calling epoll_wait(): a process that gets the lock goes on to handle new connections, and one that does not keeps waiting. There is also a load-balancing rule (once a child process's task count reaches 7/8 of the configured maximum, it stops trying for the lock) to even out the load across processes.
  5. An excellent domestic commercial MTA server program (name withheld) uses the Leader/Followers threading model: all threads have equal status and take turns acting as the Leader that responds to requests.
  6. Comparing the lighttpd and nginx approaches: the former is easy to implement and its logic is simple, but how much the unnecessarily woken processes cost in wasted resources is debatable (some users who tested it consider this overhead small: http://www.iteye.com/topic/382107). The latter has more complex logic, and the mutex and the load-balancing counters add their own overhead. So both approaches solve the problem at the price of some extra computation; which one costs more, I have no data to compare.
  7. It is also rumored that Linux kernels from 2.6.x onward have already solved the accept() thundering herd problem; the paper is at http://static.usenix.org/event/usenix2000/freenix/full_papers/molloy/molloy.pdf .
  8. In fact, though, the improvement described in that paper does not solve the thundering herd problem in real production environments, because most multi-process servers call fork() and then wait for events on listen_fd through epoll_wait(); when a new connection arrives on listen_fd, those processes are still woken. The paper's improvement is mainly at the kernel level: it makes accept() an atomic operation so that multiple processes are not woken for the same call.
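
    The lock-based approach in item 4 can be sketched roughly as follows. This is not nginx's actual source: the names accept_lock_t, accept_lock_try and accept_disabled are hypothetical, and an atomic flag in a shared anonymous mapping stands in for the global mutex. The 7/8 rule is expressed as "skip the lock when fewer than 1/8 of this worker's connection slots are free".

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* One flag shared by the parent and every forked worker. */
typedef struct {
    atomic_flag held;
} accept_lock_t;

/* Call before fork() so that all workers inherit the same shared page.
 * Returns NULL if the mapping cannot be created. */
accept_lock_t *accept_lock_create(void)
{
    accept_lock_t *lock = mmap(NULL, sizeof *lock, PROT_READ | PROT_WRITE,
                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (lock == MAP_FAILED)
        return NULL;
    atomic_flag_clear(&lock->held);
    return lock;
}

/* Non-blocking attempt: true means this worker may add listen_fd to its
 * epoll set for the next epoll_wait(); false means another worker holds it. */
bool accept_lock_try(accept_lock_t *lock)
{
    assert(lock != NULL);
    return !atomic_flag_test_and_set(&lock->held);
}

/* Release after the accepted connections have been handled. */
void accept_lock_release(accept_lock_t *lock)
{
    assert(lock != NULL);
    atomic_flag_clear(&lock->held);
}

/* Load-balancing rule: a positive result means this worker is more than
 * 7/8 full and should not compete for the lock in this round. */
int accept_disabled(int max_conns, int free_conns)
{
    return max_conns / 8 - free_conns;
}
```

    In each loop iteration a worker would first check accept_disabled(); only if the result is not positive would it call accept_lock_try(), and only the lock holder would watch listen_fd during that round, so a new connection wakes at most one process.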
 
【The approach I adopted】
    After weighing the options, I finally chose to follow lighttpd's Watcher/Workers model and implemented the multi-process epoll program I needed. The core flow is as follows:
  1. The main process first sets up the listening port: listen_fd = socket(...), setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, ...), setnonblocking(listen_fd), listen(listen_fd, ...).
  2. Then fork() until the maximum number of child processes is reached (I suggest configuring it according to the server's actual number of CPU cores). The main process becomes the Watcher, responsible only for child process maintenance and overall signal handling.
  3. Each child process (Worker) creates its own epoll instance, epoll_fd = epoll_create(...), adds listen_fd to epoll_fd, and then enters the main loop, where epoll_wait() waits for and handles events. Take care: the epoll_create() step must come after fork().
  4. A bold idea (not implemented): have each Worker process use multiple threads to speed up its loop over a large number of socket fds, adding a mutex for synchronization where necessary. I worry this would do more harm than good (processes plus threads bring frequent switching and extra operating system overhead). This step has not been implemented or tested, but the nginx source code appears to use similar logic.
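
    Steps 1 to 3 can be sketched in C roughly as below. This is a minimal illustration, not the original program: make_listen_fd, make_worker_epoll and spawn_workers are hypothetical names, bind() is added because a fixed port needs it even though the list above does not mention it, and epoll_create1(0) is used in place of epoll_create(...).

```c
#include <assert.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Step 1 (in the main process): create a non-blocking listening socket
 * with SO_REUSEADDR set. Returns listen_fd, or -1 on failure. */
int make_listen_fd(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int on = 1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Step 3 (in each Worker, after fork()): create a private epoll instance
 * and register the inherited listen_fd. Returns epoll_fd, or -1. */
int make_worker_epoll(int listen_fd)
{
    int epoll_fd = epoll_create1(0);
    if (epoll_fd < 0)
        return -1;

    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, listen_fd, &ev) < 0) {
        close(epoll_fd);
        return -1;
    }
    return epoll_fd;
}

/* Step 2: fork n_workers children. Each child builds its own epoll set and
 * runs worker_loop, then exits; the parent (Watcher) gets back the number
 * of children actually started. */
int spawn_workers(int listen_fd, int n_workers,
                  void (*worker_loop)(int epoll_fd, int listen_fd))
{
    assert(n_workers >= 0);
    int started = 0;
    for (int i = 0; i < n_workers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                       /* fork failed: stop spawning */
        if (pid == 0) {                  /* Worker */
            int epoll_fd = make_worker_epoll(listen_fd);
            if (epoll_fd < 0)
                _exit(1);
            worker_loop(epoll_fd, listen_fd);
            _exit(0);
        }
        started++;                       /* Watcher */
    }
    return started;
}
```

    The Watcher would call make_listen_fd() once, then spawn_workers() with a loop function that calls epoll_wait() and handles events, and afterwards only wait on its children and handle signals. Because make_worker_epoll() runs inside the child, the epoll instance is never shared across processes, which is the point of doing epoll_create() after fork().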
 
【Summary】
    Looking across Linux server development today (game servers, web servers, all kinds of application servers), epoll is extremely popular. It really is a good thing: the event-handling capacity of a single process is already much stronger than with poll/select, so it is no wonder that powerful programs like nginx and lighttpd are so fond of it.
    But a single process is still only one process, and leaving the other CPUs of a multi-CPU server idle is a waste. In pursuit of higher machine utilization and shorter request response times, I went to the trouble of building a multi-process epoll program after all. The new program performs really well on the production servers, and I am happy with it.

Origin www.cnblogs.com/redman274/p/12200735.html