(Repost) Analysis of the operating principles of Apache and Nginx

Original: https://www.server110.com/nginx/201402/6543.html

web server

A Web server, also called a WWW (World Wide Web) server, mainly provides online information browsing services. It builds on three things:

The application layer uses the HTTP protocol.

Documents are written in HTML.

The browser locates resources with Uniform Resource Locators (URLs).

Web servers often provide services in the form of B/S (Browser/Server). The browser and server interact as follows:


 GET /index.php HTTP/1.1

 +---------------+                   +----------------+
 |               +------------------->                |
 |   Browser     |                   |   Server       |
 |               <-------------------+                |
 +---------------+                   +----------------+

                   HTTP/1.1 200 OK


The browser makes an HTTP request (Request) to the server.

The server receives the request data from the browser, analyzes and processes it, and outputs the response data (Response) to the browser.

The browser receives the response data from the server, analyzes and processes it, and displays the final result in the browser.
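
The exchange above can be sketched in a few lines of Python. This is only a minimal illustration, not how Apache or Nginx are implemented: the function name handle_request and the plain-text body are assumptions made for the example.

```python
def handle_request(raw: bytes) -> bytes:
    """Parse an HTTP/1.1 request line and return a minimal response."""
    # The request line is the first line: METHOD SP TARGET SP VERSION
    request_line = raw.split(b"\r\n", 1)[0].decode("latin-1")
    parts = request_line.split(" ")
    if len(parts) != 3:
        return b"HTTP/1.1 400 Bad Request\r\nContent-Length: 0\r\n\r\n"
    method, target, _version = parts
    body = f"You asked for {target} using {method}".encode("latin-1")
    headers = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("latin-1")
    return headers + body


if __name__ == "__main__":
    request = b"GET /index.php HTTP/1.1\r\nHost: example.com\r\n\r\n"
    print(handle_request(request).decode("latin-1"))
```

A real server does the same three things at much larger scale: read the request, decide what to do with it, and write a status line, headers, and body back.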

Both Apache and Nginx are web servers, and both implement the HTTP/1.1 protocol.

Apache overview

Apache HTTP Server is an open source web server from the Apache Software Foundation. Because it is cross-platform and secure, it runs on most computer operating systems and is one of the most popular web servers. It is fast, reliable, and extensible through a simple API, and interpreters such as Perl or Python can be compiled into the server. -- Wikipedia

Apache components

Apache has a modular design. Its core is small; most functionality is spread across modules, and each module is loaded on demand when the system starts.


         +----------+
      +- | Module   | -----------------+
      |  +----------+                  |
      |                          +------------+
+-----------+   Apache HTTPD     | php module |
| Module    |                    +------------+
+-----------+              +----------+|
      +----------+-------- |  MPM     |+
                 |         +----+---+-+
               +-v-----------+  |   |
               |    APR      <--+   |
               +------+------+      |
                      |             |
      +---------------v-------------v--+
      |      Operating  System         |
      +--------------------------------+


MPM (Multi-Processing Module) is one of the core components of Apache. Through the MPM, Apache uses operating system resources to manage its pools of processes and threads. To get the best performance, Apache is optimized for different platforms (Unix/Linux, Windows) and provides different MPMs for them, so users can choose according to their situation. The two most commonly used MPMs are prefork and worker. Which one your server runs depends on the MPM build parameter specified when Apache was installed. On Unix-like systems the default build parameter is prefork.

Because many Unix systems historically lacked good thread support, the prefork method was adopted. On platforms with solid thread support, such as Solaris, the worker mode, a hybrid of multiple processes and multiple threads, is a good choice. Windows uses its own thread-based MPM, mpm_winnt. Another important component of Apache is APR (Apache Portable Runtime), an abstraction library over operating system calls. Apache's internal components use the operating system through APR, which improves the portability of the system. Apache parses PHP through the PHP module, one of its many modules.

Apache Lifecycle

+-- Startup Phase ---------------------------------------------------------+
|                    +-------------------------------+                     |
|                    | System startup, configuration |                     |
|                    +---------------+---------------+                     |
|                                    |                                     |
|                    +---------------v---------------+                     |
|                    |     Module initialization     |                     |
|                    +---------------+---------------+                     |
|                                    |                                     |
|           +------------------------+------------------------+            |
|           |                        |                        |            |
| +---------v----------+   +---------v----------+   +---------v----------+ |
| |   Child process    |   |   Child process    |   |   Child process    | |
| |   initialization   |   |   initialization   |   |   initialization   | |
| +---------+----------+   +---------+----------+   +---------+----------+ |
+-----------+------------------------+-- Operational Phase ---+------------+
|           |                        |                        |            |
| +---------v----------+   +---------v----------+   +---------v----------+ |
| |    Request loop    |   |    Request loop    |   |    Request loop    | |
| +---------+----------+   +---------+----------+   +---------+----------+ |
|           |                        |                        |            |
| +---------v----------+   +---------v----------+   +---------v----------+ |
| | Child process ends |   | Child process ends |   | Child process ends | |
| +--------------------+   +--------------------+   +--------------------+ |
+--------------------------------------------------------------------------+


This life cycle represents how prefork works. As the figure shows, each request is handled by a separate child process, and every child process runs its own request loop.

Apache working modes

How prefork works

A single control process (the parent) is responsible for spawning child processes, which listen for requests and respond to them. Apache always tries to keep a few spare, idle child processes ready for incoming requests, so clients do not have to wait for a child process to be spawned before being served. On Unix systems, the parent process usually runs as root so that it can bind port 80, while the child processes it spawns usually run as a low-privileged user. The User and Group directives configure that low-privileged user. The user running the child processes must be able to read the content being served, but should have as few permissions as possible on any other resources.
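
The idea of keeping a pool of spare children can be sketched as a small decision function. This is a simplified illustration and not Apache's actual algorithm (the real server also paces forking over time). The parameter names are loosely modeled on the MinSpareServers, MaxSpareServers and MaxRequestWorkers directives, and the function name spare_adjustment is made up for the example.

```python
def spare_adjustment(idle: int, total: int, min_spare: int,
                     max_spare: int, max_workers: int) -> int:
    """Return how many children to fork (> 0) or retire (< 0)."""
    if idle < min_spare:
        # Too few idle children: fork more, but never exceed the hard limit.
        wanted = min_spare - idle
        return max(0, min(wanted, max_workers - total))
    if idle > max_spare:
        # Too many idle children: retire the surplus.
        return -(idle - max_spare)
    return 0  # the idle pool is within bounds


if __name__ == "__main__":
    print(spare_adjustment(idle=2, total=10, min_spare=5,
                           max_spare=10, max_workers=256))
```

The parent would call something like this periodically, which is why a burst of slow requests makes the number of processes grow toward the configured limit.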

How worker works

The number of threads each process can have is fixed. The server will increase or decrease the number of processes depending on the load. A separate controlling process (the parent process) is responsible for the establishment of the child process. Each child process can establish ThreadsPerChild number of service threads and a listener thread, which listens for access requests and passes them to the service thread for processing and response. Apache always tries to maintain a spare or idle pool of service threads. In this way, clients can be processed without waiting for new threads or new processes to be established. In Unix, in order to be able to bind port 80, the parent process is generally started as root, and then Apache creates child processes and threads as a lower-privileged user. User and Group directives are used to configure the permissions of Apache child processes. Although the child process must have read access to the content it provides, it should be given as few privileges as possible. Also, unless suexec is used, the permissions configured by these directives will be inherited by CGI scripts.
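
The listener-plus-service-threads arrangement inside one child process can be imitated with a queue and a few threads. This is a toy model under stated assumptions: connections are plain strings rather than sockets, and run_child, threads_per_child and the "handled" responses are names invented for the illustration.

```python
import queue
import threading


def run_child(connections, threads_per_child=4):
    """Simulate one child process: a listener feeds a fixed pool of service threads."""
    pending = queue.Queue()
    responses = []
    lock = threading.Lock()

    def service_thread():
        while True:
            conn = pending.get()
            if conn is None:  # sentinel: no more work for this thread
                break
            with lock:
                responses.append(f"handled {conn}")

    def listener_thread():
        # Stand-in for accept(): pass each connection to the service threads.
        for conn in connections:
            pending.put(conn)
        for _ in range(threads_per_child):
            pending.put(None)  # one sentinel per service thread

    threads = [threading.Thread(target=service_thread)
               for _ in range(threads_per_child)]
    threads.append(threading.Thread(target=listener_thread))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(responses)


if __name__ == "__main__":
    print(run_child(["conn-1", "conn-2", "conn-3"], threads_per_child=2))
```

Because the thread count per child is fixed, scaling up under load means starting more child processes, which matches the description above.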

How Apache runs

Startup phase

In the startup phase, Apache mainly parses configuration files (httpd.conf and any files pulled in by Include directives), loads modules (such as mod_php5.so and mod_perl.so), and initializes system resources (log files, shared memory segments, and so on). At this stage, to obtain the fullest access to system resources, Apache starts as the privileged user root (Unix-like systems) or as an administrator (Windows).

The following diagram illustrates this process:


              +-------+
              | Start |
              +---+---+
                  |
 +----------------v---------------+  Parse the main configuration file httpd.conf;
 |  Parse the configuration file  |  directives such as LoadModule and AddType
 +----------------+---------------+  are loaded into memory.
                  |
 +----------------v---------------+  Load Apache modules according to AddModule,
 |  Load static/dynamic modules   |  LoadModule, etc.; a module such as mod_php5.so
 +----------------+---------------+  is loaded and mapped into Apache's address space.
                  |
 +----------------v---------------+  Initialize log files, shared memory segments,
 | System resource initialization |  database connections, etc.
 +----------------+---------------+
                  |
              +---v---+
              |  End  |
              +-------+
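
As a rough sketch of the "parse the configuration file" step, the snippet below collects LoadModule directives from configuration text. It is an illustration only: the function name parse_load_modules and the sample lines are assumptions, and the real parser also handles containers, Include directives and much more.

```python
def parse_load_modules(config_text: str) -> dict:
    """Collect LoadModule directives as {module_name: shared_object_path}."""
    modules = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        # Directive names are matched case-insensitively.
        if parts[0].lower() == "loadmodule" and len(parts) == 3:
            modules[parts[1]] = parts[2]
    return modules


if __name__ == "__main__":
    sample = (
        "# hypothetical excerpt of a main configuration file\n"
        "LoadModule php5_module modules/mod_php5.so\n"
        "LoadModule perl_module modules/mod_perl.so\n"
        "AddType application/x-httpd-php .php\n"
    )
    print(parse_load_modules(sample))
```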


Running Phase

During the running phase, Apache's main job is to process user service requests. At this stage, Apache gives up the privileged user level and uses ordinary permissions, which is mainly based on security considerations to prevent security holes caused by code flaws.

Thanks to its hook mechanism, Apache allows modules (both internal and external ones, such as mod_php5.so and mod_perl.so) to inject custom functions into the request processing loop. mod_php5.so (php5apache2.dll on Windows) uses the hook mechanism to inject its own functions into Apache, and they handle PHP requests at the various stages of Apache's processing.

Apache divides the request processing cycle into 11 stages, which are: Post-Read-Request, URI Translation, Header Parsing, Access Control, Authentication, Authorization, MIME Type Checking, FixUp, Response, Logging, CleanUp.

The life cycle of Apache processing http requests:

Post-Read-Request stage: In the normal request processing flow, this is the first stage where a module can insert hooks. Modules that want to take part in request processing very early can use this stage.

URI Translation stage: Apache's main work at this stage: mapping the requested URL to the local file system. Modules can insert hooks at this stage to perform their own mapping logic. mod_alias uses this stage to work.

Header Parsing stage: Apache's main work in this stage: check the header of the request. Since modules can perform the task of checking request headers at any point in the request processing flow, this hook is rarely used. mod_setenvif uses this stage to work.

Access Control stage: Apache's main work at this stage: check, according to the configuration file, whether access to the requested resource is allowed. Apache's standard logic implements the allow and deny directives. mod_authz_host works at this stage.

Authentication stage: Apache's main work at this stage: authenticate the user according to the policy set in the configuration file, and set the user name field. Modules can insert hooks at this stage to implement an authentication method.

Authorization stage: The main work of Apache in this stage: check whether the authenticated user is allowed to perform the requested operation according to the configuration file. Modules can insert hooks at this stage to implement a method for user rights management.

MIME Type Checking stage: Apache's main work at this stage: determine the content handler to use according to the rules for the MIME type of the requested resource. The standard modules mod_negotiation and mod_mime implement this hook.

FixUp stage: This is a general-purpose stage that lets a module run any necessary processing before the content generator. Like Post-Read-Request, this hook can capture any information, and it is the most commonly used hook.

Response stage: Apache's main work in this stage: generating the content returned to the client, and is responsible for sending an appropriate reply to the client. This stage is the core part of the overall processing flow.

Logging phase: Apache's main job in this phase: logging the transaction after the reply has been sent to the client. Modules may modify or replace Apache's standard logging.

CleanUp stage: Apache's main work in this stage: clean up the environment left after the completion of the request transaction, such as file, directory processing or Socket closing, etc. This is the last stage of Apache's request processing.
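
The hook mechanism and the eleven stages can be modeled as an ordered list of phases, each holding the functions that modules have registered. This is a deliberately simplified sketch: in the real server hooks return status codes and can stop or skip processing, while here every hook simply runs. The names HookRegistry, register and process, and the dictionary used as a request, are assumptions for the illustration.

```python
PHASES = [
    "post_read_request", "uri_translation", "header_parsing",
    "access_control", "authentication", "authorization",
    "mime_type_checking", "fixup", "response", "logging", "cleanup",
]


class HookRegistry:
    def __init__(self):
        self.hooks = {phase: [] for phase in PHASES}

    def register(self, phase, func):
        """Called by a 'module' to inject a function into one phase."""
        self.hooks[phase].append(func)

    def process(self, request):
        # Run every registered hook, phase by phase, in the fixed order.
        for phase in PHASES:
            for func in self.hooks[phase]:
                func(request)
        return request


registry = HookRegistry()
# A toy alias-like module maps the URI to a file path.
registry.register("uri_translation",
                  lambda r: r.update(filename="/var/www" + r["uri"]))
# A toy content generator produces the response body.
registry.register("response",
                  lambda r: r.update(body="contents of " + r["filename"]))
result = registry.process({"uri": "/index.php"})
```

The registration order between phases does not matter, because the phase list alone fixes the order in which hooks run.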

Nginx overview

Nginx (pronounced "engine x") is a lightweight web server, reverse proxy server and email (IMAP/POP3) proxy server developed by Russian programmer Igor Sysoev. It was originally used by the large Russian portal and search engine Rambler (Russian: Рамблер). -- Wikipedia

Nginx modules and working principle

Nginx consists of a core and modules. The core is deliberately small and simple: it only maps a client request to a location block by looking it up in the configuration file (location is a directive in the Nginx configuration, used for URL matching). Each directive configured in that location then invokes a different module to do the corresponding work.
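
Mapping a request to a location can be pictured as a longest-prefix lookup. The sketch below covers only plain prefix locations; real Nginx also supports exact-match and regular-expression locations with their own precedence rules. The function name match_location and the LOCATIONS table are hypothetical.

```python
from typing import Optional


def match_location(uri: str, locations: dict) -> Optional[str]:
    """Return the module configured for the longest matching prefix, if any."""
    best = None
    for prefix in locations:
        if uri.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return locations[best] if best is not None else None


LOCATIONS = {  # hypothetical configuration: prefix -> module that does the work
    "/": "static_file_handler",
    "/images/": "static_file_handler",
    "/app/": "fastcgi_proxy",
}

if __name__ == "__main__":
    print(match_location("/app/index.php", LOCATIONS))
```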

The modules of Nginx are structurally divided into core modules, basic modules and third-party modules:

Core modules: HTTP module, EVENT module and MAIL module.

Basic modules: HTTP Access module, HTTP FastCGI module, HTTP Proxy module and HTTP Rewrite module.

Third-party modules: HTTP Upstream Request Hash module, Notice module and HTTP Access Key module.

Nginx modules are functionally divided into the following three categories:

Handlers (handler modules). These modules process requests directly and perform operations such as producing content and modifying header information. Generally only one handler module processes a given request.

Filters (filter modules). These modules mainly modify the content produced by handler modules before Nginx finally sends it out.

Proxies (proxy modules). These are modules such as Nginx's HTTP Upstream. They mainly interact with back-end services such as FastCGI to implement functions like proxying and load balancing.


                     +                    ^
        Http Request |                    |  Http Response
                     |                    |
    +---------+------v-----+         +----+----+
    |  Conf   | Nginx Core |         | FilterN |
    +---------+------+-----+         +----^----+
                     |                    |
                     |               +----+----+
                     |               | Filter2 |
choose a handler     |               +----^----+
based conf           |                    |
                     |               +----+----+
                     |               | Filter1 |
                     |               +----^----+
                     |                    | Generate content
               +-----v--------------------+----+
               |           Handler             |
               +-------------------------------+


Nginx itself does very little work. When it receives an HTTP request, it simply maps the request to a location block by looking it up in the configuration file, and each directive configured in that location invokes a different module to do the work, so the modules can be regarded as the real workers of Nginx. The directives in a location usually involve one handler module and several filter modules (and, of course, several locations can reuse the same module). The handler module processes the request and generates the response content, while the filter modules post-process that content.
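
The handler-then-filters flow in the diagram can be written as one function that produces the content and a chain of functions that each transform it. This is a conceptual sketch only; the function names and the dictionary-based response are assumptions, and real filter modules work on header and body buffers rather than whole strings.

```python
def static_handler(request):
    """Handler: generates the response content."""
    return {"headers": {"Content-Type": "text/html"},
            "body": f"<p>Hello from {request['uri']}</p>"}


def uppercase_filter(response):
    """Filter 1: rewrites the body produced by the handler."""
    response["body"] = response["body"].upper()
    return response


def length_filter(response):
    """Filter 2: adds a header based on the final body."""
    response["headers"]["Content-Length"] = str(len(response["body"]))
    return response


def serve(request, handler, filters):
    response = handler(request)
    for apply_filter in filters:  # Filter1 -> Filter2 -> ... -> FilterN
        response = apply_filter(response)
    return response


if __name__ == "__main__":
    print(serve({"uri": "/a"}, static_handler, [uppercase_filter, length_filter]))
```

Filter order matters: the length filter runs last so that it sees the body after every other filter has modified it.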

Nginx Architecture and Workflow

Nginx Architecture

The figure above shows the architecture of Nginx. With its master process and several worker processes, it superficially resembles Apache's worker mode. Unlike Apache's worker, however, each Nginx worker process is normally single-threaded: it handles a large number of connections through an event loop rather than dedicating a thread to each connection.

All the actual business logic lives in the worker processes. Each worker runs a function containing an infinite loop that keeps processing incoming client requests until the whole nginx service is stopped. Inside the worker, this function proceeds as follows:

Mechanisms provided by the operating system (such as epoll, kqueue, etc.) generate related events.

Receive and process these events, and if data is received, a higher-level request object is generated.

Process the header and body of the request.

A response is generated and sent back to the client.

Complete the processing of the request.

Reinitialize timers and other events.
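
One pass of such a loop can be approximated with Python's selectors module, which picks epoll or kqueue where the platform offers them. This is a sketch under assumptions, not Nginx's code: a socket pair stands in for a client connection, the function name serve_once is made up, and timers are left out.

```python
import selectors
import socket


def serve_once(conn_sock: socket.socket) -> int:
    """Wait for readiness events and answer each readable connection once."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS
    sel.register(conn_sock, selectors.EVENT_READ)
    handled = 0
    for key, _mask in sel.select(timeout=1.0):  # the OS reports ready sockets
        conn = key.fileobj
        data = conn.recv(4096)                  # read the request bytes
        first_line = data.split(b"\r\n", 1)[0]  # e.g. b"GET / HTTP/1.1"
        body = b"echo: " + first_line
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: "
                     + str(len(body)).encode() + b"\r\n\r\n" + body)
        handled += 1
    sel.unregister(conn_sock)
    sel.close()
    return handled


# A connected socket pair plays the roles of browser and worker.
client, server = socket.socketpair()
client.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
handled = serve_once(server)
reply = client.recv(4096)
client.close()
server.close()
```

A real worker registers thousands of sockets with the same selector and repeats this pass forever, which is how one thread can serve many connections.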

Nginx and FastCGI

FastCGI

FastCGI is a scalable, high-speed communication interface between HTTP servers and dynamic scripting languages. Most popular HTTP servers support FastCGI, including Apache, Nginx, and lighttpd. At the same time, FastCGI is also supported by many scripting languages, including PHP.

FastCGI is developed and improved from CGI. The main disadvantage of the traditional CGI interface method is poor performance, because every time the HTTP server encounters a dynamic program, the script parser needs to be restarted to perform the parsing, and then return the result to the HTTP server. This is almost unusable when dealing with high concurrent access. In addition, the security of the traditional CGI interface method is also very poor, and it is rarely used now.

The FastCGI interface adopts the C/S structure, which can separate the HTTP server and the script parsing server, and start one or more script parsing daemons on the script parsing server at the same time. Every time the HTTP server encounters a dynamic program, it can be directly delivered to the FastCGI process for execution, and then the result is returned to the browser. In this way, the HTTP server can exclusively handle static requests or return the results of the dynamic script server to the client, which greatly improves the performance of the entire application system.

How Nginx and FastCGI work together

Nginx does not support invoking or parsing external programs directly; all external programs (including PHP) must be invoked through the FastCGI interface. Under Linux, the FastCGI interface is a socket (either a Unix domain file socket or an IP socket).

Next, let us walk through how PHP runs under Nginx. PHP-FPM is a FastCGI process manager for PHP, and it is distributed with PHP.

The FastCGI process manager php-fpm initializes itself, starts the main process php-fpm and starts the start_servers CGI child processes. The main process php-fpm mainly manages the fastcgi sub-process and listens on port 9000. The fastcgi child process waits for a connection from the Web Server.

When a client request reaches the web server Nginx, Nginx uses a location directive to hand every request for a file with the .php suffix over to 127.0.0.1:9000 for processing.

The FastCGI process manager PHP-FPM selects and connects to a subprocess CGI interpreter. The web server sends CGI environment variables and standard input to the FastCGI child process.

The FastCGI subprocess returns standard output and error messages from the same connection to the Web Server after processing. The request is processed when the FastCGI child process closes the connection.

The FastCGI child process then waits for and handles the next connection from the FastCGI process manager (running in the WebServer).
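
To make "the web server sends CGI environment variables" more concrete, the sketch below packs variables into a FastCGI PARAMS record as laid out in the FastCGI specification: an 8-byte header followed by length-prefixed name/value pairs. It is only an illustration; the helper names are invented, the sample variables are assumptions, and records larger than 65535 bytes are not handled.

```python
import struct

FCGI_VERSION_1 = 1
FCGI_PARAMS = 4  # record type that carries CGI environment variables


def encode_length(n: int) -> bytes:
    # Lengths below 128 use one byte; longer ones use four bytes with the high bit set.
    return bytes([n]) if n < 128 else struct.pack(">I", n | 0x80000000)


def encode_pair(name: bytes, value: bytes) -> bytes:
    return encode_length(len(name)) + encode_length(len(value)) + name + value


def params_record(request_id: int, env: dict) -> bytes:
    content = b"".join(encode_pair(k.encode(), v.encode()) for k, v in env.items())
    # Header: version, type, request id, content length, padding length, reserved.
    header = struct.pack(">BBHHBB", FCGI_VERSION_1, FCGI_PARAMS,
                         request_id, len(content), 0, 0)
    return header + content


if __name__ == "__main__":
    record = params_record(1, {"REQUEST_METHOD": "GET",
                               "SCRIPT_FILENAME": "/var/www/index.php"})
    print(record.hex())
```

The response travels back over the same connection in similar records that carry standard output and standard error.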

Apache and Nginx comparison

Feature comparison

Like Apache, Nginx is HTTP server software. It has a modular design and supports common language interfaces such as PHP, Perl, and Python. It also supports forward and reverse proxying, virtual hosts, URL rewriting, compressed transfer, SSL-encrypted transfer, and so on.

In terms of implementation, all Apache modules can be compiled either dynamically or statically, while Nginx modules were traditionally compiled statically (dynamic modules have been supported since Nginx 1.9.11).

For FastCGI, Apache's support is not good, while Nginx's support is very good.

In terms of connection handling, Nginx supports epoll, while Apache's prefork and worker modes do not.

In terms of disk space, the Nginx installation package is only a few hundred KB. Compared with Nginx, Apache is definitely a behemoth.

Advantages of Nginx over Apache

Lightweight: serving the same web workload, it uses less memory and fewer resources than Apache.

Static content: Nginx's static file performance is more than 3 times higher than Apache's.

High concurrency: Nginx processes requests asynchronously and without blocking, while Apache blocks. Under high concurrency, Nginx keeps resource consumption low and performance high. In Apache+PHP (prefork) mode, if PHP processing is slow or front-end pressure is high, the number of Apache processes is likely to soar, resulting in denial of service.

Highly modular design: writing modules is relatively simple.

Active community: high-performance modules of all kinds appear quickly.

Advantages of Apache over Nginx

Rewrite: Apache's rewrite is more powerful than Nginx's.

Modules: there are so many that you can find almost anything you can think of.

Fewer bugs: Nginx has relatively more bugs.

Very stable.

PHP support: with Apache it is relatively simple, while Nginx has to be paired with a separate back end.

The advantages of choosing Nginx

As a web server, Nginx handles static files, index files, and automatic indexing very efficiently.

As a proxy server, Nginx can achieve cache-free reverse proxy acceleration and improve website running speed.

As a load balancing server, Nginx can support Rails and PHP back ends internally and can also act as an HTTP proxy for external services. It supports simple fault tolerance and algorithm-based load balancing.

In terms of performance, Nginx was developed specifically with performance in mind and pays great attention to efficiency in its implementation. It uses the kernel's event polling models (epoll and kqueue), so it can respond to as many as 50,000 concurrent connections while using very little memory.

In terms of stability, Nginx allocates resources in stages, which keeps CPU and memory usage very low. According to the Nginx project, 10,000 inactive connections occupy only 2.5 MB of memory, so attacks such as DoS are largely ineffective against Nginx.

In terms of high availability, Nginx supports hot deployment and starts very quickly, so the software version or configuration can be upgraded without interrupting service. It can run for months without a restart and can operate almost uninterrupted, 24 hours a day, 7 days a week.

Using both Nginx and Apache

Because Nginx and Apache each have their own advantages, many people now run both on the same server, with Nginx in front and Apache behind. Nginx does the load balancing and reverse proxying and serves static files, while dynamic requests (such as PHP applications) are handed to Apache.
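
The division of labor in such a setup comes down to a routing decision in the front-end server. The sketch below expresses that decision as a function; in practice it would be written as Nginx configuration, and the suffix list, the backend labels and the name choose_backend are assumptions made for the example.

```python
from pathlib import PurePosixPath

# Assumed list of file types the front-end serves directly.
STATIC_SUFFIXES = {".html", ".css", ".js", ".png", ".jpg", ".gif", ".ico"}


def choose_backend(path: str) -> str:
    """Decide whether the front-end serves a path itself or proxies it."""
    # Ignore the query string, then look at the file suffix.
    suffix = PurePosixPath(path.split("?", 1)[0]).suffix.lower()
    if suffix in STATIC_SUFFIXES:
        return "nginx-static"
    return "apache-backend"


if __name__ == "__main__":
    for p in ("/css/site.css", "/index.php?id=1", "/"):
        print(p, "->", choose_backend(p))
```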
