High-Performance Web Services with NGINX

I/O Introduction

Types of I/O:
Network I/O: essentially reading from and writing to a socket file
Disk I/O: reading from and writing to files on disk
Every I/O operation goes through two phases:
Phase one: the data is loaded into kernel memory space (the buffer) until it is ready; this phase takes longer
Phase two: the data is copied from the kernel buffer into the user process's memory space; this phase is short

I/O Models

Synchronous/asynchronous: concerns the message-communication mechanism
Synchronous: the caller waits for the callee to return a message before it can continue
Asynchronous: the callee notifies the caller of the operation's status through state, notifications, or callbacks
Blocking/non-blocking: concerns the caller's state while waiting for the result to be returned
Blocking: the call does not return until the I/O operation has fully completed; the caller is suspended in the meantime
Non-blocking: the call returns a status value to the user immediately, without waiting for the I/O operation to fully complete; the caller is not suspended
I/O models:
blocking, non-blocking, multiplexing, signal-driven, asynchronous

Blocking I/O Model

Blocking I/O is the simplest I/O model: the user thread is blocked inside the kernel for the duration of the I/O operation.
The user thread initiates a read through a system call, switching from user space to kernel space. The kernel waits until a packet arrives, then copies the received data into user space, and the read completes.
The user thread must wait for the read to fill its buffer before it can process the received data. For the entire I/O request the thread is blocked: after initiating the request it can do nothing else, so CPU resources go unused.
Advantages: the program is simple, and while blocked waiting for data the suspended process/thread consumes essentially no CPU.
Drawbacks: each connection needs its own process or thread to handle it; under heavy concurrency the overhead of maintaining those processes, their memory, and thread switching is large, so this model is rarely used in real production systems.

For a single-threaded network service this blocking is a real problem: while waiting, the whole thread is suspended and can do no other work. The block does not affect other programs (processes) running at the same time, since modern operating systems are multitasking and switch between tasks preemptively; "block" here refers only to the current process.
For a network service to respond to many simultaneous requests, it must therefore be multithreaded, with each thread handling one request, so the number of threads grows linearly with the number of concurrent connections. Network servers before about 2000 were mostly built this way. This has two problems: more threads mean more context switches, and a context switch is a relatively heavy operation that needlessly wastes a lot of CPU; in addition, each thread occupies memory for its stack.
A thread pool can bound the number of threads so that huge numbers of them are never created, but it then also caps the maximum number of concurrent connections.
What we really want when calling read to handle a network request is to process data when it is present and do other work when it is not. The only reason for using large numbers of threads is that read blocks.
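A minimal sketch of the blocking model in C (the port number and buffer size are illustrative, and error handling is omitted): the thread simply parks inside accept() and read() until the kernel has data ready and has copied it into the user buffer.

    #include <unistd.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void) {
        int srv = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);              /* illustrative port */
        bind(srv, (struct sockaddr *)&addr, sizeof(addr));
        listen(srv, 128);

        int conn = accept(srv, NULL, NULL);       /* blocks until a client connects */
        char buf[4096];
        ssize_t n = read(conn, buf, sizeof(buf)); /* blocks through both phases:
                                                     waiting for data AND the
                                                     kernel-to-user copy */
        if (n > 0)
            write(conn, buf, n);                  /* echo the data back */
        close(conn);
        close(srv);
        return 0;
    }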

Non-blocking I/O Model

When the user thread initiates an I/O request, the call returns immediately even if no data has been read; the thread must keep re-issuing the request until the data arrives and can actually be read, and only then continue. This is a "polling" mechanism.
It has two problems. First, if there are many file descriptors to wait on, each one has to be read in turn, which causes a lot of context switching (read is a system call, and every call switches between user mode and kernel mode). Second, the polling interval is hard to choose: you are guessing how long it will take for the data to arrive. Set the wait too long and the program's response latency grows; set it too short and the retries become so frequent that they burn CPU for nothing.
Because it mostly wastes CPU, this model is rarely used directly; instead, the non-blocking property is exploited inside other I/O models.
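A sketch of that polling pattern, assuming fd is an already-connected socket; with O_NONBLOCK set, read() returns -1 with errno EAGAIN instead of suspending the caller.

    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Make an existing descriptor non-blocking, then poll it by hand. */
    void poll_by_hand(int fd) {
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

        char buf[4096];
        for (;;) {
            ssize_t n = read(fd, buf, sizeof(buf));
            if (n >= 0)
                break;                  /* data arrived (or EOF): handle buf */
            if (errno != EAGAIN && errno != EWOULDBLOCK)
                break;                  /* real error */
            usleep(50 * 1000);          /* guessed retry interval: too long adds
                                           latency, too short burns CPU */
        }
    }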

Signal-Driven I/O Model

Asynchronous I/O Model

The main difference between signal-driven I/O and asynchronous I/O is when the kernel notifies the application: with signal-driven I/O, the kernel tells the application when an I/O operation can be performed; with asynchronous I/O, the kernel tells the user thread when the operation has completed. When signal-driven I/O triggers the signal handler, the handler still has to block in the second stage while data is copied from the kernel buffer to the user-space buffer, whereas asynchronous I/O completes that second stage as well: the kernel notifies the user thread only when it can proceed directly with subsequent processing.
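On Linux the signal-driven side is classically set up with O_ASYNC and SIGIO; a sketch (error handling omitted) showing that the notification only says "you may now read", with the copy still to come:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <signal.h>
    #include <unistd.h>

    static volatile sig_atomic_t ready = 0;

    static void on_sigio(int sig) {
        (void)sig;
        ready = 1;          /* kernel says: an I/O operation is now possible */
    }

    /* Ask the kernel to send SIGIO when fd becomes readable. */
    void enable_sigio(int fd) {
        signal(SIGIO, on_sigio);
        fcntl(fd, F_SETOWN, getpid());                        /* deliver the signal to us */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_ASYNC);  /* enable signal-driven I/O */
        /* When ready becomes 1, a subsequent read(fd, ...) still blocks while the
         * kernel copies data to user space: stage two is not asynchronous here. */
    }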
As defined by the POSIX specification: the application tells the kernel to start an operation and lets the kernel carry out the entire operation (including copying the data from the kernel buffer to the application), notifying the application only after it has completed.
Advantages: asynchronous I/O can take advantage of DMA to overlap I/O operations with computation.
Drawbacks: true asynchronous I/O requires the operating system to do a lot of work. Windows currently achieves true asynchronous I/O through IOCP; on Linux, AIO was introduced in kernel 2.6 and is still imperfect, so high-concurrency network programming on Linux is mostly built on the I/O multiplexing model combined with multithreading, which meets the basic architectural needs.
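A sketch of the POSIX AIO interface from <aio.h> (the file path is illustrative; on Linux this links with -lrt, and glibc currently emulates it with user-space threads, one symptom of the immaturity noted above):

    #include <aio.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        int fd = open("/etc/hostname", O_RDONLY);  /* illustrative input file */
        char buf[256];

        struct aiocb cb;
        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = sizeof(buf);
        cb.aio_offset = 0;

        aio_read(&cb);                  /* returns at once; both stages, including
                                           the copy into buf, run in the background */
        /* ... the thread is free to do other work here ... */

        const struct aiocb *list[1] = { &cb };
        aio_suspend(list, 1, NULL);     /* block only when we finally need the result */
        printf("read %zd bytes\n", aio_return(&cb));
        return 0;
    }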

Concrete Implementations of the I/O Models

Of these five I/O models, the later ones block less, so in theory the later the model, the more efficient it is. The first four are synchronous I/O, because in each of them the actual I/O operation (recvfrom) blocks the process or thread; only the asynchronous I/O model matches the POSIX definition of asynchronous I/O.
The main implementations are:
Select: corresponds to the I/O multiplexing model; first implemented in BSD 4.2; part of the POSIX standard and available on virtually every operating system
Poll: Linux implementation, corresponding to the I/O multiplexing model; first implemented in System V Unix
Epoll: Linux-specific, corresponding to the I/O multiplexing model, with some characteristics of the signal-driven I/O model
Kqueue: FreeBSD implementation, corresponding to the I/O multiplexing model, with some characteristics of the signal-driven I/O model
/dev/poll: Sun's Solaris implementation, corresponding to the I/O multiplexing model, with some characteristics of the signal-driven I/O model
IOCP: Windows implementation, corresponding to the fifth model (asynchronous I/O)

select/poll/epoll

select: specified by POSIX and currently supported on almost every platform; good cross-platform support is one of its strengths. It essentially works by setting and checking a data structure of fd flag bits to decide what to do next.
Drawbacks:
There is a hard limit on the number of file descriptors a single process can monitor, 1024 by default on Linux; it can be raised by changing the FD_SETSIZE macro and recompiling, but that also reduces efficiency
Sockets are scanned linearly, i.e. by polling, which is inefficient
select uses memory copying to pass FD readiness information to user space: a data structure holding a large number of fds is copied between user space and kernel space on every call, which is expensive
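A minimal select() loop illustrating those drawbacks (error handling omitted): the fd_set is rebuilt and copied into the kernel on every iteration, every fd must stay below FD_SETSIZE, and finding the ready fds is a linear scan.

    #include <sys/select.h>
    #include <unistd.h>

    /* Watch nfds already-open descriptors in fds[] with select(). */
    void select_loop(int *fds, int nfds) {
        for (;;) {
            fd_set rset;
            FD_ZERO(&rset);                 /* the whole bitmap is rebuilt and */
            int maxfd = -1;                 /* copied into the kernel each time */
            for (int i = 0; i < nfds; i++) {
                FD_SET(fds[i], &rset);      /* fds must stay below FD_SETSIZE (1024) */
                if (fds[i] > maxfd) maxfd = fds[i];
            }
            if (select(maxfd + 1, &rset, NULL, NULL, NULL) <= 0)
                continue;
            for (int i = 0; i < nfds; i++)  /* linear scan to find the ready fds */
                if (FD_ISSET(fds[i], &rset)) {
                    char buf[4096];
                    read(fds[i], buf, sizeof(buf));
                }
        }
    }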

poll
Essentially no different from select: it copies the user-supplied array into kernel space and then queries the device status of each fd
It has no hard limit on the number of connections, because the fds are stored in a list rather than a fixed-size bitmap
The entire array of fds is still copied wholesale between user space and the kernel address space, whether or not the copy is useful
poll is "level-triggered": if an fd is reported and then not handled, it will be reported again on the next poll
Edge-triggered notification, by contrast, fires only once
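The equivalent sketch with poll(): the struct pollfd array removes the FD_SETSIZE cap, but it is still copied to the kernel and scanned linearly on each call, and level triggering means unhandled fds are simply reported again.

    #include <poll.h>
    #include <unistd.h>

    /* Same idea as select, but the fds are passed as an array of
     * struct pollfd, so there is no FD_SETSIZE cap. */
    void poll_loop(struct pollfd *pfds, nfds_t n) {
        for (;;) {
            if (poll(pfds, n, -1) <= 0)     /* level-triggered: unread data is
                                               reported again next time */
                continue;
            for (nfds_t i = 0; i < n; i++)  /* still a linear scan */
                if (pfds[i].revents & POLLIN) {
                    char buf[4096];
                    read(pfds[i].fd, buf, sizeof(buf));
                }
        }
    }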

epoll: an enhanced version of select and poll introduced in the Linux 2.6 kernel
It supports both level triggering (LT) and edge triggering (ET); its most distinctive feature is edge triggering, in which it tells the process only which fds have just become ready, and notifies only once
It uses an "event"-based readiness notification: fds are registered with epoll_ctl, and once an fd becomes ready the kernel activates it through a callback-like mechanism, so epoll_wait receives the notification
Advantages:
No hard limit on concurrent connections: the ceiling on open FDs is far above 1024 (roughly 100,000 connections can be watched with 1 GB of memory); see /proc/sys/fs/file-max, whose value scales with system memory
Higher efficiency: it does not poll, so efficiency does not fall as the number of FDs grows; the callback is invoked only for active FDs. In other words, epoll's greatest advantage is that it only concerns itself with "active" connections, independent of the total connection count
Memory copying: it uses mmap (memory mapping) to speed up message passing with kernel space, i.e. epoll uses mmap to reduce copy overhead
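A minimal epoll sketch (level-triggered; adding EPOLLET to the event mask would switch to edge triggering): each fd is registered once, and epoll_wait returns only the fds that are actually ready.

    #include <sys/epoll.h>
    #include <unistd.h>

    /* Register already-connected sockets once; afterwards epoll_wait returns
     * only the fds that are ready, regardless of how many are being watched. */
    void epoll_loop(int *fds, int nfds) {
        int ep = epoll_create1(0);
        for (int i = 0; i < nfds; i++) {
            struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[i] };
            epoll_ctl(ep, EPOLL_CTL_ADD, fds[i], &ev);  /* add EPOLLET for edge trigger */
        }
        struct epoll_event ready[64];
        for (;;) {
            int n = epoll_wait(ep, ready, 64, -1);      /* no linear rescan of all fds */
            for (int i = 0; i < n; i++) {
                char buf[4096];
                read(ready[i].data.fd, buf, sizeof(buf));
            }
        }
    }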

Zero-Copy

The problem with traditional I/O in Linux
The traditional Linux standard I/O interfaces (read, write) are based on data copying: every byte passes through copy_to_user or copy_from_user. The benefit is that the intermediate cache reduces actual disk I/O; the downside is just as clear: copying large volumes of data, together with frequent switches between user mode and kernel mode, consumes a great deal of CPU and seriously hurts transfer performance. Measurements show that in the Linux network stack, copying packets between kernel space and user space can account for as much as 57.1% of a packet's total processing time.
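The classic file-serving loop makes that cost concrete (a sketch, error handling omitted): each chunk travels disk to kernel page cache, then to a user buffer (copy_to_user), then back into the kernel socket buffer (copy_from_user), with a user/kernel mode switch on every call.

    #include <unistd.h>

    /* Traditional file-to-socket loop: two copies and two mode
     * switches per chunk, even though the user code never looks
     * at the bytes it is shuttling. */
    void send_file_copying(int file_fd, int sock_fd) {
        char buf[8192];
        ssize_t n;
        while ((n = read(file_fd, buf, sizeof(buf))) > 0)
            write(sock_fd, buf, n);
    }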
What is zero-copy
Zero-copy is a solution to this problem: it relieves pressure on the CPU by avoiding copy operations wherever possible. Zero-copy does not literally achieve "zero" copies; it is better understood as a design principle, and many zero-copy techniques are optimizations built on that idea.
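One widely used zero-copy primitive on Linux is sendfile(2), which moves data from the page cache to the socket entirely inside the kernel; nginx's sendfile directive (mentioned below) is built on it. A sketch, error handling omitted:

    #include <sys/sendfile.h>
    #include <sys/stat.h>

    /* Zero-copy variant of the loop above: no user-space buffer,
     * no copy_to_user/copy_from_user round trip. */
    void send_file_zero_copy(int file_fd, int sock_fd) {
        struct stat st;
        fstat(file_fd, &st);
        off_t offset = 0;
        while (offset < st.st_size)
            if (sendfile(sock_fd, file_fd, &offset, st.st_size - offset) <= 0)
                break;
    }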

Introduction to nginx

Features:
Modular design with good extensibility
High reliability
Hot deployment: update the configuration file, upgrade the binary, or switch log files without stopping the server
Low memory consumption: 10,000 inactive connections in keep-alive mode take only about 2.5 MB of memory
event-driven, aio, mmap, sendfile

Basic functions:
Web server for static resources
Reverse proxy for the HTTP protocol
Reverse proxy for the POP3/IMAP4 protocols
FastCGI (LNMP), uWSGI (Python), and similar protocols
Modular (originally non-DSO), e.g. the gzip and SSL modules

nginx Architecture

Web-service-related features:
Virtual hosts (server blocks)
keep-alive and pipelined connections (multiple concurrent HTTP requests over a shared TCP connection)
Access logs (with log buffering to improve performance)
URL rewriting
Path aliases
IP- and user-based access control
Rate limiting and concurrent-connection limiting
Reconfiguration and online upgrades without interrupting the worker processes serving clients
A GET interface for Memcached

nginx program architecture:
master/worker structure
One master process:
responsible for loading and parsing the configuration file, managing the worker processes, and smooth upgrades
One or more worker processes:
process and respond to user requests
Cache-related processes:
cache loader: loads cache objects
cache manager: manages cache objects

nginx is highly modular, but its modules did not originally support the DSO mechanism; dynamic loading and unloading of modules has been supported since version 1.9.11
Module categories:
Core module: core module
Standard modules:
•HTTP modules: ngx_http_*
HTTP core modules: default functionality
HTTP optional modules: must be enabled at compile time
•Mail modules: ngx_mail_*
•Stream modules: ngx_stream_*
Third-party modules

nginx Modules

Core modules: indispensable for the normal operation of the Nginx server, providing core functionality such as error logging, configuration file parsing, the event-driven mechanism, and process management
Standard HTTP modules: provide functionality related to HTTP protocol handling, such as port configuration, page encoding settings, and HTTP response headers
Optional HTTP modules: mainly extend the standard HTTP functionality so Nginx can handle special services, such as Flash multimedia streaming, GeoIP lookups, transfer compression, and SSL support
Mail service modules: support Nginx's mail services, including the POP3, IMAP, and SMTP protocols
Third-party modules: extend the Nginx server with developer-defined functionality, such as JSON and Lua support

Uses of nginx

Web server for static resources
html, images, js, css, txt, and other static assets
Reverse-proxying requests for dynamic resources via the FastCGI/uWSGI/SCGI protocols
Reverse proxy for the http/https protocols
Reverse proxy for the imap4/pop3 protocols
Request forwarding (reverse proxying) for the tcp/udp protocols

 

Origin: www.cnblogs.com/quguwei/p/11323296.html