Yin Jifeng: Building High-Performance Web Applications with OpenResty

On August 31, 2019, UPYUN and the OpenResty community jointly organized the Chengdu stop of the OpenResty × Open Talk national salon tour, where Yin Jifeng, Infrastructure Engineer at Beike Zhaofang (Ke.com), gave a talk titled "Building High-Performance Web Applications with OpenResty".

The OpenResty × Open Talk national salon tour is sponsored by UPYUN and the OpenResty community. It invites experienced OpenResty technical experts from the industry to share hands-on OpenResty experience, promotes exchange and learning among OpenResty users, and advances the development of the OpenResty open-source project.

Yin Jifeng, formerly an Infrastructure Engineer at Beike Zhaofang, is a multi-language enthusiast who leans toward asynchronous and functional programming and is fond of building prototypes. At Beike he used OpenResty to build services such as WebBeacon and image processing.

The following is the full text of the talk:

Today I will introduce a relatively niche usage scenario for OpenResty: using OpenResty as the framework for writing Web services. I hope it brings you something new.

Why Build High-Performance Services

First, a simple definition of a high-performance Web service: a service handling on the order of tens of thousands of QPS. I believe a good service is never optimized into existence; the architecture determines a service's performance baseline, and premature optimization is the root of all evil.

We all know that a Web service can usually just be scaled horizontally, so why pursue high performance? Because some services are not suited to horizontal scaling, such as stateful services. I took stock of the services I have used over the past few years and found that a great many stateful services are databases:

  • Centralized cache: Redis
  • High-performance queue: Kafka
  • Traditional relational databases: MySQL, Postgres
  • Newer non-relational databases: MongoDB, Elasticsearch
  • Embedded databases deployed together with the service and its configuration: SQLite3, H2, BoltDB
  • Recent time-series databases: InfluxDB, Prometheus, etc.

All of the services above are stateful, and scaling them is not that simple: it is usually done by manual sharding, operated by hand.

Meanwhile, basic and platform services are depended on by services across the whole company. A company may have hundreds or thousands of machines, but basic services should account for only a small share of them, so there is a common demand for high performance in foundational services: gateways, monitoring systems such as logging and tracing, and company-level APIs such as session/token verification all carry certain performance requirements.

Furthermore, horizontal scaling has limits: as machines are added, the capacity and QPS they provide do not grow linearly.

The benefits of high performance, as I see them, are the following:

  • Low cost and simple operations: capacity scaling becomes insensitive to traffic, since one machine can carry what would otherwise need several;
  • Easier traffic scheduling: fewer machines make traffic distribution easier, and rapid deployment means faster rollback; when facing completely incompatible changes, or scenarios that demand extremely smooth traffic migration, blue-green deployment can be used;
  • Simplified design: in-application caching becomes more effective, and simple A/B tests can be done at the machine level;
  • And, of course, "a programmer's self-cultivation."

How to Build High-Performance Services

The vast majority of Web applications are IO-intensive, not CPU-intensive. You may not have an intuitive feel for CPU performance, so here is an example: π seconds is approximately one nanocentury. Humans observe the world at the scale of seconds, while the CPU works at the scale of nanoseconds. Scaled to human terms, 3.14 seconds is to the CPU what a century is to a person, and one second is roughly "33 years" to the CPU (one second contains 10^9 nanoseconds, about 32 years' worth of seconds). A 1-2 millisecond response feels fast to us, but to the CPU it is "about 20 days."

And that is just a single core; there is multi-core on top of it. Facing such formidable CPU performance, how do we use it fully? Since around 2000 there has been a programming model for this: the asynchronous model, also called the event-driven model. Asynchronous, event-driven programming turns slow blocking IO operations into faster CPU operations, driving the entire application with a completely non-blocking event notification mechanism.

I first encountered asynchronous programming in 2012, working on Sohu's mobile side with Python's Tornado, and have since used many languages and frameworks for asynchronous programming. To me the synchronous model means "thread pool + big locks for data synchronization between threads + frequent context switching," which implies that with a synchronous model, system tuning has to be done bit by bit against the current load and the current status of each component; the tuning is actually very difficult.

The asynchronous model corresponds to high concurrency: it is an event loop, cycling continuously. If two requests arrive at the same time and one of them performs a CPU-hogging operation that does not yield, it can delay the other. It trades potential latency for higher concurrency. High performance essentially equals "async + cache": async addresses IO-intensive work, cache addresses CPU-intensive work.

The mainstream asynchronous languages and frameworks on the market today include:

  • C and C++: generally not used to write Web applications;
  • PHP (Swoole): the ecosystem is comparatively weak;
  • Java: Spring Cloud and Gateway have caught fire in recent years, but async Java is still at a fairly primitive stage, programmed with then/onError-style callback chaining;
  • Node.js: no discussion of async can leave out JS and Node.js. The JS ecosystem is asynchronous through and through, with no blocking synchronous calls; it evolved from callbacks at the very beginning, to Promise libraries, to async/await in Node.js, making asynchronous code as easy to write as synchronous code. async/await is currently the industry's most recognized and respected way to write async code;
  • Python: its async experience went from generators with yield, to coroutines built from yield + send, to stream processing with yield from, and now in 3.x asyncio has entered the standard library;
  • Rust: a newer language that, owing to its particular memory model, chose a pull model for asynchronous execution. async/await is slated to stabilize in 1.39.0 later this year as an MVP (minimum viable product) of async/await, and tokio, the de facto official runtime, should stabilize within 3 to 6 months, so in the first half of 2020 we may well see a wave of Web programs written in Rust;
  • Golang: goroutine switching is done by the runtime; unlike the others, the code looks synchronous but is actually scheduled asynchronously by the runtime behind the scenes.

So why learn OpenResty? First, let me set the context: the practice I am sharing today was done in 2015-2016, so it may not sound so new. But back in 2015, apart from JS, none of these async ecosystems were very mature, so at that time OpenResty was genuinely a good choice for asynchronous programming.

What is OpenResty

Nginx

Nginx is a superb reverse proxy server with a thoroughly asynchronous open-source ecosystem, and it has become standard equipment at first-tier Internet companies, so the technology stack is already in place. Introducing OpenResty on that basis is very low risk: it arrives as just another module, and the existing investment in learning Nginx is reused.

In addition, Nginx's architecture is a natural fit:

  • Event-driven
  • Master-worker model
  • URL router: the config itself is a very efficient router, so there is no need to pick a routing library
  • Processing phases: requests flow through distinct phases, something you only truly appreciate after using OpenResty

Lua

Lua is a small and flexible programming language with a natural affinity for C and support for coroutines. It also has a highly efficient implementation, LuaJIT, which provides excellent performance and a high-performance FFI interface; in some benchmarks, calling C functions through the FFI is faster than hand-written Lua/C API bindings.

OpenResty

OpenResty is Nginx + Lua(JIT), a perfect combination of the two: it uses Nginx's asynchronous ecosystem to drive Lua. Lua's own ecosystem is not asynchronous, and this is why Yichun Zhang (agentzh) built OPM for package management: the luarocks ecosystem is synchronous, so many OpenResty packages start with the lua-resty prefix to indicate that they are asynchronous and safe to use inside OpenResty.

OpenResty application practice: WebBeacon

WebBeacon Overview

WebBeacon is a tracking (data-point collection) service: it records HTTP requests so that follow-up analysis can compute session counts, visit durations, bounce rates, and the like.

dig.lianjia.com  is the front end of the WebBeacon service. It is responsible only for collecting the data, not for computing or processing it. It receives HTTP requests and returns a 1×1 GIF image; after receiving a request, it pushes the request's information to Kafka in an internal format called UAL (Universal Access Logging), the company-wide unified access log format, where real-time or offline statistics can then be computed. That is the service in outline.

When I took it over there was already a first version, implemented in PHP, whose performance was not great. It was FastCGI + PHP logging to file: on receiving a request, PHP wrote directly to a log file, and rsyslog's imfile (file input) module then read the log file and pushed it to Kafka through the omkafka output module.

Why was the PHP version slow? PHP's request model releases all resources at the end of every request so as to prevent memory leaks, sparing you from worrying about resource cleanup. Each request opened the log file at the start, wrote the log, and closed the file automatically at the end; anything not initialized inside an extension repeats this cycle on every request. In this project, opening and closing a file every time just to write one log line is what made the performance poor.

The challenges

  • High importance: for a long time, all the data output of Lianjia.com's big data department derived from this one service; without it, none of the downstream computations, statistics, and reports for the relying parties would exist, so the service's importance is quite high.
  • High throughput: the PC site, the mobile (M) site, Android, iOS, even the mini-program and internal services all report to this service, so the throughput requirements are very high.
  • Real business logic: the service takes information from different places, performs transformations, and finally persists it, so it definitely needs flexible programmability.
  • Poor resource isolation: Lianjia.com's services were deployed in a co-located (mixed) fashion, relying on the operating system for isolation, so multiple services running on the same machine contend for CPU, memory, disk, and so on.

Principles we adhered to

  • Maximize performance: the higher the better;
  • Avoid (hard) dependence on the disk: because of the mixed deployment, we hoped to keep the disk off the critical path from start to finish;
  • Respond to the user immediately: the core of a WebBeacon service is persisting data, so the user can be answered directly without waiting, and the data can land in the background without blocking the user's request. We wanted a mechanism that returns as soon as the request arrives and does the heavy lifting afterwards;
  • Keep the rebuild cost as low as possible: the project already had a first version, and although this was a refactor, it was effectively a rewrite. We wanted the process to be as short as possible, inheriting all the existing features while adding new ones.

Main logic

The figure shows OpenResty's request-processing phases; anyone who has written OpenResty programs knows how important this diagram is, as it is the core of how OpenResty handles a request. Usually content is generated in the content phase by proxying to an upstream (with a balancer), but when writing a Web application directly you can simply write content_by_lua, since you produce the output yourself; the access and header-filter phases collect some data, and log_by_lua* persists it, which satisfies our requirement of responding to the user immediately. content_by_lua itself is actually quite simple: it mindlessly spits out fixed content:
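A minimal sketch of that handler (the 43 bytes below are the standard 1×1 transparent GIF; the location path is illustrative):

```nginx
location /d.gif {
    content_by_lua_block {
        -- declare the length up front so the response is not chunked
        ngx.header["Content-Type"] = "image/gif"
        ngx.header["Content-Length"] = 43
        -- the standard 43-byte 1x1 transparent GIF; \z (a Lua 5.2
        -- escape, supported by LuaJIT) strips the newline and the
        -- leading whitespace that follow it
        ngx.print("GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\z
                   \xff\xff\xff\x21\xf9\x04\x01\x00\x00\x00\x00\x2c\z
                   \x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02\x44\z
                   \x01\x00\x3b")
    }
}
```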

As shown above, we declare Content-Length: 43; if it is not declared, the response defaults to chunked transfer encoding, which is pointless in our scenario. \z is Lua 5.2's way of writing a multi-line string: it cuts off the newline and all whitespace that follow it, then splices the remainder back on.

access_by_lua

We do a few things in access_by_lua:

The first is cookie handling: parsing the cookies to be recorded and issuing a number of cookies. We use the cloudflare/lua-resty-cookie package to parse them.
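A sketch of that usage via the library's documented get/set interface; the cookie name and TTL are illustrative:

```lua
local ck = require "resty.cookie"

local cookie, err = ck:new()
if not cookie then
    ngx.log(ngx.ERR, "cookie init failed: ", err)
    return
end

-- read the tracking cookie if the client already has one
local ssid, err = cookie:get("ssid")

if not ssid then
    ssid = gen_uuid()   -- as sketched in the next code block
    local ok, err = cookie:set{
        key = "ssid", value = ssid, path = "/",
        expires = ngx.cookie_time(ngx.time() + 1800),
    }
    if not ok then
        ngx.log(ngx.ERR, "failed to set ssid: ", err)
    end
end
```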

Then comes UUID generation, for request IDs, device IDs, various random IDs, and the like. We call OpenSSL's RAND_bytes through the FFI (C.RAND_bytes) to generate 16 random bytes (128 bits), convert them to hex with C.ngx_hex_dump, and then slice the hex string into UUID form. We generate a great many UUIDs internally, so the faster this is, the better.
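A minimal sketch of that FFI path, assuming the OpenSSL and nginx symbols are visible in the running binary (normally the case in an OpenResty build):

```lua
local ffi = require "ffi"
local C = ffi.C

ffi.cdef[[
typedef unsigned char u_char;
int RAND_bytes(unsigned char *buf, int num);
u_char *ngx_hex_dump(u_char *dst, u_char *src, size_t len);
]]

local byte_buf = ffi.new("u_char[16]")
local hex_buf  = ffi.new("u_char[32]")

local function gen_uuid()
    C.RAND_bytes(byte_buf, 16)             -- 16 random bytes = 128 bits
    C.ngx_hex_dump(hex_buf, byte_buf, 16)  -- render as 32 hex characters
    local hex = ffi.string(hex_buf, 32)
    -- slice into the canonical 8-4-4-4-12 layout
    return hex:sub(1, 8) .. "-" .. hex:sub(9, 12) .. "-"
        .. hex:sub(13, 16) .. "-" .. hex:sub(17, 20) .. "-" .. hex:sub(21)
end
```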

Besides the UUID there is the ssid, the session ID, which identifies a session for session counting. In WebBeacon, a session is continuous access within 30 minutes: as long as the user keeps visiting within a 30-minute window, it counts as one session; once the cookie passes 30 minutes and expires, a new cookie, and thus a new session, is generated. Also, the statistics do not cross calendar days: if a user arrives at 11:40 pm, we issue a cookie good for only 20 minutes, guaranteeing that a session never spans two days.
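The expiry rule can be written as a small helper, a sketch assuming the server's local time zone is the business time zone:

```lua
-- session cookie TTL: 30 minutes, but never crossing local midnight
local function ssid_ttl()
    local t = os.date("*t")   -- local wall-clock breakdown
    local elapsed = t.hour * 3600 + t.min * 60 + t.sec
    return math.min(30 * 60, 86400 - elapsed)
end

-- e.g. at 23:40 this returns 1200 seconds, i.e. a 20-minute cookie
```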

Beyond all that, we have the single most important piece of business logic.

Sending a request each time is costly for a mobile client, so batching is done: the mobile side collects its logs and uploads them as one package. The accumulated tracking logs are reported when the user presses the home button to exit or the app drops into the background, sent as a POST that is encoded and gzip-compressed so the traffic overhead stays low. That means we must parse the body and split one batched report into N separate tracking log entries to persist. To this end we do the following (see the sketch after this list):

  • Define a maximum body size: since we refuse to touch the disk, we cap it at 64 KB. Considering the gzip compression, that may correspond to 400-500 KB of raw data, which should be plenty for a batched report;
  • The client encodes, so the service naturally decodes: gunzip via zlib, then URL-decode, then json_decode, finally splitting into N requests; the result is a list of N entries, and each list item is re-encoded into a single tracking log and persisted;
  • Preallocate tables with table.new(#list, 0) -> table.new(0, 30) and reuse them with table.clear, then run json_encode, ngx.escape_uri, and the other operations to form each final log line;
  • If anything goes wrong anywhere in this process, we degrade: the raw body is escaped with log_escape and dropped into the log as-is, so the data can still be recovered afterwards.
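A sketch of that decode-and-split pipeline, assuming lua-zlib as the zlib binding and cjson for JSON; all names are illustrative, and pcall is the degradation hook:

```lua
local zlib  = require "zlib"          -- lua-zlib binding, an assumption
local cjson = require "cjson.safe"

local function split_batch()
    ngx.req.read_body()
    local body = ngx.req.get_body_data()
    if not body then return nil end

    local ok, list = pcall(function()
        -- windowBits 15+32 lets zlib auto-detect gzip/zlib headers
        local raw = zlib.inflate(15 + 32)(body)
        return cjson.decode(ngx.unescape_uri(raw))
    end)
    if not ok or type(list) ~= "table" then
        return nil   -- caller falls back to logging the escaped raw body
    end
    return list      -- N entries, each re-encoded into one log line
end
```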

header(body)_filter_by_lua

This is where the logic that collects and processes the reported fields lives.

Why not do this work in access_by_lua along with the rest? Because during stress testing we found that under high concurrency, one of the operations would coredump in the access_by_lua phase. Perhaps we were using it incorrectly, or perhaps it simply must not be used in access_by_lua; we never got to the bottom of it.

So we moved the part of the business that is less tied to the current request into header_filter_by_lua: we parse X-Forwarded-For to record the IP, and parse lianjia_token to extract the ucid. Lianjia's ucid is a long string of digits, and Lua numbers are double-precision floats that can represent integers exactly only up to 2^53, so the value cannot be stored as a plain number. We therefore create a 64-bit integer through the FFI, apply the needed transformations, and produce the printed value by slicing the string, a ucid being up to 20 digits long.
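A sketch of that 64-bit handling, assuming the ucid arrives as a decimal string; the function name is illustrative:

```lua
local ffi = require "ffi"

-- parse a long decimal string into a uint64_t so no precision is lost
-- (a plain Lua number is exact only up to 2^53), then render it back
-- without the "ULL" suffix LuaJIT prints on 64-bit cdata
local function parse_ucid(str)
    local n = ffi.new("uint64_t", 0)
    for i = 1, #str do
        n = n * 10 + (str:byte(i) - 48)   -- 48 is ASCII '0'
    end
    return (tostring(n):gsub("ULL$", ""))
end
```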

log_by_lua options

Here is the core of the whole pipeline: persisting the logs in log_by_lua. The options were:

  • access_log is Nginx's native, built-in approach, but it does not fit our scenario: our log volume is at the TB level, and under mixed deployment the disk is contended; most critically, read and write are blocking system calls, which would make performance plummet;
  • doujiang24/lua-resty-kafka is another way; we did not in fact research it deeply, because Kafka's protocol and feature set are large and complex, and the library's current features and future development might not keep pace with our needs;
  • We finally settled on the cloudflare/lua-resty-logger-socket library, which ships logs off the machine without blocking.
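A sketch of how that library is wired up, using its documented init/log interface; the socket path and buffer sizes are illustrative:

```lua
local logger = require "resty.logger.socket"

if not logger.initted() then
    local ok, err = logger.init{
        path        = "/var/run/ual.sock", -- unix socket to local rsyslog
        flush_limit = 4096,     -- buffer this many bytes before flushing
        drop_limit  = 1048576,  -- beyond this, drop logs rather than block
    }
    if not ok then
        ngx.log(ngx.ERR, "logger init failed: ", err)
    end
end

local bytes, err = logger.log(msg)  -- buffered, non-blocking send
```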

rsyslog

We chose rsyslog as the log-shipping tool; in fact we did not do much technology selection, but rather customized and optimized directly on top of the original choice:

  • The maximum length of a single log line is 8 KB; anything longer is truncated. That limit is part of the rsyslog spec, but it can be customized;
  • rsyslog passes messages internally through queues and leans on them heavily. We chose the Disk-Assisted Memory queue, which degrades to disk when memory fills up, ensuring reliability;
  • The parser does not use the newer RFC but the older RFC3164, a very simple protocol: local msg = "<142>" .. timestamp .. " " .. topic .. ": " .. msg is enough to produce a standard RFC3164 log line (see the framing sketch after this list);
  • We need to ship multiple log entries at a time, which requires delimiting between entries; for this we enabled SupportOctetCountedFraming, which prefixes each message with its byte length, a small addition on top of the base RFC that completes the persistence of the whole batch;
  • We use imptcp on a unix socket file handle rather than TCP or UDP, because unix sockets cost less. Note that cloudflare/lua-resty-logger-socket does not support dgram unix sockets, and cloudflare has since stopped maintaining the library;
  • Output goes through the omkafka output module, with two special options enabled: dynaTopic, which lets the input choose which topic to write to, and snappy compression, which reduces network traffic;
  • Two additional modules are enabled. One is omfile: it takes the requests received by imptcp and also writes them locally while they are pushed to Kafka; the reason is that a downstream consumer once failed to drain Kafka in time and data was lost, so important data needs a backup, even at the price of a disk dependency. The other is impstats, for monitoring: it reports queue capacity, current consumption, and a series of other metrics.
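A sketch of the message construction on the OpenResty side, assuming priority <142> (facility local1 × 8 + severity info) and octet-counted framing; the topic name is illustrative:

```lua
-- RFC3164 shape: "<PRI>TIMESTAMP TAG: MSG"; 142 = 17 * 8 + 6
local function frame_log(topic, body)
    local ts  = os.date("%b %d %H:%M:%S")  -- RFC3164-style timestamp
    local msg = "<142>" .. ts .. " " .. topic .. ": " .. body .. "\n"
    -- octet-counted framing: prefix each message with its byte length
    -- so rsyslog (SupportOctetCountedFraming) can split the batch
    return #msg .. " " .. msg
end

-- several framed entries can be concatenated and handed to
-- logger.log() in a single call
```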

Deployment Scenarios

As mentioned earlier, deployment was mixed, and going online had an access standard: a tar package plus run.sh. We precompile OpenResty with all dependencies statically compiled in; at release time the OpenResty binaries are pulled locally, combined with the Lua scripts, and rsync'd to a fixed location, with supervisord or systemctl managing the daemon online.

The test environment is self-maintained. The project splits in two: one part is OpenResty, the other is rsyslog. All the OpenResty code naturally lands in the code repository, but it is easy to overlook that the rsyslog configuration is just as important; in fact everything related to the project should land in the repository. We manage everything else in the test environment with Ansible.

Performance data

The final performance data are as follows:

In early 2018 I ran the numbers: peak traffic was about 26,000 QPS, and a single machine withstood 30,000 QPS in stress tests, so in fact one OpenResty machine could take the entire tracking traffic. Log traffic after compression was about 30 MB, with roughly 1 billion log entries persisted per day. To ensure reliability, three EC2 c3.2xlarge instances were deployed online.

Summary

  • Overall, architecture comes first: we used three battle-tested components, OpenResty + rsyslog + Kafka, to build a high-performance service;
  • Keep the boundaries and hold the bottom line: for example, once we decided not to touch the disk, we avoided the disk by every means, and that is the ultimate guarantee of performance;
  • Sometimes performance looks high while the code is still full of pitfalls; flame graphs help you avoid many of them. The important thing is to consciously keep a grip on performance, starting from the small things: avoid NYI (not yet implemented) primitives that LuaJIT cannot compile; preallocate tables of known size with table.new/table.clear instead of repeatedly growing them dynamically; call ngx.now freely, since it reads a cached time that is accurate enough for most purposes; generate UUIDs by producing the 16 bytes in one shot instead of going through libuuid; cut down on CPU-heavy shared dict operations; and so on. Put together, this is how a high-performance Web service is built (a few of these idioms are sketched below).
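A few of those idioms in code form, a sketch of the habits meant here; sizes and names are illustrative:

```lua
local new_tab   = require "table.new"    -- LuaJIT extensions
local clear_tab = require "table.clear"

-- preallocate the array part once instead of growing it repeatedly
local buf = new_tab(64, 0)

local function render(items)
    clear_tab(buf)                -- reuse the table across calls
    for i = 1, #items do          -- numeric for: JIT-friendly, no NYI
        buf[i] = items[i]
    end
    -- table.concat avoids building N intermediate strings with ..
    return table.concat(buf, "\t")
end

-- ngx.now() reads nginx's cached time, so it is cheap to call often
local started = ngx.now()
```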

That is all I have to share today. Thank you!

Slides and video of the talk:

Building high-performance Web applications using OpenResty
