A Dynamic Service Routing Scheme Based on OpenResty

On May 11, 2019, UPYUN and the OpenResty community jointly organized the Wuhan stop of the OpenResty × Open Talk national salon tour, where Shao Haiyang, UPYUN's chief evangelist, gave a talk titled "A Dynamic Service Routing Scheme Based on OpenResty".

The OpenResty × Open Talk national salon tour is sponsored by UPYUN and the OpenResty community. It invites experienced OpenResty experts from industry to share hands-on OpenResty experience, promotes exchange and learning among OpenResty users, and advances the OpenResty open source project. Events have already been held in Shenzhen, Beijing, and Wuhan, with upcoming stops in Shanghai, Guangzhou, Hangzhou, and other cities.

Shao Haiyang is UPYUN's chief evangelist, operations director, and senior architect for system operations. He has many years of industry experience in CDN architecture design, operations development, and team management; is proficient in Linux and embedded systems, high-performance Internet architecture design, CDN acceleration, virtualization, and OpenStack/KVM cloud platforms; and currently focuses on container and virtualization technologies in UPYUN's private cloud practice.

The following is the full text of the talk:

Today I will introduce a dynamic service routing solution based on ngx_lua. Service routing is a key component in the containerization process, and routing for containers poses significant challenges. UPYUN implemented its own solution, which has been running stably for about three years. The solution is now open source, so if you have run into the same problems, you can use it directly.

Zero-downtime service updates

When updating a service, how can we keep requests from failing? At UPYUN, a service update is not allowed to fail: if an update causes requests to fail, even a small number of failures hurts our reputation, and if it causes an incident, we lose money. This is an important motivation for dynamic service routing.

Service routing includes the following parts:

  • Service registration: when a service provider starts up, it registers itself with the service discovery layer, declaring explicitly which service it provides, its IP and port, its name, and so on;
  • Service discovery: the centralized management of this information, recording which services exist and where they are located;
  • Load balancing: since many identical containers provide the same service, we need to decide how to balance load across them.

There are many service discovery solutions, but they target different scenarios and languages. ZooKeeper is a relatively old and mature open source project, but it is resource hungry; it was the first one we used, and our Kafka message queues still depend on ZooKeeper today. etcd and Consul are rising stars; Kubernetes depends on etcd for its container orchestration state. UPYUN uses Consul for part of its service registration and discovery. Consul is a one-stop solution that is convenient to deploy, visualize, and maintain; besides KV storage, it natively supports service health checks, multiple data centers, and DNS.

There are also many load balancing solutions. LVS has the advantage that it can be placed in front of the other two when their performance is not enough; since it works at layer 4, lower in the stack, it does not disturb the existing network structure, but it is very hard to extend. HAProxy and Nginx differ: HAProxy consumes less CPU parsing HTTP headers, so for pure forwarding (for example in front of a WAF) HAProxy uses roughly 10% CPU, whereas Nginx doing pure header-based forwarding uses roughly 20%-25%. Nginx is more extensible, however: it can forward and load balance all three of TCP, UDP, and HTTP, while HAProxy supports only TCP and HTTP. HAProxy's biggest recent change is that it has been refactored to use Lua and will integrate closely with Lua going forward, which amounts to a new capability, and it is also embracing the Kubernetes ecosystem. Our choice was Nginx, because it focuses on HTTP, is extensible, and supports TCP as well.

The figure above shows Nginx and Consul together in one diagram. To highlight the service path, parts less relevant to the services are omitted. We built service management on Mesos, Docker, and Marathon. There is one special service, Registrator, which runs as a container on every physical machine and periodically reports container state to Consul through the Docker API. Nginx sits on top doing load balancing, because our services currently go through Nginx straight to the containers.

Propagating service updates from Consul to Nginx

In the flow above, the path from Nginx to the containers, and registering services into Consul, pose no problems. The problem arises between Consul and Nginx: Consul has all the information, but how do we notify Nginx of it? When a new service comes up, or a service dies, how does Consul's knowledge reach Nginx so that dead services are deleted and newly added services are written into the list? This is the problem we want to solve.

The question, then, is how to propagate service updates from Consul to Nginx; once that is solved, the Nginx + Consul + Registrator model is complete. There are several possible solutions:

1. Option one: consul-template

consul-template watches keys in Consul and triggers a script when they change. Using this feature, when a service changes, it regenerates the configuration from a pre-defined template and then executes a final script.

image

The figure shows an example template that generates upstream.conf. The template contains variables to be rendered later; when a key/value changes, the template is rendered into the real configuration file, and then a local command, `nginx -s reload`, is executed to rebuild the configuration and reload it, after which the new service takes effect.
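As a concrete illustration of this option, here is a minimal sketch in the spirit of the figure, using consul-template's documented `service` template function; the service name `my-service` and the file paths are assumptions:

```
# upstream.conf.ctmpl -- re-rendered whenever the service list changes
upstream my_service {
{{ range service "my-service" }}
    server {{ .Address }}:{{ .Port }};
{{ end }}
}
```

The reload command is attached to the template when consul-template is started:

```
consul-template \
  -template "upstream.conf.ctmpl:/etc/nginx/conf.d/upstream.conf:nginx -s reload"
```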

Reload, of course, has some drawbacks:

  • First, frequent reloads cause a performance hit;
  • Second, old worker processes stay in a shutting-down state for a long time; if there are long-lived connections, the old processes keep handling them, and this period is indeterminate, so you never know when the reload has truly completed;
  • Third, in-process caches are invalidated; we cache some database contents and code in the local process, and all of those caches become invalid on reload;
  • Most importantly, this goes against the design intent: reload was designed as an operations convenience that does not affect in-flight requests. Using it to treat Docker like a virtual machine takes you down the wrong road, and on that road you are likely to hit many strange pitfalls, so we did not adopt this solution at the time.

2. Option two: internal DNS

DNS is a fairly common solution: where a server's IP address was previously configured, use a domain name instead, and let DNS resolve it to a set of IPs. This sounds perfect, and Consul itself supports DNS, so we would not need to maintain an extra DNS server; just turn the service ID into a domain name.

But we felt the DNS approach might be no better than reloading, because:

  • First, an extra layer of DNS resolution adds extra processing time;
  • Second, DNS caching, which is the main reason: because of caching, there is no way to immediately remove a problematic machine from rotation. To mitigate this you can set a very short cache TTL, but then there are too many resolutions.
  • Third, port numbers change. On physical machines ports are generally fixed, and in Docker you can keep them fixed too, but for applications that are not very network-sensitive, such as CPU-bound ones, we connect the container network in bridged mode, where ports are randomly assigned and may differ per container, so plain DNS is not feasible.
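To make the port problem concrete: Consul answers DNS queries on port 8600 by default, and a service is resolvable as `<name>.service.consul`. The service name below is hypothetical, and these queries of course need a running Consul agent:

```
# A records carry only addresses; a randomly assigned container port is lost:
dig @127.0.0.1 -p 8600 my-service.service.consul A

# SRV records do include the port, but ordinary resolver libraries and
# Nginx's "resolver" directive issue only A/AAAA queries:
dig @127.0.0.1 -p 8600 my-service.service.consul SRV
```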

What we wanted was to dynamically modify Nginx's upstream service lists through an HTTP interface, and we found a ready-made solution for that, called ngx_http_dyups_module.

3. Option three: ngx_http_dyups_module

ngx_http_dyups_module can query current upstream information through a GET interface, update upstreams with POST, and delete upstreams with DELETE.

The figure shows an example with three requests:

  • The first request goes to the service on port 8080; no upstream service is found behind it, so it returns 502;
  • The second uses curl to add the service address to the upstream list;
  • The third request is exactly the same as the first, but because the second step added the service, it now returns normal output.

In this whole process there is no reload and no configuration change, yet the function is accomplished.
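The three-request experiment in the figure can be reproduced with a setup like the following sketch, based on the dyups module's documented interface; the upstream name and port numbers here are assumptions:

```nginx
http {
    server {                      # the service itself
        listen 8080;
        location / {
            set $ups dyhost;      # upstream resolved by name at runtime
            proxy_pass http://$ups;
        }
    }

    server {                      # dyups management interface
        listen 8081;
        location / {
            dyups_interface;
        }
    }
}
```

```
curl http://127.0.0.1:8080/                # 1) 502: upstream "dyhost" is empty
curl -d "server 127.0.0.1:8088;" \
     http://127.0.0.1:8081/upstream/dyhost # 2) add a backend via POST
curl http://127.0.0.1:8080/                # 3) now proxied normally
curl http://127.0.0.1:8081/detail          # query current upstreams
curl -X DELETE http://127.0.0.1:8081/upstream/dyhost   # delete the upstream
```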

The module is very well written, but after using it for a while we moved away from it. Not because it is bad, but because of our own circumstances we found a few problems:

  • First, it forces reliance on Nginx's own load balancing algorithms. We write a lot of our internals in ngx_lua, and using this module ties us to the C module and the balancing algorithms built into it. We have some unique needs, such as "local first": preferring the service on the local machine. That may sound like a strange balancing strategy, but to implement things like it we would have to modify the C code;
  • Second, secondary development is inefficient; developing in C is far slower than in Lua;
  • Third, it cannot be used in a pure-Lua deployment. We did not want a solution that only one project could use; ideally other projects could use it as well.

Slardar's dynamic load balancing features

For these reasons, we began building our own wheel.

image

This wheel has four parts:

  • The first part is stock Nginx; we wanted to keep its native directives and retry policies;
  • The second part is the ngx_lua module;
  • The third part is lua-resty-checkups, our Lua upstream management module, which implements dynamic upstream management. It covers roughly 30% of the C module's functionality, plus extras such as active health checks, in about 1,500 lines of Lua; an equivalent C module would run to at least 10,000 lines;
  • The fourth part is LuaSocket, which must never be used while Nginx is processing a request, since it would block the worker.

1. lua-resty-checkups

A brief introduction to lua-resty-checkups; it has several features:

  • First, dynamic upstream management, using shared memory to synchronize state between workers;
  • Second, passive health checks, a feature Nginx itself also has: a backend is marked unhealthy when real requests to it fail;
  • Third, active health checks: the module proactively sends heartbeat probes to the backends on a timer, for example every 15 seconds, to check whether a service is alive; custom checks are also possible, such as periodically sending a heartbeat packet to an upstream to detect whether the service is still up;
  • Fourth, load balancing algorithms, including a "local first" policy that saves network traffic by preferring services on the local machine.
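As a sketch of how such a module is wired up, based on lua-resty-checkups' documented usage; the service name, addresses, and intervals here are assumptions and details may differ between versions:

```lua
-- config.lua: one cluster keyed by service name
local _M = {}

_M.global = {
    checkup_timer_interval = 15,   -- active heartbeat every 15 seconds
}

_M.api = {
    timeout = 2,
    typ = "http",                  -- active check type: HTTP probe
    http_opts = {
        query    = "GET /status HTTP/1.0\r\n\r\n",
        statuses = { [200] = true },   -- statuses counted as healthy
    },
    cluster = {
        { servers = {
            { host = "127.0.0.1", port = 12354 },
            { host = "127.0.0.1", port = 12355 },
        }},
    },
}

return _M
```

```nginx
# shared dicts for worker-to-worker state synchronization
lua_shared_dict state 10m;
lua_shared_dict mutex  1m;
lua_shared_dict locks  1m;

init_worker_by_lua_block {
    local config   = require "config"
    local checkups = require "resty.checkups.api"
    checkups.prepare_checker(config)
    checkups.create_checker()      -- starts the heartbeat timers
}
```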

2. Service differentiation

image

Services are differentiated by the Host header. The figure shows an example: two curl requests sent to the same address with different Host headers get different responses.

3. Request flow

Briefly, the request flow divides into three parts: at the top, a request is received and a worker loads the code; the worker executes the code and looks up the corresponding upstream list by Host; then the request is proxied to the chosen server.

4. Dynamic upstream updates

Like the dyups C module, Slardar dynamically updates upstream lists over an HTTP interface. After adding two services you can see them on the management page: the server addresses, health check messages, status change times, and failure counts. The figure shows a record of an active health check.

image

Why active health checks? With passive health checks, a failure is only discovered after a request has gone out and failed; with active checks, heartbeat packets are sent ahead of time, so you know whether the service has a problem before any request arrives.

5. Dynamic Lua loading

Dynamic Lua loading is used frequently when working with games. Backend programs are often unwilling to change for small adjustments such as parameter compatibility conversion, so we run some Lua code in the routing layer in front to do it for them. First of all, you can rewrite requests: since the routing layer can get the entire request, including the body, it can do anything with it.

In addition, we can combine this with access control and do some simple parameter checks. According to our statistics, at least 10% of our requests are repeats; executing those repeated requests is wasted effort. Instead we return 304, meaning the result is the same as before and the previous result can be used directly. If the backend had to make that decision, the whole request would travel back and forth, consuming extra network bandwidth; deciding in the routing layer saves that bandwidth, and the request need not be sent to the backend at all.

image

This is an example of dynamic loading: if we push this code into Slardar, it executes; on a DELETE operation it returns 403, so with this code you can instantly ban that operation. You can imagine what other functions can be done the same way, and the whole process is dynamic; once the code is loaded, you can see its information on the status page.

Implementing dynamic load balancing in Slardar

The features above describe Slardar; next is a brief look at the implementation, in three parts: dynamic upstream management, load balancing, and dynamic Lua code loading.

1. Dynamic upstream management

At startup, Slardar loads its configuration from Consul via LuaSocket. If the process dies unexpectedly and restarts, how does it know which upstreams it had just now? The state has to be persisted somewhere, and we chose Consul for that, so configuration must be loaded from Consul at startup. After starting, Slardar listens on a management port to receive upstream update instructions, and also starts a timer for synchronization between workers: it periodically checks whether shared memory has been updated and, if so, synchronizes the updates into its own worker.

This is a simple flow chart: at the very beginning, configuration is loaded from Consul; after the fork, the worker processes are initialized with what was just loaded; in the other branch, the timer is started, and any update enters through it.
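A minimal sketch of the timer-based worker synchronization described above; this is illustrative only, not Slardar's actual code, and the shared-dict name and JSON encoding are assumptions:

```lua
-- init_worker_by_lua_block: each worker polls a version counter in
-- shared memory and refreshes its Lua-land upstream copy on change.
local cjson = require "cjson.safe"
local shd = ngx.shared.upstream_cfg    -- hypothetical lua_shared_dict
local my_version = 0
local upstreams = {}                   -- this worker's private copy

local function sync(premature)
    if premature then return end       -- nginx is shutting down
    local v = shd:get("version") or 0
    if v ~= my_version then
        local raw = shd:get("upstreams")
        local decoded = raw and cjson.decode(raw)
        if decoded then
            upstreams = decoded
            my_version = v
        end
    end
end

ngx.timer.every(1, sync)               -- check shared memory once a second
```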

2. Load balancing

For load balancing we mainly use balancer_by_lua_*. A request comes in and is handed by the C module to the upstream stage; in the configuration file, where the server address would normally be written, the balancer_by_lua_* directive hooks in a Lua file instead, and in that file Lua code chooses the peer; the selection itself is checkups' own process.

The figure gives the rough flow. Looking at the lower part: at the start, our module's checkups.select_peer picks a peer according to the current host, then jumps out to it, which achieves peer selection in Lua. The upper part is for learning whether the request succeeded or failed; failures are fed back into the peer's state.
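The flow in the figure can be sketched as follows. The `checkups.select_peer` call follows the talk; the rest of the names, the return shape of the peer, and the surrounding configuration are assumptions for illustration:

```nginx
upstream dynamic_backend {
    server 0.0.0.1;   # placeholder; never actually used
    balancer_by_lua_block {
        local balancer = require "ngx.balancer"
        local checkups = require "resty.checkups.api"

        -- pick a peer for the current Host in Lua
        local peer, err = checkups.select_peer(ngx.var.host)
        if not peer then
            ngx.log(ngx.ERR, "no peer available: ", err)
            return ngx.exit(502)
        end

        local ok, serr = balancer.set_current_peer(peer.host, peer.port)
        if not ok then
            ngx.log(ngx.ERR, "failed to set peer: ", serr)
        end
    }
}

server {
    location / {
        proxy_pass http://dynamic_backend;   # native proxy_* still applies
        proxy_next_upstream error timeout;   # native retry policy
    }
}
```

Because Lua intervenes only at peer selection, the data path after the peer is chosen still goes through Nginx's own proxy machinery.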

3. Dynamic Lua loading

This mainly uses three Lua functions: loadfile, loadstring, and setfenv. loadfile loads local Lua code; loadstring loads code from Consul or from an HTTP request body; setfenv sets the code's execution environment. With these three functions, code can be loaded dynamically; the specific details are omitted here.
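A minimal sketch of the three functions working together, using Lua 5.1 / LuaJIT semantics as in ngx_lua; the pushed code and the environment contents below are hypothetical:

```lua
-- the chunk source would really arrive from Consul or a request body
local chunk_src = [[ return greet .. " from dynamic code" ]]

-- compile without executing; a name tag helps error messages
local chunk, err = loadstring(chunk_src, "=dynamic")
if not chunk then
    error("bad code: " .. err)
end

-- sandbox: the chunk sees only what we put into its environment
local env = { greet = "hello" }
setfenv(chunk, env)

-- execute in the sandbox; result is "hello from dynamic code"
local ok, result = pcall(chunk)
```

The same pattern with `loadfile` instead of `loadstring` covers the local-code case.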

4. Advantages of Slardar's dynamic load balancing

This is why we built the wheel. It mainly uses the lua-resty-checkups module and balancer_by_lua_*, and it has the following advantages:

  • Implemented in pure Lua with no third-party C module dependencies, so secondary development is highly efficient and the maintenance burden is low;
  • Nginx's native proxy_* directives still work, because we intervene only at the peer-selection stage; once the peer is chosen, the data path goes through Nginx's own directives, so the native proxy_* directives can be used unchanged;
  • It fits almost any ngx_lua project, whether pure Lua or mixed with C modules.

What Slardar does in a microservice architecture

We have also been converting some existing services to a microservice model. A microservice is derived from a larger service by splitting it into smaller ones. Scaling and migration then change: with microservices you can scale just the parts that need it, by however much they need.

We are now trying this on one system: our back-office image service, which performs many image operations such as beautification, thumbnailing, and watermarking. Optimizing such a service as a whole is very hard because it has too many features; splitting it into microservices changes the picture. The dashed line in the figure is our current service; inside it is a microservice gateway, with small services behind it. Beautification, for example, involves complex computation and consumes a lot of CPU, so we choose machines with better CPUs for it; thumbnailing is done on GPUs, which raises performance several times over; and the final step of producing the image is ordinary enough to run on common machines.

There are also some secondary operations, such as gradients, where it is enough to keep the service usable. Through microservice routing we differentiate, as before, by service and by its parameters, splitting the original service into three smaller ones, so that a complete image is produced by the three steps shown in the figure.

Of course, trying this scheme in practice raises many problems. For example, what one program used to do alone is now done by three, so network bandwidth inevitably increases, and the intermediate images have to be handed from one step to the next; how do we deal with that? Our current idea is a local-first scheduling policy: after one step finishes, if the watermarking service also exists on the local machine, we prefer the local one.

Finally, to borrow a master's words: Talk is cheap, show me the code. We have open sourced Slardar; the project address is github.com/upyun/slard... .

Video and slides:

A Dynamic Service Routing Scheme Based on OpenResty


Origin juejin.im/post/5ceca6186fb9a07eeb13886b