Open-source components Docker, Mesos, and Marathon pair almost too well | Upyun's enterprise container private cloud architecture

Speaker | Mo Hongbo (Upyun system engineer)

Transcript | Pinellia

On December 2, 2016, the Global Architect Summit 2016 hosted by InfoQ was successfully held at the Beijing International Convention Center. With the theme of "Cloud Platform Architecture Design and Container Landing Practice", the event invited technical experts in the field of private cloud architecture to discuss, based on their own experience, the difficulties in container technology and how to solve them.

Upyun was also invited. Upyun system engineer Mo Hongbo walked the audience through the components of the cloud architecture under the title "Upyun's Enterprise Container Private Cloud Architecture", covering how to quickly build an enterprise private cloud and other questions worth thinking about.

Because the article is relatively long, Xiaopai has listed its structure here first~

Contents

⊙ Practical problems of server expansion
⊙ Breaking down Upyun's private cloud
  • Continuous delivery of images
  • Dynamic service routing
  • Service log collection
  • Service monitoring and alerting

The following is Mo Hongbo's talk~

Practical problems of server expansion

Hello everyone. What I am sharing today is the enterprise container private cloud that Upyun uses now. Before getting into the architecture, I'd like to tell a story. In 2015, Upyun's back-end image-processing requests suddenly multiplied several times over. At the time, the whole image-processing cluster had only about 20% redundancy, nowhere near enough to withstand that much concurrency, so the only option was to expand. During the expansion we found several pain points:

First, new machines were needed

Although there were plenty of machines in the data center, we could not simply deploy onto them: they were already running online business, and deploying a new environment onto hosts that were serving traffic might disturb the services running there and turn one incident into a bigger one. So we chose to buy new machines instead, but as everyone knows, the procurement cycle for new machines is very long.

Second, system environment problems

With the machines in hand, we deployed the program and ran the project's test cases, only to find that some JPEG image test cases failed. We began troubleshooting, compared the differences between the two hosts, and finally found the cause: a difference in libjpeg versions. We did locate the problem in the end, but it took a long time.

From this incident, we found several problems that urgently needed solving:

  1. How to quickly expand capacity to deal with burst traffic;

  2. How to solve the deployment environment problem;

  3. How to integrate discrete computing resources.

Breaking down Upyun's private cloud

In 2015, container technology was very popular: many vendors were building their own private clouds and offering all kinds of solutions. At that time, Upyun needed a private cloud too.

What are the characteristics of an ideal private cloud? I've listed a few points here:

  • Unified resource management, managing the computing resources of the entire data center;

  • Support for adding, removing, and modifying resources, since machines are frequently racked and unracked;

  • A unified service entry point: for whatever service is needed, the requester only accesses the outermost web server and does not need to know the internal logic;

  • Continuous integration and delivery: the advantage of container technology is that it can subvert traditional delivery methods and bring far more automation;

  • Deploying and migrating apps should be convenient enough, as should service scaling;

  • Environment isolation between apps, which container technology has already taken care of for us;

  • Generic log collection that requires no handling in application code, changing existing code as little as possible;

  • A generic monitoring and alerting mechanism: monitoring is the top priority of a private cloud platform, especially when adopting a new technology such as containers, whose stability has to be watched closely.

In the technology selection, we chose Docker, Mesos, and Marathon. Many vendors use this stack. We did some comparisons with Google's Kubernetes (k8s) and concluded that the initial learning cost of this stack is lower and it can be put into use quickly; the structure of Mesos is clearer, and it exposes many rich APIs that can be called.

Next, let's take a look at the architecture of Mesos.

△ Mesos architecture

First of all, Mesos uses a Master/Agent architecture. The Master is responsible for unified resource management and task distribution; the Agent is responsible for starting and stopping executors and for reporting host resources, executor status, and the like. Normally three or more Masters are started to ensure high availability, with Master state maintained through ZooKeeper. A framework is a scheduling framework that runs on Mesos; Marathon, Hadoop, and Chronos are relatively common task-scheduling frameworks. The whole structure leaves a very clear impression.

What is the internal mechanism of Mesos?

△ Mesos internal mechanism

First, a Mesos Agent is deployed on every machine, and the Agent reports its host's information to the Master. A framework's scheduler requests resources from the Mesos Master; the Master offers all available resources to the scheduler; and the scheduler decides, according to its own rules, which offers to deploy onto. That is roughly the whole process (a minimal sketch of this subscribe-and-offer loop follows).
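To make the offer mechanism concrete, here is a minimal sketch of a scheduler subscribing to the Mesos v1 HTTP API and printing the events (including resource offers) it receives. It is a simplified illustration, not Upyun's code: the master address is a placeholder, and a real framework would answer each OFFERS event with an ACCEPT or DECLINE call.

```go
// Minimal Mesos v1 scheduler sketch: subscribe and print incoming events.
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"strconv"
	"strings"
)

func main() {
	const master = "http://mesos-master:5050/api/v1/scheduler" // placeholder address
	subscribe := `{"type":"SUBSCRIBE","subscribe":{"framework_info":{"user":"root","name":"demo"}}}`

	resp, err := http.Post(master, "application/json", strings.NewReader(subscribe))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The event stream is RecordIO-framed: "<length>\n<record>" repeated.
	r := bufio.NewReader(resp.Body)
	for {
		line, err := r.ReadString('\n')
		if err != nil {
			return
		}
		n, err := strconv.Atoi(strings.TrimSpace(line))
		if err != nil {
			return
		}
		record := make([]byte, n)
		if _, err := io.ReadFull(r, record); err != nil {
			return
		}
		fmt.Println(string(record)) // SUBSCRIBED, OFFERS, HEARTBEAT, ...
	}
}
```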

Now that we have Docker for app environment isolation, Mesos for unified management of computing resources, and Marathon to keep our services running without interruption, what else do we need to do?

The first is continuous delivery of images. Before going into it, let me first explain what continuous integration is.

 △ Docker continuous delivery

The above is the process we use. We push code to the code repository (we use GitLab); GitLab triggers a build task on the CI server; the CI server builds the project, runs the tests, generates results, and reports them back. This approach greatly reduces the chance of errors on the Master branch.

So how do we do continuous delivery of Docker images? We made a few changes to this model.

△ Upyun's Docker continuous delivery

When a project needs a release, the person in charge of the project puts a tag, such as v0.1.0, on the project's Master branch; GitLab triggers a build task; and the CI server determines from the trigger that it is a Docker build task and executes docker build.

The CI server then pushes the Docker image to the company's private image repository and finally generates a result.

Upyun also uses the GitLab CI solution here.

Let's look at a simple example (sketched below). The basic idea is that when a tag is pushed on the branch, the Docker image build task is triggered and the image is pushed to the private image repository.
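The original slide is not reproduced here, so the following is only a hedged reconstruction of what such a .gitlab-ci.yml rule can look like. The job name, image name, and registry address are placeholders, and predefined variable names vary across GitLab versions:

```yaml
# Sketch of a tag-triggered Docker build job (placeholders, not Upyun's config)
build-image:
  stage: build
  only:
    - tags                          # run only when a tag such as v0.1.0 is pushed
  script:
    - docker build -t registry.example.com/myapp:$CI_COMMIT_TAG .
    - docker push registry.example.com/myapp:$CI_COMMIT_TAG
```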

The following is an early version of Upyun's private cloud.

△ Early version of the private cloud

First, there is a Docker repository and a cluster. Every machine in the cluster runs Docker and a Mesos Agent, and a few nodes are chosen to run Mesos Master and Marathon. Under this architecture, some simple apps could already run, such as the asynchronous audio and video processing Upyun uses. Its basic logic is to take tasks from the audio/video processing task queue and consume them; if processing a task times out, the task is put back into the queue (see the sketch below). But web servers could not yet run under this architecture. For example, if a physical host suddenly loses power, then by Marathon's logic the services running on it are switched to another available machine; in that case the address changes, and the service at the previous address is no longer reachable.
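The timeout-and-requeue consumption pattern just described can be sketched as follows. The queue interface is hypothetical, standing in for whatever queue Upyun actually uses:

```go
// Sketch of a take / process / requeue-on-timeout worker loop.
package worker

import (
	"context"
	"time"
)

type Task struct{ ID string }

// Queue is a hypothetical stand-in for the real task queue.
type Queue interface {
	Take() Task     // block until a task is available
	PutBack(t Task) // return a task to the queue for another worker
}

// process would do the actual media work, honoring ctx cancellation.
func process(ctx context.Context, t Task) error {
	// ... decode / transcode here, checking ctx.Done() periodically ...
	return ctx.Err()
}

func Run(q Queue, timeout time.Duration) {
	for {
		t := q.Take()
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		if err := process(ctx, t); err != nil {
			q.PutBack(t) // timed out or failed: requeue
		}
		cancel()
	}
}
```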

The second is dynamic service routing

There were two candidate solutions, HAProxy and Nginx. Mesosphere's open-source Marathon-lb is based on HAProxy; we chose Nginx, which we find simpler than HAProxy. We implemented a project called Slardar, built with Nginx and Lua, which supports dynamic upstream updates without any reload operation.

 

△ Differentiating services by Host

The above is an example of requests going through Slardar, which differentiates services by the Host header. In the first request, the Host is imageinfo, so the request is routed to the info cluster and the requester receives the image's metadata; the second request is routed to imgprocess and a thumbnail is returned.

So how is a Slardar upstream updated dynamically?

 △ Dynamic update upstream

In this example we add an upstream server list for a service and send the whole configuration, including the host, to Slardar (a sketch of such a request follows). The example contains a 5.108 machine and a 5.109 machine. After sending, you can check their status on the Slardar status page: the status of 5.108 is ok and it can be accessed normally, while the status of 5.109 is err. The health checking behind this is lua-resty-checkups, a module that checks the availability of back-end services for us; we have open-sourced this module as well.
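As an illustration of what pushing an upstream list can look like, here is a sketch of such an HTTP update request. The admin endpoint and payload shape are assumptions made for illustration; consult the Slardar project for its actual API:

```go
// Sketch: push a new upstream server list to Slardar over HTTP.
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	payload := []byte(`{"servers": [
		{"host": "192.168.5.108", "port": 8080},
		{"host": "192.168.5.109", "port": 8080}
	]}`)

	// Hypothetical admin endpoint; Slardar's real API may differ.
	resp, err := http.Post("http://slardar-admin/upstream/imgprocess",
		"application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// lua-resty-checkups then health-checks each server; the status page
	// shows ok / err per backend, as described above.
	fmt.Println(resp.Status)
}
```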

△ Unified service routing Slardar

With this layer of dynamic service routing in place, web servers can run in the private cloud.

Next, let's talk about service log collection

Log collection is a very troublesome thing, but logs are critical when analyzing problems. In the private cloud, log collection ran into the following issues:

  • Large log volume at high frequency

  • Logs spread across many Docker containers

  • Containers migrate frequently

  • Logs must be split by service

  • Logs have multiple consumers

Here is an introduction to Heka, which we use in the private cloud. Heka is a Mozilla open-source project, and the logs of our CDN nodes are also collected with Heka. Heka can be understood as a pipeline with data inputs and data outputs. Inside the pipeline there can be logic such as filter rules (dropping unneeded logs) and routing rules that route logs to different outputs.

△ Heka

△ Heka's DockerLogInput plugin

One thing to note when using Heka's DockerLogInput plugin: the application's logs need to be written to standard output and standard error. Now let's look at what our heka-agent is. Its implementation is actually very simple: it interacts with the Docker daemon through the Docker API, takes the logs out, and sends them to the ES cluster (a simplified sketch follows). The log message format roughly includes the following fields: Uuid, Timestamp, Type, Hostname, Payload, Logger, Fields. The Fields field can be customized. Anyone who has used Marathon knows that Marathon sets a MARATHON_APP_ID environment variable when it starts a container. We map it through when configuring Heka and read the variable via the Fields field, which is how the split-by-service requirement mentioned earlier is achieved.
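The following is a stripped-down sketch of what such an agent boils down to: talk to the Docker daemon over its unix socket, stream a container's stdout/stderr, and tag each message before shipping it. It is an illustration, not Heka's actual plugin code; the container ID is a placeholder:

```go
// Sketch: stream one container's logs from the Docker Engine API.
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
)

func main() {
	// The Docker daemon serves its API on a local unix socket.
	client := &http.Client{Transport: &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			return net.Dial("unix", "/var/run/docker.sock")
		},
	}}

	containerID := "abc123" // placeholder
	resp, err := client.Get("http://unix/containers/" + containerID +
		"/logs?stdout=1&stderr=1&follow=1")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	buf := make([]byte, 4096)
	for {
		n, err := resp.Body.Read(buf)
		if n > 0 {
			// A real agent also parses the stream's frame headers, attaches
			// fields such as MARATHON_APP_ID (read from the container's Env
			// via /containers/{id}/json), and ships the message downstream.
			fmt.Print(string(buf[:n]))
		}
		if err != nil {
			return
		}
	}
}
```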

 △ Model of log collection

This is a simple diagram of log collection. Every machine in the Mesos cluster runs a heka-agent, which can itself be delivered as an app through Mesos. After heka-agent collects the logs, they are lightly assembled and sent to Kafka. The log data in Kafka can then be provided to multiple consumers: it feeds our log platform, supports per-log billing, goes into Elasticsearch, and serves data analysis (a minimal consumer is sketched below).
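Each downstream use can simply be another consumer on the Kafka topic. A minimal sketch, using the sarama client as an example with an assumed broker address and topic name:

```go
// Sketch: one downstream consumer of the aggregated log topic.
package main

import (
	"fmt"

	"github.com/Shopify/sarama"
)

func main() {
	consumer, err := sarama.NewConsumer([]string{"kafka:9092"}, nil)
	if err != nil {
		panic(err)
	}
	defer consumer.Close()

	// Billing, Elasticsearch indexing, and analysis can each be a
	// separate consumer reading the same topic.
	pc, err := consumer.ConsumePartition("service-logs", 0, sarama.OffsetNewest)
	if err != nil {
		panic(err)
	}
	defer pc.Close()

	for msg := range pc.Messages() {
		fmt.Printf("%s\n", msg.Value)
	}
}
```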

So the private cloud gains one more component, log collection, and it too is hosted by Marathon.

Now let's talk about service monitoring and alerting.

△ Docker monitoring and alerting platform

Why do service monitoring and alerting? Because in the early days of the container private cloud, we were not very confident about Docker: we worried whether our services would run into problems inside Docker, and without monitoring, the private cloud would not be trustworthy.

There are many open-source solutions for monitoring Docker containers. Google's cAdvisor is widely used, very convenient to install, and has a very cool UI. However, the information it aggregates is keyed by container ID. That granularity is too fine and not intuitive: it cannot be mapped to a specific service, troubleshooting becomes troublesome, and alert thresholds are hard to set, since CPU and memory thresholds vary from service to service. So we started building our own DIY Docker monitoring and alerting platform.

 

△ Model of the DIY Docker monitoring and alerting platform

This is the model of the monitoring and alerting system. It looks very similar to log collection because the principle is similar: collect the data of all local Docker containers.

An ES agent is deployed on each machine to obtain the monitoring data of the running containers, including CPU, memory, and so on, and to send the collected data to ES.

We can take a look at how the ES agent is implemented.

 

△ Code snippet implementing the ES agent

This is a piece of code from our data-collection plugin. The logic is very simple: it builds a dictionary containing the necessary monitoring data, such as CPU, memory, disk I/O, network, and so on. There is a field named rssPercent, meaning the proportion of the Docker container's resident memory relative to its total memory limit. Why not use a memPercent field? Because in actual use we found that Docker's memoryStats.Usage includes the page cache, which is inaccurate and causes false alarms.

The second-to-last field, appID, is actually the service name. It is used to merge data, because the same service generally has more than one container; when there are multiple containers, their data is merged by service name. This is very important: when something abnormal happens, it lets us tell whether it is one container's special case or whether all containers behave the same (a sketch of such a record follows).
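The original code snippet appears only as an image, so here is a hedged reconstruction of the kind of record described. The field names rssPercent and appID follow the talk; everything else is illustrative:

```go
// Sketch of the per-container monitoring record the talk describes.
package metrics

type ContainerMetrics struct {
	Hostname   string  `json:"hostname"`
	CPUPercent float64 `json:"cpuPercent"`
	BlkRead    uint64  `json:"blkRead"` // disk I/O
	BlkWrite   uint64  `json:"blkWrite"`
	NetRx      uint64  `json:"netRx"` // network
	NetTx      uint64  `json:"netTx"`
	RSSPercent float64 `json:"rssPercent"` // resident memory / memory limit
	AppID      string  `json:"appID"`      // service name, used for merging
}

// rssPercent subtracts the page cache from memory usage, because
// memory_stats.usage includes cache and would cause false alarms.
func rssPercent(usage, cache, limit uint64) float64 {
	if limit == 0 || cache > usage {
		return 0
	}
	return float64(usage-cache) / float64(limit) * 100
}
```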

Back to the ES agent model. For the front-end display we use Kibana 3: which physical machines there are, what the memory curves look like, and so on. Kibana makes it convenient to troubleshoot problems; for example, when something goes wrong, I need to determine whether the problem is in the server or in the service itself. On top of this we wrote an alerting program, Alertman, which automatically analyzes the last 5 minutes of data (a rough sketch follows).

△ Alerting program Alertman
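As a rough illustration of an Alertman-style check (not the actual program): query Elasticsearch for the last five minutes of one service's data, aggregate, and alert when a threshold is crossed. The index name, field names, and threshold handling are placeholders:

```go
// Sketch: a 5-minute Elasticsearch check of the kind Alertman performs.
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	query := []byte(`{
	  "size": 0,
	  "query": {"bool": {"filter": [
	    {"term":  {"appID": "imgprocess"}},
	    {"range": {"@timestamp": {"gte": "now-5m"}}}
	  ]}},
	  "aggs": {"avg_rss": {"avg": {"field": "rssPercent"}}}
	}`)

	resp, err := http.Post("http://es:9200/metrics-*/_search",
		"application/json", bytes.NewReader(query))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// A real check parses the response, compares avg_rss against the
	// per-service threshold, and posts to a Slack webhook when exceeded.
	fmt.Println(resp.Status)
}
```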

So we built monitoring and alerting as another component; like log collection, it is hosted on Marathon and delivered by Mesos.

△ Monitoring and alerting component

What is still wrong with this architecture? We found that the whole thing is very fragmented, spanning the dynamic service router Slardar, Marathon, and more.

For example, to scale a web server you first have to go to Marathon and scale the service up to 10 instances, then synchronize the server list to Slardar with an HTTP request. The whole process is cumbersome, and some basic information is nowhere to be seen either. So we built UPONE, the API layer of the private cloud platform.

 

△ UPONE

This is UPONE's configuration, which mainly mirrors the necessary attributes of a Marathon configuration: group is the service group, followed by the name and other configurable settings; the ports field is the port used for internal monitoring; environment variables and more can be set below (a hypothetical reconstruction follows).
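The real configuration appears only as an image, so the following is a hypothetical reconstruction using the fields named in the talk; the layout and the extra fields are guesses:

```yaml
# Hypothetical UPONE app configuration (field names from the talk)
group: image                          # service group
name: imgprocess                      # app name
image: registry.example.com/imgprocess:v0.1.0
instances: 4
cpus: 1.0
mem: 2048
ports:
  - 8080                              # port used for internal monitoring
env:
  LOG_LEVEL: info                     # arbitrary environment variables
```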

When we create a new app, upone init first generates a default configuration; you can modify this configuration and then execute upone deploy. If the app is a web server, you also execute upone <app-name> sync to push the server list to Slardar.

 

△ New App

Elastic scaling is very simple: just issue a single command directly:

△ Elastic scaling

For app updates, in general nothing needs to change except the Docker image.

 △ App update

 

△ Private cloud architecture

The figure above shows the entire private cloud architecture. But we can do more. There is now a lot of data that can be analyzed, such as the Nginx logs (that is, Slardar's logs), the ES data, and the queue length of the audio/video processing. Through data analysis we can do things like:

  • Automatic scaling

  • Removal of abnormal nodes

  • Consolidation of resources

To sum up, our entire private cloud platform uses many open-source components. Docker/Mesos/Marathon is our basic framework; GitLab CI is used for continuous delivery of Docker images; Nginx+Lua implements the dynamic service router Slardar; Heka/Kafka is used for log collection; ElasticSearch/Kibana is used for data analysis and display; and Python/Go/Slack make up the monitoring and alerting platform. Fully embrace open source and add some customized features on top of it: that is our private cloud.

Thank you all!

The following is the Q&A session.

Q1: Hello, I benefited a lot from your talk. What network model does Upyun currently use? How do containers communicate across hosts? Is there any cross-host communication?

A1: We use Docker's native networking with no other modifications. Because the services Upyun hosts are mainly CPU-intensive, we have not yet worried much about network IO; in the future we may use the CNI support provided by Mesos.

Q2: Hello, I have a question about Alertman, the tool you developed yourselves. Why develop it yourself? As far as I know, there are third-party monitoring tools that integrate with ES, so why build your own? Could you give more details? How does it integrate with ES? To monitor, you must query data, so how did you make querying efficient? Can you briefly describe the design?

A2: We mainly use it for alerting, and it queries through Elasticsearch: what it processes is the last 5 minutes of ES data. Why build it ourselves? Because we have many needs of our own. For example, for 5XX errors, the back-end services spit out status codes such as 504 and 502, and we monitor those status codes. We are also building a piece of logic, still in the testing stage, that analyzes all the back-end nodes: if a node is found to be returning too many 502s, it is removed immediately.

Q3: So your monitoring is based on statistics from Slardar?

A3: Yes, we prefer to remove abnormal nodes at the business level.

Q4: Hello, I'd like to ask about log collection. Programmers generally focus on locating problems through logs, and we also monitor the health and stability of the whole system through logs of things like latency and CPU. Do you put all the logs into one place, extract and analyze them, and then deliver them to different destinations, or were they already dispersed into different logs at the point of the original call?

A4: We output the logs directly to standard output and view them directly through the Docker API. The logs are all aggregated into Kafka; for a requirement like yours, you would write additional consumers to consume them. That is how our architecture works now.

Q5: Hello, my question is about log collection. If your application is fairly large, it generates many log files. How is log ordering guaranteed?

A5: We aggregate logs by hostname; for logs within the same container, ordering can be guaranteed.

