How AI unicorn SenseTime containerized its internal services

This article is compiled from a technical sharing session given by Allman, an operations engineer at SenseTime, in the Rancher WeChat group on the evening of April 26. SenseTime is an AI company focused on computer vision. The talk covers how the container platform team helped containerize the company's business and internal services, and introduces the tools SenseTime used along the way, together with best practices and lessons worth sharing.

Search for the WeChat account RancherLabsChina, or scan the QR code at the end of the article, to add the Rancher assistant as a friend; you can then join the official technical exchange group and take part in the next sharing session live.

Contents

● Background

● Demand analysis and technology selection

● Container images

● Monitoring and alerting

● Reliability assurance

● Summary

Background

SenseTime is an AI startup in the field of computer vision. Some of the company's business lines require cloud API support, and some customers call these SaaS-style services over the public network. Generally speaking, the architecture of a cloud API is relatively simple; moreover, since the company was founded only recently, there is little legacy burden. Many services were designed with a microservice-like architecture from the start, which makes them well suited to containerization as deployment grows more complex.

Each business line in the company is fairly independent. Organizationally, this shows up as differences in personnel, performance evaluation and reporting lines; technically, it shows up as independently evolving programming languages, frameworks and architectures. Deployment and subsequent maintenance of the services, however, all land on the operations department, so the operational complexity created by this independence and divergence needs to be reined in.

The problems we ran into are not new, and the industry has plenty of tools and methodologies for them, but in the early days we deliberately kept our operations tooling restrained: ssh plus bash scripts carried us through the early period, Ansible served us for several months after that, and finally reality pushed us toward Docker.

Docker is revolutionary, and its clean UX has won the hearts of technical people. We arrived just as the container orchestration war reached the release of Docker Swarm mode, and we needed a tool that could both absorb the growing complexity of operations and free operations engineers from monotonous, repetitive, high-pressure releases.

We first noticed Rancher in the comments on Hacker News. Its simplicity and ease of use showed us a path to running containerized applications in production, though much work remained to be done. For reasons of space a detailed account is impractical, so I will first walk through our demand analysis and technology selection at the time, and then discuss several important components: container images, monitoring and alerting, and reliability assurance.


Demand analysis and technology selection

Putting aside the buzzwords of containers, container orchestration and microservices for the moment, for our situation the new operations tooling had to have three properties to succeed: development-friendly, controllable operations, and easy to operate and maintain.

Development-friendly

Pushing application packaging onto developers eliminates the operations work of packaging and compiling Java/Ruby/Python code ourselves, but we also have to guarantee that the packages developers produce will at least run in production. The key questions, then, are how to let developers build release packages easily and correctly, and how to let those packages flow automatically into the production environment. To make a long story short, we adopted Docker plus Harbor: developers build container images and push them, authenticated via LDAP, to the company's internal Harbor-based registry, and Harbor's replication mechanism then automatically synchronizes the internal images to the production registry. See the section on container images below for the specific implementation.

Controllable operations

Developers should be able to take part in releasing services. Because the business lines differ in scenarios, technology stacks and architectures, operations staff alone cannot solve the code-level problems that surface during a release, so developers need to participate in day-to-day releases under controlled conditions. That requires restricted, auditable and easy-to-use interfaces; a WebUI plus webhooks is a flexible way to provide them, and here the functionality Rancher offers met our needs.
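
As a hypothetical illustration of the webhook side (not our exact setup): Rancher lets you create a webhook receiver, e.g. "upgrade this service when a new image is pushed", and hands back a trigger URL that a CI job or a developer script can simply POST to. The URL below is a placeholder for whatever the receiver's page shows.

# Hypothetical: trigger a pre-created Rancher webhook receiver from CI.
# The key and project ID come from the receiver created in the Rancher UI.
curl -s -X POST \
  "https://rancher.example.com/v1-webhooks/endpoints?key=<receiver-key>&projectId=<env-id>"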

Easy operation and maintenance

Frankly, operational complexity is our core concern: containerization was initiated by the operations department to cope with rising complexity in the first place, and where you sit determines where you stand. Given the black-box nature and shaky stability of containers themselves, plus the fact that very few people genuinely understand container technology, a containerization effort we could actually land came down to three requirements. First, multi-tenancy, which is necessary to support multiple parallel business lines. Second, stability, and knowing promptly when something goes wrong: containers come with plenty of problems, so the online environment pins the Docker and kernel versions of every machine via operating system images, and because traditional monitoring and alerting tools fall short in a containerized environment, a complete new monitoring and alerting stack is needed. Third, low failover cost: nobody can debug every container problem on the spot (cross-host container network failures, mount point leaks, a stuck dockerd, core component containers failing to start), so blue-green deployment with immediate switchover after a failure is required. Keeping the new system reliable and controllable is critical.

Technical Architecture Diagram

To sum up, an assembly of open source systems built around Rancher, Harbor and Prometheus/Alertmanager can meet most container management needs. The overall architecture is shown below.

(Figure: overall technical architecture)

Container images

The container image service is company-level IT infrastructure. Under the physical constraint of limited bandwidth between office locations, it has to give users scattered across multiple geographies a consistent, convenient and fast experience. We built ours mainly on VMware's open source Harbor. Harbor solves problems such as authentication and synchronization, but it is not a silver bullet here; some extra work is needed before the image service delivers a good user experience. Google Container Registry is a useful illustration of the experience we were after.

GCR is Google's public container image service: users everywhere push and pull through the same domain, gcr.io, and GeoDNS routes them to different Google data centers, which are linked by high-speed networks over which applications including GCR continuously synchronize data. This design gives users a consistent experience (everyone uses gcr.io); it keeps interactions from feeling sluggish, because each user talks to a nearby data center; and because the underlying storage is constantly replicating images across data centers at high speed (thanks to Google's excellent IT infrastructure), an image pushed on one side of the world can be pulled quickly on the other (pushing and pulling images being the service's basic purpose).

The point of describing Google Container Registry at length is that user experience is critical to user acceptance, and acceptance often decides whether a new service survives. Providing a GCR-like experience inside the company is the look and feel our container image service has to approach in order to land successfully. Achieving it requires two core functions: automatic synchronization between development and production registries, and image synchronization across office locations. In addition, although it goes slightly beyond the image service itself, given local network conditions the slow pulling of foreign images (Docker Hub, GCR, Quay) also materially affects the experience, so an image acceleration service is needed as well.

Automatic synchronization of development and production images

Because the development environment (company intranet) and the production environment (public network) differ in security requirements and usage scenarios, we run two registry deployments: the intranet one uses LDAP authentication for developers' convenience, while the public-facing one restricts access with a variety of security measures. The question this raises is how to move images to production conveniently: images that developers build on the intranet need to be synchronized to the production registry automatically.

We use Harbor's replication feature, manually enabling replication only for the projects that production actually needs. With that one-time configuration at initial launch, every subsequent image push by developers is synchronized automatically from the intranet Harbor to the public-network Harbor, with no further manual steps.
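
For reference, replication policies can also be managed through Harbor's REST API instead of the UI. A rough sketch follows; the endpoint path shown is Harbor 2.x style, payload fields vary across releases, and all names here are illustrative:

# Illustrative only: create an event-driven replication policy via the API.
curl -u "admin:${HARBOR_ADMIN_PASSWORD}" \
  -H "Content-Type: application/json" \
  -X POST "https://harbor.internal.example.com/api/v2.0/replication/policies" \
  -d '{
        "name": "sync-myproject-to-prod",
        "dest_registry": {"id": 1},
        "filters": [{"type": "name", "value": "myproject/**"}],
        "trigger": {"type": "event_based"},
        "enabled": true
      }'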

Image synchronization across office locations

Since the company has offices in several cities, members of the same team are geographically distributed, and for them to collaborate comfortably, images must be synchronized across regions. For this we rely on the company's existing Swift storage. There is not much to say here beyond the obvious: the more bandwidth, the faster the synchronization. One detail worth mentioning is that Harbor's UI reads its data from MySQL, so if you want to see the same interface everywhere, the Harbor MySQL data has to be synchronized as well.

Image acceleration

Many open source images are hosted on Docker Hub, Google Container Registry and Quay. Constrained by the GFW and the company's network bandwidth, pulling these images directly is glacially slow, which badly hurts both mood and productivity.

One feasible approach is to pull these images through a proxy, docker tag them, upload them to the company registry, and then edit the corresponding manifest YAML. But the user experience of this approach is hit-or-miss: ordinary users have no idea why their application will not start; even those who know the cause face pulls that succeed one moment and fail the next, images that pull fine on a colleague's machine but not on theirs, hunting for where to configure the default registry address, and the chore of hauling images back from abroad and uploading them to the company. It is a tedious, time-consuming process, and time spent on this kind of thing is life wasted.

The solution we adopted is to run a Docker Hub mirror under the domain mirror.example.com, while the company nameserver hijacks quay.io and gcr.io. Users configure the Docker daemon once and can then painlessly pull all commonly used images, with no need to override image locations anywhere. Each office location gets the same deployment, so users pull images locally within their own office network, which is fast and saves precious inter-office bandwidth.
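
For the Docker Hub half of this, Docker's open source registry (Distribution) has a pull-through cache mode out of the box; below is a minimal sketch of standing one up. TLS termination and the mirror.example.com domain would sit on a reverse proxy in front (details omitted), and a separate instance per upstream registry would be needed to cache gcr.io or quay.io the same way.

# Minimal pull-through cache for Docker Hub using registry:2's proxy mode.
docker run -d --restart=always --name dockerhub-mirror \
  -p 5000:5000 \
  -v /data/registry:/var/lib/registry \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
  registry:2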

One caveat: because we hijack domains such as gcr.io on the office network but obviously do not hold TLS certificates for those domains, images must be pulled over HTTP, so these domains have to be listed under the Docker daemon's insecure-registries option.

User experience

Configure the Docker daemon (Ubuntu 16.04 as an example)

sudo -s
cat << EOF > /etc/docker/daemon.json
{
  "insecure-registries": ["quay.io", "gcr.io", "k8s.gcr.io"],
  "registry-mirrors": ["https://mirror.example.com"]
}
EOF
systemctl restart docker.service

Test

# Check DNS resolution: the hijacked domains should resolve to a private (intranet) IP address
nslookup gcr.io
# Pull a Docker Hub image
docker pull ubuntu:xenial
# Pull a Google image
docker pull gcr.io/google_containers/kube-apiserver:v1.10.0
# Pull a Quay image
docker pull quay.io/coreos/etcd:v3.2
# minikube
minikube start --insecure-registry gcr.io,quay.io,k8s.gcr.io --registry-mirror https://mirror.example.com

Technical Architecture Diagram

(Figure: container image service architecture)

Monitoring and alerting

Because traditional monitoring and alerting tools such as Zabbix fall short in a containerized environment, we needed to build a new monitoring and alerting system. Fortunately Prometheus/Alertmanager is fairly convenient to adopt, and our existing Zabbix installation was so poorly used that the old system's experience was terrible anyway (false alarms, missed alarms, alarm storms, unstandardized naming, clumsy operation, and so on); otherwise, with limited time and staff, rebuilding everything from scratch just to get going would have been quite troublesome.

In fact, with or without containers, the monitoring and alerting system of a distributed platform has to solve the same problems: sensing metrics at three levels (machine, container/process, application); collecting the logs scattered across machines promptly for query, retrieval and alerting; keeping the alert signal-to-noise ratio high, with neither false positives nor false negatives; and making alerts self-explanatory at a glance.

As noted above, Prometheus/Alertmanager already solves these problems well. The exporter pattern makes it easy to adapt to different monitoring targets via plug-ins (node-exporter, cAdvisor, mysql-exporter, elasticsearch-exporter, and so on); Prometheus working together with Rancher's DNS service lets newly added exporters/agents be discovered dynamically; and Alertmanager is an excellent alerting tool that supports routing, aggregation and regular-expression matching of alerts. Combined with existing email plus our self-built WeChat channel (now officially supported) and telephone channel (integrated with Alibaba Cloud's voice service), the daily volume and frequency of alarms has reached a level on-call staff can live with.
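
To give a flavor of that dynamic discovery, here is a minimal, illustrative Prometheus configuration; the service name follows Rancher 1.6's <service>.<stack>.rancher.internal DNS convention and is an assumption, not our literal config.

cat << 'EOF' > /etc/prometheus/prometheus.yml
global:
  scrape_interval: 30s
scrape_configs:
  # Every container of the node-exporter Global Service is discovered via
  # Rancher's internal DNS, so scaling the service adds scrape targets.
  - job_name: 'node'
    dns_sd_configs:
      - names: ['node-exporter.monitoring.rancher.internal']
        type: A
        port: 9100
EOF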

For log collection we follow the community's recommendation: the Elasticsearch + fluentd + Kibana combination. fluentd runs as a Rancher Global Service (the counterpart of a Kubernetes DaemonSet) and collects each machine's system logs, dockerd logs, container standard output (log_driver: json_file, enriched via the docker metadata plug-in) and the logs of the Rancher infrastructure services. Logs are compressed and archived on the local filesystem and shipped promptly to the corresponding Elasticsearch service (which does not run in containers), with Kibana providing visualization for after-sales product use. Log-based alerting uses Yelp's open source elastalert.
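
A stripped-down sketch of the fluentd side, assuming the default json-file log driver and the fluent-plugin-elasticsearch output plug-in; paths, tags and the Elasticsearch host are illustrative:

cat << 'EOF' > /etc/fluent/fluent.conf
# Tail container stdout as written by the json-file log driver.
<source>
  @type tail
  path /var/lib/docker/containers/*/*-json.log
  pos_file /var/log/fluentd-docker.pos
  tag docker.*
  <parse>
    @type json
  </parse>
</source>
# Ship everything to the (non-containerized) Elasticsearch cluster.
<match docker.**>
  @type elasticsearch
  host es.internal.example.com
  port 9200
  logstash_format true
</match>
EOF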

Hand-building a monitoring and alerting stack for every environment is tedious, so we also packaged a custom Rancher Catalog to make deployment easy.
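
For reference, a custom catalog is just a git repository in Rancher's catalog format; a rough sketch of the layout follows (template and file names are illustrative):

# Illustrative layout of a custom Rancher catalog repository:
#
# monitoring-catalog/
# └── templates/
#     └── monitoring-stack/
#         ├── config.yml             # template metadata (name, category, ...)
#         └── 0/                     # version 0 of the template
#             ├── docker-compose.yml     # prometheus/alertmanager/exporters
#             └── rancher-compose.yml    # questions, scale, health checks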

A monitoring and alerting system touches too many aspects for me to define a "good" one here; Google's Site Reliability Engineering book gives what I consider a better treatment. One view worth sharing, though: design and improve the monitoring and alerting system as a serious product. Someone (ideally a core on-call person) should take on the product manager role, judging from a human point of view whether the product is genuinely usable and whether anything obvious is off, and in particular guarding against the broken windows effect.

Technical Architecture Diagram

(Figure: monitoring and alerting architecture)

Reliability assurance

Distributed systems improve concurrency and throughput, but they also raise the probability of partial failure. Robust program design and deployment schemes improve a system's fault tolerance and availability. Reliability assurance is the set of measures and methods the operations department drives to keep the business stable, reliable and robust, including:

● Production Readiness Check

● Backup management system

● Failure analysis and summary

● Chaos monkey

I will mainly talk about the chaos monkey. The general idea is the old saying that flowing water never goes stale and a door hinge is never worm-eaten: by simulating all kinds of possible faults, we surface the system's availability problems and push developers and operators to improve things at every level.

Expectations

  • Most failures should not require immediate human intervention

  • The window of business-visible errors (such as HTTP 502/503) stays within two minutes

  • The alerting system should guarantee (see the routing sketch after this list):

         no missed alarms

         no alarm storms

         alarms routed by severity (email/WeChat/phone) to the people who should receive them
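
To make the last expectation concrete, here is a hedged sketch of the kind of severity-based routing Alertmanager supports. Receiver names, bridge URLs and addresses are invented for illustration, and SMTP/credential settings are omitted, so this shows the shape of the config rather than a loadable file.

cat << 'EOF' > /etc/alertmanager/alertmanager.yml
route:
  receiver: email-oncall          # default: every alert at least reaches email
  group_by: ['alertname', 'env']  # aggregation keeps alarm storms down
  routes:
    - match: { severity: critical }
      receiver: phone-gateway     # bridge to Alibaba Cloud voice service
    - match: { severity: warning }
      receiver: wechat-oncall
receivers:
  - name: email-oncall
    email_configs:
      - to: 'oncall@example.com'  # SMTP settings omitted
  - name: wechat-oncall
    webhook_configs:              # self-built WeChat bridge (illustrative)
      - url: 'http://wechat-bridge.internal/send'
  - name: phone-gateway
    webhook_configs:
      - url: 'http://phone-bridge.internal/call'
EOF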
    

Test cases

The cases we need to test include (a minimal sketch of the "random destruction" case follows the list):

  • Service upgrades

  • Random destruction of business containers

  • Host disconnection from the cluster

  • Network jitter simulation

  • Rancher infrastructure service upgrades

  • Host-level network failure

  • A single host going down

  • Several hosts going down

  • An availability zone going down
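
As promised above, a minimal sketch of the "random destruction of business containers" case. This is illustrative shell rather than our actual chaos tooling, and the "myservice" name filter is made up; the point is to kill a random container, then observe whether the orchestrator restores it within the expected window and whether the right alerts fire.

#!/usr/bin/env bash
# Pick one running container whose name matches a business prefix, at random,
# and kill it; recovery and alerting are then observed, not automated here.
victim=$(docker ps --filter 'name=myservice' --format '{{.ID}}' | shuf -n 1)
if [ -n "$victim" ]; then
  echo "chaos: killing container $victim"
  docker kill "$victim"
fi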

Deployment Example (Single Tenant & Single Region)

(Figure: single-tenant, single-region deployment example)

Summary

1. Smaller companies can also build reasonably usable container platforms.

2. Investing some energy in infrastructure early in a company's development pays off in the long run. The value shows up as an early-built team that is capable, experienced and motivated, continuously fighting the complexity explosion that comes with scale. A foundational architecture that strikes people as chaotic at a glance visibly hurts developers' coding efficiency; one can even speculate, by the broken windows principle, that developers may feel projects destined to run on a "dirty, messy, poor" platform do not deserve to be taken too seriously on quality. For a large organization, order is an asset of immeasurable value.

3. The slow-image-pull problem can also be alleviated gracefully.

4. Generally speaking, access from China to foreign network resources is inconvenient; even without the GFW, bandwidth would be a big problem. Our solution is simple: cache and serve locally. Solving a nagging, fly-like annoyance in a relatively elegant, efficient way improved the daily experience of many people, and as an engineer I find that very satisfying.

5. Containerization can also be seen as a refactoring of the traditional operations system.

6. Containerization is, in essence, a re-examination, redesign and reconstruction of existing development and operations practice once containers become the building blocks of the technical architecture. Microservices and cloud native gave rise to container technology, and the latter, especially the wonderful UX of the Docker tool itself, greatly fired the enthusiasm of engineers and enterprises to run toward the "promised land" of operations. Everyone knows there is no silver bullet, yet the Kubernetes ecosystem looks ever more promising and gives people boundless hope; and selling hope, as history has proven, is a business that never loses.

Thanks

1. Thanks to the participants in and contributors to the free software movement represented by Richard Stallman, which lets small people and small companies achieve great things too.

2. Thanks to Google Search for making it so convenient to search for information.

3. Thanks to Docker Inc. and the contributors to the Docker software, which spawned a huge industry and improved the lives of many developers and operators.

4. Thanks to Rancher, an excellent open source project that brings a Docker-like UX to container operations.

5. Thank you GitHub for making software collaboration and code sharing so convenient and pervasive.

6. Thanks to the authors of the mermaid plugin, which makes it easy to define and edit good-looking flowcharts in markdown.
