What exactly is cloud native DevOps?

Introduction: What exactly is cloud-native DevOps? We believe that cloud-native DevOps means making full use of cloud-native infrastructure, building on microservice/serverless architectures and open standards, staying independent of any particular language or framework, and providing continuous delivery and intelligent self-ops (automated operation and maintenance) capabilities, so as to achieve higher service quality and lower development and operations costs than traditional DevOps and let R&D focus on rapid business iteration.

Press conference portal: https://yqh.aliyun.com/live/detail/21798

Product details: https://yqh.aliyun.com/live/yunxiao

1. What is cloud native DevOps

Let's first use a simple example to understand what cloud native DevOps is and how it is different from DevOps.
[Figure: a street-side food stall run by a single chef]
The picture above shows a street food stall. The chef is working very hard: chopping, frying, preparing all kinds of dishes, and selling them. From buying the raw materials, to cooking, to selling, to after-sales, everything is handled by one or two people. This is a very typical DevOps scenario: the team handles everything end to end. When the chef is highly skilled and good at selling, this can be very efficient with little waste. The problem is that it is hard to scale, because the process is non-standard and depends heavily on the chef's personal ability.
[Figure: a Nanjing Dapaidang restaurant]
Now look at this picture of a Nanjing Dapaidang, a well-known restaurant chain whose name literally means "food stall". Although "food stall" is in the name, it is clearly not the kind of stall described above. Walk into any Nanjing Dapaidang and you will find that its chefs can focus on serving customers better dishes: developing and testing new dishes and trialing and promoting them with small batches of customers. Whether the number of customers grows or shrinks, the restaurant adapts quickly, and opening new branches is also fast. We can think of this as cloud-native DevOps.

So what exactly is cloud-native DevOps? We believe that cloud-native DevOps means making full use of cloud-native infrastructure, building on microservice/serverless architectures and open standards, staying independent of any particular language or framework, and providing continuous delivery and intelligent self-ops capabilities, so as to achieve higher service quality and lower development and operations costs than traditional DevOps and let R&D focus on rapid business iteration.
[Figure: the two principles, two foundations, and two capabilities of cloud-native DevOps]
As shown in the figure above, cloud-native DevOps rests on two principles: compliance with open standards, and independence from any particular language or framework. It has two foundations: a microservice/serverless application architecture, and serverless BaaS/FaaS infrastructure. And it provides two capabilities: intelligent self-ops and continuous delivery.

Two principles: following open standards and staying independent of language and framework gives the system more flexibility than being tied to a specific language or framework, better evolution and vitality when technology is upgraded or iterated, and a healthier ecosystem.
Two foundations: a microservice/serverless application architecture is what makes DevOps feasible, and serverless infrastructure, which is resource-oriented and consumed on demand, provides better elasticity.
On top of these two principles and two foundations, two capabilities are achieved: continuous delivery and intelligent self-ops.

2. Alibaba cloud native DevOps upgrade case

Let's first look at a case of a cloud-native DevOps transformation of an Alibaba team.

Case background: an overseas e-commerce team at Alibaba faced many challenges in overseas markets: many sites to run, high cost of building a new site, fast-changing requirements, slow delivery, and high operation and maintenance costs. How could it smoothly upgrade to cloud-native DevOps to solve these problems and improve business delivery efficiency? That is what this case is about.
(1) Architecture upgrade: service governance sidecar and mesh
[Figure: application code and service governance code packaged together in a "rich container"]
The first step is to upgrade the architecture: sink the service governance code into a sidecar outside the application, and let the service mesh carry capabilities such as environment routing. As shown in the figure above, each green dot represents application code and each orange dot represents service governance code; both are packaged into the same container as internal (second-party) libraries. As the service governance system grew, it came to contain many things, such as log collection, monitoring instrumentation, and operation and maintenance hooks. We call this kind of container a rich container. The problem is obvious: even a simple upgrade or adjustment to log collection forces us to rebuild, redeploy, and re-release the application, even though the change has nothing to do with the application itself. And because the concerns are not separated, a bug in log collection can affect the application itself.
[Figure: service governance code moved out of the application container into sidecars]
To let the application focus on itself, the first thing we did was move all of the service governance code out of the application container and into a sidecar, so that the governance code and the application code live in two separate containers. At the same time, we handed some of the original service governance tasks, such as test routing and distributed tracing, over to the mesh sidecar. In this way the application slims down and only needs to care about its own application code.

The advantage is that the business can focus on business-related application code without being coupled to service governance.

This is the first step, and it can be done smoothly, because service governance can be migrated to the sidecar gradually, without worrying about the cost of a big-bang migration.
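
To make this concrete, here is a minimal sketch of what such a separation can look like on Kubernetes: one Pod holding the slimmed-down application container plus a governance sidecar. The image names and ports below are hypothetical placeholders, not the team's actual configuration.

```python
# Minimal sketch: an application container plus a service-governance sidecar
# in one Kubernetes Pod. Image names and ports are hypothetical placeholders.
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "order-app", "labels": {"app": "order"}},
    "spec": {
        "containers": [
            {   # the slimmed-down application: business code only
                "name": "app",
                "image": "registry.example.com/order-app:1.0.0",
                "ports": [{"containerPort": 8080}],
            },
            {   # sidecar carrying service governance: routing, tracing, log shipping
                "name": "mesh-proxy",
                "image": "registry.example.com/mesh-proxy:latest",
                "ports": [{"containerPort": 15001}],
            },
        ]
    },
}

# Serialize to JSON (a format kubectl accepts) for `kubectl apply -f -`
print(json.dumps(pod, indent=2))
```

Because the governance sidecar has its own image and lifecycle, upgrading log collection or tracing no longer requires rebuilding or re-releasing the application container.
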
(2) Architecture upgrade: from build decoupling to release decoupling to operations decoupling
In the second step, we decoupled at three levels: build, release, and operations.

Anyone familiar with microservice and serverless architectures knows that a business can only move faster and better when it can be developed, tested, released, and operated independently, because that minimizes its coupling with everyone else.

But we also know that as the business grows more complex and the application keeps evolving, the application accumulates more and more business code. Take the application in the figure below: some of its code serves a specific business. In a payment application, for example, some code exists for Hema's specific needs, some for Tmall's specific needs, while the rest is general-purpose, or platform, code that serves all business scenarios.
[Figure: an application containing business-specific code (Hema, Tmall, etc.) alongside general platform code]
Obviously, from the point of view of development efficiency, it is good that each business party can change its own business code, reducing communication cost and improving R&D efficiency. But this brings a new problem: when one business needs a change that does not touch the general logic, the whole application must still go through full regression across all businesses. If other businesses also changed during that window, they must be integrated and released together; if there are many changes, everyone queues up for integration. The cost of integration testing, communication, and coordination becomes very high.

Our goal is for each business to be developed, released, and operated independently. To get there smoothly, the first thing we did was decouple them at the build stage. For a relatively independent business, for example, we build it separately into its own container image, place that image in an init container of the Pod through orchestration, and, when the Pod starts, mount its content into the storage space of the main application container (see the sketch below).
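
As a hedged illustration of this build-decoupling pattern (all names and paths below are made up), a separately built business module can be delivered as an init container that copies its artifact into a volume shared with the main application container:

```python
# Sketch: build decoupling via an init container. The business module is built
# as its own image; at Pod startup its artifact is copied into a shared volume
# that the main application container mounts. All names/paths are hypothetical.
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "payment-app"},
    "spec": {
        "volumes": [{"name": "biz-modules", "emptyDir": {}}],
        "initContainers": [
            {   # an independently built business module
                "name": "hema-module",
                "image": "registry.example.com/hema-module:2.3.1",
                "command": ["cp", "-r", "/module/.", "/modules/"],
                "volumeMounts": [{"name": "biz-modules", "mountPath": "/modules"}],
            }
        ],
        "containers": [
            {   # the main (platform) application, which loads the module at startup
                "name": "app",
                "image": "registry.example.com/payment-app:1.8.0",
                "volumeMounts": [{"name": "biz-modules", "mountPath": "/home/admin/modules"}],
            }
        ],
    },
}

print(json.dumps(pod, indent=2))
```

With this arrangement the business module and the platform application are built and versioned separately, and only composed at Pod startup.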

But at this point, release and operations are still coupled, and we need to separate them as well.

We know that the intimacy between pieces of application code can be roughly divided into three levels:
1. Ultra-intimate: communicating through function calls within the same process.
2. Intimate: different containers in the same Pod, communicating through IPC.
3. In the same network: communicating through RPC.
Based on the characteristics of each business, we can gradually split some of the business code out into RPC or IPC services, so that it can be released and operated independently (a sketch follows below).
With that, we have completed build decoupling, release decoupling, and operations decoupling for the application container.
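
As a small sketch of what such a split can look like (the discount capability, service name, and URL here are hypothetical), the same operation can be exposed as an in-process call or, once extracted, reached over RPC/HTTP within the Pod or the cluster network:

```python
# Sketch: the same business capability at different "intimacy" levels.
# Service names and URLs are hypothetical.
import requests  # third-party HTTP client, standing in for an RPC client

# 1. Ultra-intimate: a plain function call inside the same process
def calc_discount_local(order_total: float) -> float:
    return order_total * 0.9

# 2./3. The module split out as a separate service (same Pod over localhost,
#       or another Pod over the cluster network) and reached via RPC/HTTP
def calc_discount_remote(order_total: float) -> float:
    resp = requests.post(
        "http://discount-svc.default.svc.cluster.local/discount",
        json={"order_total": order_total},
        timeout=2,
    )
    resp.raise_for_status()
    return resp.json()["discounted_total"]
```

Once a piece of business code sits behind such an interface, it can be built, released, and rolled back on its own schedule.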

(3) IaC & GitOps
[Figure: the IaC repository and a GitOps engine driving changes into each environment]
The third step is IaC and GitOps. Let's first look at the status quo of development and operations. In many R&D scenarios a thorny problem is that different environments and businesses each have a lot of their own specific configuration. During release and operations we often have to pick and modify the right configuration case by case, yet this configuration, together with the application code, is really part of the release itself, and maintaining it through a console in the traditional way is very costly.

In the cloud-native context, we believe IaC (Infrastructure as Code) and GitOps are the better choice. Besides the code repository for each application, we add an IaC repository that contains the application's image version and all related configuration. Whenever a code change needs to be released or a configuration changes, it is pushed to the IaC repository as a code change. The GitOps engine automatically detects IaC changes, translates them into configuration that conforms to the OAM specification, and then applies the change to the corresponding environment based on the OAM model. Both development and operations can see what has changed in the system from the IaC code history, and every release is complete in itself.
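
The following is a rough, simplified sketch of what a GitOps engine does conceptually: watch the IaC repository, and when a new revision appears, apply the desired state it describes to the target environment. It is not the actual engine used here; the repository path and the use of `kubectl` are placeholders.

```python
# Rough sketch of a GitOps reconcile loop: poll the IaC repository, and when a
# new revision appears, apply its manifests to the cluster. Paths and commands
# are illustrative placeholders.
import subprocess
import time

IAC_REPO_DIR = "/srv/iac-repo"            # local clone of the application's IaC repository
MANIFEST_DIR = f"{IAC_REPO_DIR}/env/prod" # rendered configuration for one environment

def current_revision() -> str:
    return subprocess.check_output(
        ["git", "-C", IAC_REPO_DIR, "rev-parse", "HEAD"], text=True
    ).strip()

def reconcile_forever(interval_sec: int = 30) -> None:
    applied = None
    while True:
        subprocess.run(["git", "-C", IAC_REPO_DIR, "pull", "--ff-only"], check=True)
        rev = current_revision()
        if rev != applied:
            # Apply the desired state described in the IaC repo to the environment
            subprocess.run(["kubectl", "apply", "-f", MANIFEST_DIR], check=True)
            applied = rev
        time.sleep(interval_sec)
```

The key property is that the Git history of the IaC repository becomes the single source of truth for what is running in each environment.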

(4) BaaS-ification of resources
[Figure: declarative, on-demand use of resources]
The last step is the BaaS-ification of resources.
Think about how resources are used in an application today. We usually go to the corresponding console, submit a resource request describing the specification and requirements we need, and after approval we get the resource's connection string and authentication information, which we then add to the application's configuration. Any later change means going back to that console and coordinating the operation, with approval, alongside the code release. And the operation, maintenance, and monitoring of such resources are generally done in yet another separate console.
As the types of resources grow, the operation and maintenance cost becomes very high, especially when building a new site.
Following the principle of describing resources declaratively and using them on demand, we simplified resource usage for all applications by defining these resources in IaC. All resources are described declaratively, which enables intelligent management and on-demand use. At the same time, all of our resources use common cloud resources and standard protocols, which greatly reduces migration cost. In this way we gradually migrated the business teams onto cloud-native infrastructure.
Therefore, the two key points of resource BaaS-ification are (a sketch follows after the list):

  • Describe resource requirements declaratively, manage them intelligently, and use them on demand

  • Use common cloud resources and align with standard protocols
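
As a hedged sketch, a resource such as a database could be declared in the IaC repository roughly as follows; the claim type and field names are hypothetical, loosely in the spirit of OAM components and traits rather than any specific product's schema:

```python
# Sketch: declaring a resource dependency (here a database) as data in the IaC
# repository instead of clicking through a console. The schema and field names
# are hypothetical.
import json

resource_claim = {
    "kind": "DatabaseClaim",                  # hypothetical declarative resource type
    "metadata": {"name": "orders-db", "app": "order-app"},
    "spec": {
        "engine": "mysql",
        "version": "8.0",
        "storageGi": 100,
        "highAvailability": True,
    },
}

# A platform controller would watch such claims, provision the backing cloud
# service, and inject the connection string into the application's configuration.
print(json.dumps(resource_claim, indent=2))
```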

3. Yunxiao drives the efficient implementation of cloud-native DevOps

What we shared above is an internal Alibaba practice, which relies on Alibaba's internal R&D collaboration platform, Aone. Aone's public cloud version is Alibaba Cloud Yunxiao. So how do we implement cloud-native DevOps with Yunxiao?
From the case above we can see that implementing cloud-native DevOps is a systematic effort spanning methodology, architecture, collaboration, and engineering. Within this picture, cloud-native DevOps falls under the category of lean delivery.
[Figure: the Yunxiao cloud-native DevOps solution]
The picture above shows the Yunxiao cloud-native DevOps solution.
Here, we divide users into two roles:

  • Technical lead or architect

  • Engineers, including development, testing, operation and maintenance, etc.

The technical lead or architect needs to define and govern the enterprise's R&D behavior as a whole. Broadly speaking, this covers four aspects of the R&D process: making it operable, observable, manageable, and changeable.

First, he defines the company's R&D collaboration model, for example whether to adopt agile development or lean Kanban. Second, he needs a grasp of the overall product architecture: which cloud products to use and how those cloud products are coordinated and managed. Then he decides the team's R&D mode: how to collaborate in R&D, how to control R&D quality, and so on. Next, he determines the release strategy: whether to use grayscale (canary) release or blue-green deployment, what the grayscale strategy is, and so on. Finally, there is the service monitoring strategy, such as which monitoring platforms the service should be connected to, how to detect service status, global monitoring configuration, and so on.
Front-line development, test, and operations engineers care most about a smooth and efficient workflow. After a requirement or task is accepted on the Yunxiao project collaboration platform, it can flow through Yunxiao for coding, committing, building, integration, release, and testing, and then be deployed to the pre-release and production environments, so that the R&D mode and release strategy configured by the administrator really take effect. At the same time, each stage is triggered and flows automatically, without manual coordination and chasing.

The data generated throughout the R&D process forms an organic whole that can yield a large amount of insight and drive the team's continuous improvement. When the team hits a bottleneck or gets confused in its R&D process, it can also get professional diagnosis and R&D guidance from the Yunxiao expert team.

To sum up, the Yunxiao cloud-native DevOps solution is guided by the ALPD methodology, builds on expert-recommended best practices, and embeds them deeply into a complete DevOps toolchain, helping enterprises move step by step into cloud-native DevOps.

Next, we look at a specific case.

An Internet company has an R&D team of about 30 people and no full-time operations staff. Its products consist of more than 20 microservices and dozens of front-end applications (web, mini-programs, apps, and so on), and its business is growing very fast. Facing rapidly growing customers and ever-increasing demand, the original script-based deployment approach built on Jenkins + ECS could no longer keep up, especially for zero-downtime deployment and upgrades. The team therefore turned to Yunxiao for help and eventually migrated fully onto Yunxiao's cloud-native DevOps.

This R&D team faces three major pain points:

  • A large customer base and many urgent requirements

  • No full-time operations staff, and cloud-native technologies such as Kubernetes (K8s) have a steep learning curve

  • Complex IT infrastructure, so releases are time-consuming and labor-intensive

To address these problems, Yunxiao starts from three aspects: basic capability, release capability, and operations capability.

First, Alibaba Cloud ACK is introduced on top of the existing ECS resources to upgrade the infrastructure, and the applications are containerized. For service governance and application architecture, the Spring Cloud suite is simplified down to Spring Boot, with service discovery and governance handled by standard Kubernetes capabilities.
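
For illustration (the service and namespace names are hypothetical), relying on standard Kubernetes service discovery typically means a caller simply addresses the peer's Service DNS name, instead of querying a Spring Cloud registry such as Eureka:

```python
# Sketch: Kubernetes-native service discovery. The caller resolves a peer
# service via the cluster DNS name of its Service object; kube-dns/CoreDNS
# resolves it and the Service load-balances across Pods. Names are hypothetical.
import requests

def get_inventory(sku: str) -> dict:
    resp = requests.get(
        f"http://inventory.shop.svc.cluster.local:8080/inventory/{sku}",
        timeout=2,
    )
    resp.raise_for_status()
    return resp.json()
```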

Second, automated container deployment is implemented with Yunxiao pipelines; a grayscale deployment strategy enables canary releases, automatic scale-out, and automatic restart on failure. On top of the Yunxiao pipeline, the team achieves zero-downtime releases and quick rollback, saving machine cost while also solving the problem of having no full-time operations staff.
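
To show roughly which Kubernetes primitives such a pipeline drives for zero-downtime releases, automatic scaling, and automatic restart, here is an illustrative sketch: a Deployment with a rolling-update strategy and health probes, plus a HorizontalPodAutoscaler. It is not the team's actual Yunxiao configuration; all names and numbers are placeholders.

```python
# Sketch: Kubernetes primitives behind zero-downtime release and self-healing.
# All names, images, and numbers are illustrative placeholders.
import json

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 4,
        "selector": {"matchLabels": {"app": "web"}},
        "strategy": {  # roll Pods gradually so serving capacity never drops to zero
            "type": "RollingUpdate",
            "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
        },
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {
                "containers": [{
                    "name": "web",
                    "image": "registry.example.com/web:1.4.2",
                    "readinessProbe": {  # only route traffic to Pods that are ready
                        "httpGet": {"path": "/healthz", "port": 8080},
                    },
                    "livenessProbe": {   # restart a Pod automatically if it hangs
                        "httpGet": {"path": "/healthz", "port": 8080},
                    },
                }]
            },
        },
    },
}

hpa = {  # scale the Deployment out and back in based on CPU utilization
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "web"},
    "spec": {
        "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "web"},
        "minReplicas": 4,
        "maxReplicas": 20,
        "metrics": [{
            "type": "Resource",
            "resource": {"name": "cpu", "target": {"type": "Utilization", "averageUtilization": 70}},
        }],
    },
}

print(json.dumps([deployment, hpa], indent=2))
```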

Third, Yunxiao's automated pipelines and a standardized, branch-protected R&D mode, including code review, code scanning, and test quality gates, improve feedback efficiency and release quality.

The figure below is the architecture diagram of the overall solution.
[Figure: architecture of the overall solution]

4. Cloud native DevOps upgrade path

We divide the implementation of cloud native DevOps into 5 stages.
[Figure: the five stages of cloud-native DevOps adoption]

The first stage: fully manual delivery and operations. This is the starting point: the application architecture has not yet been service-oriented, and the team uses no cloud infrastructure or only IaaS. There is no continuous integration or test automation; deployment, release, and operations are all manual. We believe few companies remain at this stage.

The second stage: tool-based delivery and operations. The first task is to service-orient the application architecture and use microservices to improve service quality; the second is to introduce some R&D tools, such as GitLab and Jenkins, as isolated, island-style tools that each solve part of the problem. Continuous integration starts to be applied to individual modules, but generally without automated quality gates, and releases are merely assisted by automation tools.

The third stage: limited continuous delivery and automated operations. Basic capabilities improve further, and the infrastructure is containerized on a CaaS basis. A complete toolchain is introduced to connect R&D data, for example using a platform such as Yunxiao DevOps so that all data is fully interconnected. Continuous deployment becomes possible on the release side, though with some manual intervention. By this stage automated testing is mainstream, the service as a whole is observable, and operations become service-oriented and declarative.

The fourth stage: continuous delivery with manually assisted self-ops. Developers focus even more on business development. Serverless architectures begin to be adopted at scale in the application architecture, and continuous deployment becomes unattended; grayscale release and rollback are automated as much as possible, with manual intervention when needed. Observability is raised from the application level to the business level, and part of operations becomes self-operating with manual assistance.

The fifth stage: full-link continuous delivery and self-ops. This is the end state we pursue. At this stage, all applications and infrastructure adopt a serverless architecture, and continuous delivery is end-to-end and unattended, with release rollback and grayscale also automated; infrastructure and services are fully self-operating. Developers truly only need to care about business development and iteration.

However, the devil is in the details, and there are still many problems to solve in a real adoption. With the help of tool platforms such as Yunxiao and the expert consulting of ALPD, we can avoid detours and reach the goal faster.

Original link: https://developer.aliyun.com/article/781257?

