Implementing a cloud-native architecture transformation on Alibaba Cloud: lower operation and maintenance costs, higher efficiency and stability

Author: Dangbei technical team

As its business grew rapidly, Dangbei's traditional IT assets gradually became bloated. To keep them from becoming a bottleneck on growth, the technical team made a decisive change: after the core business moved to a cloud-native architecture, O&M efficiency, overall stability, and R&D efficiency all improved across the board. This article briefly describes the background and motivation, the implementation approach, and the results of the Dangbei technical team's cloud-native journey.

Foreword

Founded in August 2013, Dangbei is one of China's well-known smart large-screen value-added service providers and the chairing member of the China Large-Screen Application Software Branch. It is a large-screen Internet platform company whose ecosystem spans software, hardware, and operating systems, and it is committed to becoming the core AIoT entry point and entertainment center for hundreds of millions of households. It has been named to the list of future unicorns for several years running and is a national-level "specialized and sophisticated" little giant enterprise.

Dangbei's cloud-native architecture practice

Three pain points of the traditional operation and maintenance system

As Dangbei's business scaled rapidly, the IT technology behind it was continuously updated and iterated, and its IT assets grew quickly, which inevitably brought challenges. The most obvious of these lay in the operation and maintenance (O&M) system, where the team identified three prominent pain points.

Manual operation and maintenance is inefficient, risky, and costly, and asset management is difficult

The traditional O&M system involved a great deal of manual work: code releases to every environment, scaling up and down with traffic peaks and troughs, and managing cloud assets such as certificates and cloud servers. The more manual involvement these steps require, the greater the risk, and it is hard to guarantee that nothing is ever missed or done wrong over the long run.

Heavy manual involvement also means lower efficiency and higher coordination costs. To keep the system stable, every production change required extensive cross-department coordination, often with colleagues from R&D, O&M, testing, and other roles working late into the night.

As Dangbei OS, Dangbei Music, Dangbei Market, and other businesses grew, the IT footprint expanded rapidly and cloud asset management became an increasingly prominent pain point.

Stability challenges are severe, and troubleshooting and recovery costs are too high

Dangbei places extremely high demands on system stability and business continuity. Traffic is growing quickly, and during events such as the Spring Festival Gala it often surges tenfold or even by several dozen times, putting great pressure on stability and capacity planning.

Moreover, when an anomaly occurred in production, the traditional O&M system suffered from complex dependency chains, difficult troubleshooting, long time to locate the cause, and the need to involve many people.

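In response, the entire server-side department set two requirements that capture the core of the problem and guide the solution: "1-5-10" fast recovery (in Alibaba's usage, detecting a fault within 1 minute, locating it within 5 minutes, and recovering within 10 minutes) and 99.95% availability.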
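
To make the 99.95% figure concrete, the short sketch below converts an availability target into an allowed-downtime budget. It assumes a simple time-based definition (availability = uptime / total time); the function name is hypothetical, and the article does not say how Dangbei actually measures availability.

```python
# Minimal sketch: convert an availability target into a downtime budget.
# Assumes a simple time-based definition: availability = uptime / total time.

MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability: float, period_days: float) -> float:
    """Return the maximum minutes of unavailability allowed in a period."""
    if not 0.0 < availability <= 1.0:
        raise ValueError("availability must be in (0, 1]")
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    return (1.0 - availability) * period_days * MINUTES_PER_DAY


if __name__ == "__main__":
    target = 0.9995  # the 99.95% availability requirement
    for label, days in [("30-day month", 30), ("365-day year", 365)]:
        print(f"{label}: {downtime_budget_minutes(target, days):.1f} minutes")
```

Run as-is, it prints 21.6 minutes for a 30-day month and 262.8 minutes (roughly 4.4 hours) for a 365-day year. Reading 1-5-10 as 1 + 5 + 10 minutes, a single incident handled at the limit of those targets would take up to 16 minutes, about three quarters of a month's budget, which illustrates how closely the two requirements are linked.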

With Dangbei's businesses growing rapidly, meeting these two requirements became an urgent, must-win battle for the entire server-side team.

The self-built observability system is complex to implement, hard to use, unstable, and costly to operate

For any large-scale IT system, observability is an essential foundation. It clearly presents the overall design of the IT architecture: dependency topology, call chain tracing, technical standards, runtime status, stability, and much more. Beyond locating and troubleshooting faults, it helps uncover historical design flaws and system bottlenecks early so they can be fixed in time, efficiently supporting business development and iteration while ensuring business continuity.

In the early days, to launch systems quickly and iterate the business at high speed, some of the technical architecture was not thought through and the design fell short. This showed up as inconsistent technology choices, tight coupling between businesses, long call chains, unreasonable cloud resource choices, and unclear management. Together these factors formed a heavy historical burden. Under the old O&M system, some observability components and frameworks were self-built, but they suffered from poor stability, high and difficult O&M, poor usability, and the lack of a unified system, so they could not deliver their full value.

Today, as Dangbei's business continues to grow at an accelerating pace, the company urgently needs a comprehensive, easy-to-use, secure, stable, and cost-effective observability system to support its steady, long-term development.

Construction of Cloud Native Architecture

Facing these three core pain points of the traditional O&M system, and to keep them from constraining Dangbei's sustainable development strategy, the technical team carried out extensive research and in-depth analysis and ultimately settled on a cloud-native architecture.

As Alibaba Cloud puts it in its "Cloud Native Architecture White Paper": the next stop of cloud computing is cloud native, and the next stop of IT architecture is cloud-native architecture.

Dangbei's technical team strongly agrees. Cloud native is a clear technology trend, and more and more companies are embracing it to develop and innovate more efficiently.

After a thorough, company-wide assessment, Dangbei's technical team, led by R&D director Zhang Zixiao, set four strategic technical goals (cloud native, middle platform, microservices, and digitalization) and decided to transform fully to a cloud-native architecture.

Only by using a cloud-native architecture to thoroughly resolve the high risk and low efficiency of the traditional O&M system could the team move older systems with long-standing problems onto a middle-platform and microservice architecture.

In choosing a cloud vendor, the team considered that Alibaba Cloud is the evangelist and promoter of cloud computing in China, with globally leading strength and a widely recognized contribution to cloud-native technology. It also brings together top industry talent, the richest body of customer cases, highly reliable maturity, and a "customer first" value. On that basis, Dangbei's technical team chose Alibaba Cloud to carry out its cloud-native architecture transformation.

Containerized migration to the cloud

In the field of cloud-native architecture infrastructure, Kubernetes is a well-deserved leader.

Compared with a self-built cluster on virtual machines, Alibaba Cloud's ACK (Container Service for Kubernetes) offers better elasticity and resilience, O&M-free operation, and more efficient resource management, and it integrates seamlessly with a large number of Alibaba Cloud products.

Relying on ACK and its many integrated products, Dangbei's technical team quickly containerized its core services and completed grayscale releases followed by a full traffic cutover. Thorny issues inevitably came up while rolling out the new architecture, but with the support of Alibaba Cloud's extensive case experience and best-practice guidance covering capacity planning, observability, security protection, stability, and more, the entire migration stayed in a reliable state throughout.
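
The article does not describe how the grayscale release was configured. As a generic illustration only, the following sketch shows one common way to run a gray release: hash each user ID into a stable bucket from 0 to 99 and send users whose bucket falls below the rollout percentage to the new version. The upstream names `service-v1`/`service-v2` and both functions are hypothetical, not an ACK or MSE API.

```python
import hashlib

# Minimal sketch of hash-based grayscale (canary) routing.
# The upstream names "service-v1"/"service-v2" are hypothetical.


def canary_bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in [0, 99]."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(user_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent% of users to the new version.

    A given user always lands in the same bucket, so raising the
    percentage only adds users to the canary; nobody flips back.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return "service-v2" if canary_bucket(user_id) < canary_percent else "service-v1"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    for pct in (5, 25, 100):  # ramp up, then full cutover at 100%
        share = sum(route(u, pct) == "service-v2" for u in users) / len(users)
        print(f"{pct:>3}% target -> {share:.1%} of users on service-v2")
```

Because buckets are stable, raising the percentage from 5 to 25 only adds users to the new version, and setting it to 100 corresponds to the full traffic cutover. In practice such a split would typically be configured in the gateway or ingress layer rather than in application code.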

After the migration, the efficiency of these core services improved greatly across their entire lifecycle, from development and testing to change and operation.

With cloud-native DevOps, release and collaboration efficiency increased by 300%, eliminating the high risk of manual O&M intervention. Because ACK naturally decouples services from server resources, the team was freed from inefficient infrastructure O&M. And with HPA (Horizontal Pod Autoscaler) combined with CronHPA (scheduled scaling), the services handle traffic peaks and troughs with ease.
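
For readers unfamiliar with HPA, the sketch below reproduces the core rule documented for the Kubernetes Horizontal Pod Autoscaler, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), including its default 0.1 tolerance and min/max clamping. The `scheduled_min_replicas` helper is a hypothetical stand-in for CronHPA-style scheduled scaling (raising the replica floor before a known peak); it is not how the CronHPA controller is actually implemented, and the real HPA also applies stabilization windows and readiness rules that are omitted here.

```python
import math

# Sketch of the Horizontal Pod Autoscaler's core scaling rule, plus a
# hypothetical time-of-day replica floor standing in for CronHPA.


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, min_replicas: int,
                         max_replicas: int, tolerance: float = 0.1) -> int:
    """desired = ceil(current * current_metric / target_metric), clamped.

    Changes within +/- tolerance of the target are ignored, as the HPA
    controller does (its default tolerance is 0.1). Stabilization windows,
    pod readiness, and multi-metric handling are deliberately omitted.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


def scheduled_min_replicas(hour: int, base_min: int, peak_min: int,
                           peak_hours: set) -> int:
    """Hypothetical cron-style floor: raise the minimum during a known peak."""
    return peak_min if hour in peak_hours else base_min


if __name__ == "__main__":
    # 4 pods at 90% CPU against a 50% target -> ceil(4 * 1.8) = 8
    print(hpa_desired_replicas(4, 90, 50, min_replicas=2, max_replicas=50))
    # During a hypothetical evening peak, the scheduled floor wins.
    floor = scheduled_min_replicas(20, base_min=2, peak_min=20,
                                   peak_hours={19, 20, 21})
    print(hpa_desired_replicas(4, 50, 50, min_replicas=floor, max_replicas=50))
```

With 4 pods at 90% CPU against a 50% target, the rule asks for 8 pods; during the hypothetical 19:00 to 21:00 peak window, the scheduled floor of 20 replicas takes precedence even though the metric is exactly on target. This mirrors the division of labor described above: scheduled scaling prepares for predictable peaks, and metric-driven scaling absorbs the rest.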

Beyond that, overall resource utilization of these core services rose by 20% and O&M efficiency by more than 500%, making larger-scale IT resource management possible.

Through its deep involvement in the migration, Dangbei's technical team accumulated substantial knowledge and experience, adding to the company's technical reserves, and it continues to actively explore cloud-native technologies.

Cloud Native Gateway

Alongside ACK as its cloud-native infrastructure, the Dangbei technical team also adopted the MSE (Microservices Engine) cloud-native gateway as its traffic management component.

By combining the traffic gateway, microservice gateway, and security gateway into one, the cloud-native gateway shortens request paths and improves performance, while also greatly reducing the complexity of service governance and significantly improving stability.

Thanks to the gateway's high level of integration, core services migrated to the cloud gain service governance, security protection, and monitoring and alerting capabilities without code intrusion. Compared with the self-built gateway Dangbei used under the old O&M system, the cloud-native gateway offers high availability, high performance, elastic scaling, and ease of use. It makes the gateway layer entirely O&M-free, reduces manual intervention, and greatly improves the overall stability of the IT system.

It is the combination of ACK and the MSE cloud-native gateway that let the Dangbei technical team meet both the 1-5-10 and 99.95% goals at almost no O&M cost.

With Dangbei OS, Dangbei Music, and other core services on the cloud, stability, business continuity, and R&D efficiency have improved significantly, as has the user experience, laying a solid technical foundation for the long-term development of Dangbei's business.

Dangbei's technical team is still actively migrating the remaining business systems to the cloud, sparing no effort to complete the full cloud-native transformation and realize the full value of the cloud.

Observability system

Building a comprehensive, easy-to-use, secure, and stable observability system is also an important means of meeting the 1-5-10 and 99.95% goals, and a key enabler of the middle-platform and microservice transformation.

During the evaluation that preceded the decision to fully transform to a cloud-native architecture, the Dangbei technical team studied Alibaba Cloud's observability solutions in depth.

A comparison with the observability components the O&M team had built in the past, such as log service and call tracing, showed that they suffered from poor usability, poor stability, high O&M costs, and outdated versions, and that supporting cloud-native components such as the MSE cloud-native gateway would also incur adaptation costs.

The purpose of observability components is to improve stability, ensure business continuity, and present information such as the call topology, ultimately improving R&D efficiency so that everyone has a clear picture of the systems they work on.
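
Call topology is one of the things an observability system is meant to present. The article does not describe ARMS internals; as a generic illustration, the following sketch derives a service dependency graph from trace spans by linking each span to its parent and counting cross-service caller -> callee edges. The `Span` record and its fields are a hypothetical minimal model, loosely similar in spirit to OpenTelemetry spans, and the service names are invented.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Span:
    """Hypothetical minimal span record (field names are assumptions)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span of a trace
    service: str


def service_topology(spans: Iterable[Span]) -> Dict[Tuple[str, str], int]:
    """Count caller -> callee edges between services across all traces."""
    spans = list(spans)
    index = {(s.trace_id, s.span_id): s for s in spans}
    edges: Counter = Counter()
    for span in spans:
        if span.parent_id is None:
            continue  # root span: no caller
        parent = index.get((span.trace_id, span.parent_id))
        if parent is None or parent.service == span.service:
            continue  # parent not collected, or an in-process child span
        edges[(parent.service, span.service)] += 1
    return dict(edges)


if __name__ == "__main__":
    demo = [
        Span("t1", "a", None, "gateway"),
        Span("t1", "b", "a", "os-api"),
        Span("t1", "c", "b", "user-svc"),
        Span("t1", "d", "b", "os-api"),  # internal span, same service
        Span("t2", "e", None, "gateway"),
        Span("t2", "f", "e", "os-api"),
    ]
    for (caller, callee), count in sorted(service_topology(demo).items()):
        print(f"{caller} -> {callee}: {count}")
```

For the demo data it reports gateway -> os-api twice and os-api -> user-svc once; the in-process span inside os-api is not counted as a dependency.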

If large amounts of O&M and R&D effort go into maintaining the components themselves, that puts the cart before the horse and defeats their purpose.

The Dangbei team therefore adopted Alibaba Cloud's observability solutions, built mainly on ARMS (Application Real-Time Monitoring Service), SLS (Simple Log Service), Grafana, Prometheus, CloudMonitor, and related products, and relied on their tight integration with cloud-native components such as ACK and MSE to build Dangbei's cloud-native observability system.

Construction results

Facing the low efficiency, high risk, high cost, and poor stability of the traditional O&M system, and to keep it from becoming a long-term constraint on business development, Dangbei's technical team decisively undertook a comprehensive cloud-native architecture transformation. With the core services migrated to the cloud, the pain points of the traditional O&M system have largely been resolved: overall O&M cost fell by 80%, efficiency rose by 500%, and both R&D efficiency and stability improved significantly.

The most important outcome is the removal of the two major constraints of O&M efficiency and risk. Building on this, Dangbei quickly advanced its middle-platform and microservice transformation, which is now largely complete.

Future outlook

Dangbei's cloud-native transformation and exploration have not only unlocked internal productivity but also greatly improved the user experience, laying a solid technical foundation for the company's long-term sustainable development strategy.

But this is only the starting point of the Dangbei technical team's cloud-native journey. As the business grows and the microservice transformation deepens, more challenges lie ahead. Dangbei will keep clearing paths through mountains and building bridges over rivers, enriching the living-room life of more families and bringing them more fun.


Source: blog.csdn.net/alisystemsoftware/article/details/130327312