Agile and Efficient Development | A New Paradigm for Cloud Native Application Development and O&M

On May 18, Tencent Cloud held Techo Day, its technology open day. Under the banner "Unboxing! Tencent Cloud", it released and upgraded a series of cloud-native products and tools developed in-house. Liu Yi, general manager of the Tencent Cloud Developer Product Center, delivered a keynote titled "A New Paradigm for Cloud Native Application Development and O&M" on the theme of "Agile and Efficient Development". The talk explains how Tencent Cloud's cloud-based development and O&M collaboration capabilities support smooth collaboration across cross-functional teams, help enterprises accelerate their agile digital transformation, improve the O&M efficiency of cloud-native architectures, and capture the benefits of cloud native.

Liu Yi: CEO of Tencent Cloud CODING and General Manager of Tencent Cloud Developer Products. He is mainly responsible for Tencent Cloud's developer ecosystem and for product management of its developer tools and platforms. He leads the team in bringing Tencent's internal practices in project collaboration and R&D efficiency, its large-scale use of tools and platforms, and the related best practices to partners across industries to help them complete their digital transformation. He joined Tencent in 2011 and built the social product QQ Zone and the office collaboration product Tencent Docs.

New trends in the field of cloud native development and operation and maintenance

Today, in a VUCA (Volatility, Uncertainty, Complexity, Ambiguity) environment, every enterprise is asking how to improve its core competitiveness, a question that has drawn much attention in recent years. In searching for an answer, Tencent Cloud found that **deepening the core capabilities of R&D collaboration and of integrated development and operations, and creating a new paradigm of efficient and rapid development and O&M, can provide continuous support for enterprises in their digital and cloud-native transformation.**

After years of observing and thinking about cloud-native development and O&M, Tencent Cloud has identified three key points:

  • At the cloud-native development layer, there is a trend toward "resource service";
  • At the operational observation layer, the ability to "integrate data and observation" is required;
  • Combining application observation with collaborative troubleshooting "further enhances collaboration".

The development of cloud native shows the trend of "resource service"

As cloud-native technology matures into large-scale practice, the industry has a clearer view of its future. Beyond the essential elements of first-generation cloud native (DevOps, containers, and microservices), the focus is moving toward optimizing resource allocation and improving application management efficiency.

Tencent Cloud has a complete cloud-native portfolio covering infrastructure, security, computing, architecture, data, and more, and cloud-native development is an important part of it.

First, future applications will be "born in the cloud and grow in the cloud", and cloud-native development will take on the character of "resource service". Resource management and scheduling will become more efficient. Developers can be freed from local coding, offline delivery, and inefficient resource control, and can code, debug, and deploy applications in the cloud, making the most of the benefits of cloud-native technology.

Unified business observability requires the ability to "integrate data and observations"

Second, as cloud native becomes widespread, business complexity keeps increasing. In the traditional monitoring model, data is scattered and unconnected, and different business layers are mostly monitored with different products and tools. As a result, when a business anomaly is detected, drilling down and cross-analyzing the data is inefficient.

With the business at the core, a unified platform collects data from a variety of sources, covering all data types: metrics, traces, logs, and events. Building a unified platform for data collection, processing, and observation, combined with integrated tools for fault prediction, fault alerting, and fault location, yields a full-link, end-to-end platform that integrates data and observation. Such a platform can greatly improve O&M efficiency and shift teams from passive monitoring to active observation.
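
As a rough illustration of what collecting metrics, traces, logs, and events on one platform makes possible, the sketch below groups mixed records by a shared trace ID so that one request can be inspected in one place. The `Record` type, the `trace_id` field, and the sample data are assumptions made for this example; they do not describe the actual data model of any Tencent Cloud product.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Record:
    kind: str       # "metric", "trace", "log", or "event"
    trace_id: str   # hypothetical shared key that links records to one request
    service: str
    payload: dict = field(default_factory=dict)


def group_by_trace(records):
    """Group heterogeneous records by trace ID, then by record kind."""
    grouped = defaultdict(lambda: defaultdict(list))
    for record in records:
        grouped[record.trace_id][record.kind].append(record)
    return grouped


def drill_down(grouped, trace_id):
    """Return how many records of each kind belong to one trace."""
    return {kind: len(items) for kind, items in sorted(grouped[trace_id].items())}


# Illustrative sample data only.
records = [
    Record("trace", "t-1", "checkout", {"duration_ms": 840}),
    Record("log", "t-1", "checkout", {"level": "ERROR", "msg": "payment timeout"}),
    Record("metric", "t-1", "payment", {"name": "latency_ms", "value": 812}),
    Record("log", "t-2", "catalog", {"level": "INFO", "msg": "ok"}),
]
grouped = group_by_trace(records)
summary = drill_down(grouped, "t-1")
print(summary)
```

With a shared key like this, an anomaly seen in one data type can be followed into the others without switching tools, which is the drill-down that scattered monitoring makes slow.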

"Integration of supervision and control" continues to evolve

System reliability and stability are the cornerstone of enterprise competitiveness. Once a failure occurs, multiple functional roles need to be quickly involved, and multiple parties coordinate to locate the problem, restore the application, and solve the problem as soon as possible.

During this process, troubleshooters run into a disconnect between observation tools and engineering information, context that is aligned only asynchronously, and the difficulty of collaborating remotely in real time. Troubleshooting efficiency still has a lot of room for improvement.

Connecting code data, engineering data, and observation data, providing the ability to align fault information, strengthening multi-person online collaborative troubleshooting, further improving O&M collaboration, and closing the DevOps loop together help the business side respond to and resolve faults quickly and efficiently, ensuring system availability.

Severe challenges faced by customers

Since its founding, Tencent Cloud has worked steadily in cloud computing and has served millions of developers with its technical capabilities. Some of the pain points customers face further support the trends described above:

  • Low efficiency from development and debugging to deployment: development environments are hard to unify and must be configured repeatedly, local resource isolation is weak and unstable, and continuous build and deployment are complicated by environment management and control, so efficiency needs to improve.

  • Scattered data makes fault location inefficient: cloud-native architectures are complex; business metrics, traces, logs, and other data are scattered; and the front end and back end form isolated islands, so the business architecture cannot be observed in a unified way. When an exception occurs, multiple systems and data sets must be pulled together to support troubleshooting, which hurts O&M efficiency.

  • Information is hard to align in asynchronous, multi-person troubleshooting: useful diagnostic information, such as alarms across availability zones and time periods, monitoring logs, operations, and feedback during the failure, is scattered across time periods and held by different troubleshooters. They cannot quickly share and align the troubleshooting context with each other, and it is hard to trace the fault-handling process during the post-mortem review.

  • Remote collaboration is inefficient: in remote, cross-functional troubleshooting, roles differ in resource permissions, business knowledge, tools, and technical proficiency, and each role holds only part of the information or tools in the chain. Information therefore cannot be easily shared and agreed on between roles, which lowers troubleshooting efficiency.

Tencent Cloud Viewpoint

Viewpoint 1: "Resource Service"

In response to the above pain points, Tencent Cloud's first consideration is to realize "resource service" in the process of development, debugging and continuous delivery, so as to provide solutions to the challenges of R&D resources.

This led to an early form of the concept of cloud-native development as cloud development plus environment hosting: a service-based Cloud Development Environment that enables development, compilation, and debugging in the cloud, solves traditional development resource management problems, and further promotes the adoption of cloud-native development.

In the resource service mechanism, developers can develop their own modules without interfering with each other. When necessary, they can implement mutual calls and even breakpoint joint debugging.

In microservice scenarios, this lets developers shift joint debugging left. Each microservice can quickly start its own cloud development environment, build and deploy in the cloud, and preview the result quickly through a traffic scheduling scheme. The development cluster also provides cost-control measures such as automatic hibernation.
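
The automatic hibernation mentioned above could work roughly as follows. This is a minimal sketch, assuming each environment records the time of its last activity; `DevEnvironment`, `hibernate_idle`, and the 30-minute idle limit are hypothetical names and values, not part of Cloud Studio or any Tencent Cloud API.

```python
from dataclasses import dataclass


@dataclass
class DevEnvironment:
    name: str
    last_active: float      # time of last activity, in seconds
    state: str = "running"  # "running" or "hibernated"


def hibernate_idle(envs, now, idle_limit_s=1800):
    """Hibernate running environments that have been idle longer than idle_limit_s.

    Returns the names of the environments hibernated on this pass.
    """
    hibernated = []
    for env in envs:
        if env.state == "running" and now - env.last_active > idle_limit_s:
            env.state = "hibernated"
            hibernated.append(env.name)
    return hibernated


# Illustrative data: one environment idle for 3500 s, one idle for 500 s.
envs = [
    DevEnvironment("orders-svc", last_active=1000.0),
    DevEnvironment("billing-svc", last_active=4000.0),
]
result = hibernate_idle(envs, now=4500.0)
print(result)
```

A scheduler would run a pass like this periodically, so environments that nobody is using stop consuming cluster resources while active ones stay untouched.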

Viewpoint 2: "Integration of data and observations"

To address these problems in traditional monitoring systems, we recommend building and using an observability platform with **"data and observation integration"**, and we put it into practice on the cloud.

The integrated observability platform ingests monitoring data from multiple sources and of multiple types in a unified way, processes it with capabilities such as a powerful DSL and real-time and correlation analysis, and finally provides integrated display, multi-dimensional analysis, early-warning notification, and AIOps capabilities.

This addresses problems caused by scattered monitoring and alarm data and the lack of a global view: monitoring that is hard to scale and to manage in a standardized way, and slow correlation analysis and troubleshooting.

Viewpoint 3: "Integration of supervision and control"

Combining observability capabilities with DevOps, we believe that "application management" can be deeply combined with "application observability" to establish a unified observation platform that is centered on applications and viewed from a business perspective.

In the downstream stages of DevOps, the platform provides core capabilities covering day-to-day problem discovery, location, and resolution for applications. It connects application observability capabilities such as monitoring alarms, tracing, and log tracking; breaks down the information barriers between APM tools from the application's perspective; links previously scattered information; smooths out tool differences between environments; and establishes an integrated observation capability that is centered on applications and viewed from the perspective of service R&D.

At the same time, building on the integrated observability capability, the standards for observation data are unified and observability tools become pluggable and extensible, so users can add their own extensions. On top of this, the platform draws on the real-time consensus that Tencent Meeting provides to upgrade how O&M and troubleshooting are coordinated and to close the loop in the evolution of DevOps.

A one-stop collaboration platform for development, operation and maintenance on the cloud

As a leading cloud platform in China, Tencent Cloud has always put customers first, continually innovating and refining products and services that match how users think. It has now officially launched a one-stop cloud platform for development and O&M collaboration, which lets cross-functional teams collaborate smoothly and conveniently on a single platform: "efficient and fast, creating a new paradigm for the development and O&M of a new generation of cloud-native applications".

The product advantages of the one-stop cloud development, operation and maintenance collaboration platform can be summarized as the following three points:

  • Development resource hosting: online cluster debugging, one-click pull from the repository and loading of the cloud development environment, dynamic resource allocation, and convenient, flexible joint debugging.

  • Application observation: multi-protocol monitoring, full product coverage, scenario-based alarms, non-intrusive collection and reporting of business data, and display across all data dimensions.

  • Remote collaborative troubleshooting: one-click launch of a shared-screen meeting for stakeholders, which removes information fragmentation and asynchronous coordination and keeps troubleshooting, fault location, and repair online and collaborative.

This paradigm aims to cover the entire life cycle from application development to application operation and maintenance on the cloud. To put it simply, users can use the cloud development environment Cloud Studio for multi-person coding collaboration, online debugging and service deployment; they can also push the code to the one-stop R&D efficiency management platform CODING DevOps to complete a series of continuous delivery tasks.

After the application is released, observability capabilities centered on the application provide fault prediction, alerting, and fault location across the whole application environment. When a fault occurs, collaborative troubleshooting and problem resolution are initiated, closing the loop on the "last mile" of DevOps.

New Product Tool 1: Tencent Cloud Observable Platform

To address the data silos, complex management, and inability to correlate data during troubleshooting in traditional monitoring, we created the Tencent Cloud Observable Platform to provide users with an integrated monitoring solution.

At the data source level, the platform supports all types of monitoring data (metrics, traces, logs, and events) and supports correlation analysis across them. It also supports flexible, rich configuration of alarm scenarios and provides a detailed alarm dashboard that gives comprehensive insight into business alarms. Beyond basic management and analysis, we have also planned advanced capabilities such as intelligent anomaly detection, root cause analysis, and automated O&M. With observation capabilities including cloud product monitoring, front-end performance monitoring, application performance monitoring, linkage testing, load testing, and visualization, the platform covers monitoring scenarios such as unified monitoring and inspection, one-stop troubleshooting, joint front-end and back-end troubleshooting, and user experience assurance.

The Tencent Cloud Observable Platform supports full-link tracing across multiple mainstream languages and protocols, which improves the efficiency of connecting front-end and back-end data by more than 90% and helps development and O&M staff quickly analyze front-end and back-end data end to end. We have implemented non-intrusive data collection in some scenarios, so users can report data without changing their business code. eBPF support will follow soon, giving users a more powerful non-intrusive collection method.

In addition, the Tencent Cloud Observable Platform has comprehensively upgraded its alarms to provide customers with integrated troubleshooting capabilities, from data sources through alarm configuration to alarm notification and handling. Alarms can be configured along multiple dimensions for the metrics and events of each data source, and alarm conditions also support dynamic thresholds and composite alarms based on machine learning. Important alarm events can be escalated to fault management with one click for full life-cycle management, including context tracing of the fault process, process management, and fault recovery.
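
To make the alarm conditions more concrete, here is a minimal sketch of a dynamic threshold and a composite alarm. It swaps in a simple statistical rule (mean plus k standard deviations over recent samples) in place of the machine-learning approach the platform describes, and every function name and sample value is an assumption made for illustration.

```python
from statistics import mean, stdev


def dynamic_threshold(history, k=3.0):
    """Upper bound derived from recent samples: mean + k * sample standard deviation."""
    return mean(history) + k * stdev(history)


def is_anomalous(history, value, k=3.0):
    """True when the new value exceeds the threshold computed from history."""
    return value > dynamic_threshold(history, k)


def composite_alarm(conditions, mode="all"):
    """Combine boolean conditions: fire when all of them hold, or when any does."""
    return all(conditions) if mode == "all" else any(conditions)


# Illustrative sample data only.
latency_history = [100, 102, 98, 101, 99, 100, 103, 97]
error_rate_history = [0.01, 0.012, 0.009, 0.011, 0.010, 0.008, 0.011, 0.009]

latency_alarm = is_anomalous(latency_history, 140)       # far above recent latency
error_alarm = is_anomalous(error_rate_history, 0.0105)   # within the normal range
fire = composite_alarm([latency_alarm, error_alarm], mode="all")
print(latency_alarm, error_alarm, fire)
```

Because the threshold is recomputed from recent data, it follows the metric's normal level instead of relying on a fixed number, and the composite rule avoids paging anyone when only one of the two signals is abnormal.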

New Product Capability 2: Remote Collaborative Troubleshooting

To improve collaboration during troubleshooting, Tencent Cloud has also launched a remote collaborative troubleshooting solution that combines DevOps with meeting scenarios.

The solution builds on the industry-leading remote real-time collaboration capabilities of Tencent Meeting, combined with CODING developer services and the cloud monitoring and observability platform, to support remote collaborative troubleshooting around "application operation and maintenance". Its core capabilities for multi-person troubleshooting reinforce the idea of observability-driven development, break down the barriers between observability and code engineering, and improve the efficiency of remote troubleshooting.

When an alarm is received and troubleshooting begins, users can open CODING Orbit's application observation tool directly from the ChatOps notification or the work order system. The tool connects to Tencent Cloud's APM observation product and aggregates observation information around the application. From this application-centric view, users can switch between fault call chains, logs, and monitoring metrics on the same workbench, and use the combined information to locate the fault quickly and improve troubleshooting efficiency.

In multi-person troubleshooting, the workbench supports one-click video conferencing. Participants sharing the same screen can jointly locate the key node in the error stack that caused the fault, quickly assign tasks, draw up a plan, and even fix the defect. Once the fault is confirmed to be a code defect, they can quickly locate the relevant file in the code repository, start the IDE, fix the code, debug online, and release again. Shared-screen meetings speed up the whole process, from task assignment, functional oversight, and plan coordination to the final fix.

Mature customer solutions, with practice across many industries

At present, Tencent Cloud Developer Services serves more than 3 million developers and tens of thousands of enterprises, and has been put into practice across multiple industries.

For example, in the financial industry, Futu Securities has high requirements for service stability and a complex architecture involving hybrid cloud and multiple regions. Real customer access experience and page anomaly monitoring are also key concerns when financial customers build observability. The Tencent Cloud Observable Platform helps the customer quickly test overseas network conditions, understand users' real experience, and locate problems along multiple dimensions such as platform and ISP. It also supports cross-region disaster recovery for monitoring data, giving the customer a unified business monitoring and visualization platform.

In the retail industry, to fully support the growth of Yili Group's core business and its digital transformation strategy, CODING built an end-to-end DevOps platform by providing major functional modules such as project collaboration, CI/CD, artifact repository, R&D metrics, and application observation, bridging the DevOps gap between source code and usable programs. So far, more than 20 mini program projects in Yili Group's business departments have adopted the agile R&D process, the iteration efficiency of mini program requirements has increased by more than 30%, and the systems run stably.

Conclusion

Digital technology is bringing huge changes to every industry. Tencent Cloud has long pursued the goal of providing users with more comprehensive, stable, and secure cloud-native services. It will continue to increase its investment in product R&D and technological innovation, provide millions of developers with a more complete product matrix, help developers reduce complexity, and improve R&D and O&M efficiency.

Origin blog.csdn.net/CODING_devops/article/details/130887515