Seal's Liang Sheng: Platform engineering provides not only tools for engineers, but also guardrails for AI

An original report by Technology Cloud Report.

With the rise of DevOps and cloud native, engineers suddenly seem to need to master dozens of different tools, Helm charts, Terraform modules, and more, just to deploy and test a simple code change across the multiple environments of a multi-cluster microservice system.

In fact, the original idea behind DevOps is simple: it was proposed to close the gap between Dev and Ops and speed up application development and release.

For most companies, however, this has proven unrealistic, and as all the cloud-native trends converge, DevOps has become increasingly inefficient.

Some developers vented on forums: "To hell with DevOps; we developers don't want to do operations at all!" Others shouted the slogan "DevOps is dead, platform engineering is the future."

Why is DevOps, originally meant to improve R&D efficiency, becoming less and less effective? Is there a new approach that can solve this challenge?

How can the failure of DevOps be fixed?

Over the past few years, most enterprises have invested heavily in migrating to and managing the cloud. As the cloud has developed, application architectures have become increasingly cloud-based: distributed architectures, microservices, serverless, and multi-cloud architectures have grown popular. A single application now requires more types and greater quantities of cloud resources, and releases and changes have become more complex and frequent.

To meet enterprises' growing cloud needs, DevOps, a more agile and rapid development method, came into everyone's view. DevOps pursues a "you build it, you run it" approach, requiring developers to deploy and operate applications end to end.

But has the development of DevOps really gone as smoothly as expected?

The harsh reality is that although enterprises have formulated DevOps strategies, development teams' technical capabilities often fall short of operations requirements, so communication remains largely manual: phone calls, WeChat messages, emails, and tickets are the main channels between Dev and Ops, and hands-on manual operations remain the primary means of maintenance.

This approach could barely keep things running in the era of physical machines, and even of virtualization. In the cloud-native era, however, every application has a different deployment architecture, and the same application may differ across development, testing, UAT, and production in both deployment architecture and underlying resources.

Most developers are unfamiliar with, and have no desire to learn, complex infrastructure technologies. The responsibility falls instead on senior members of the R&D team, or relies on frequent back-and-forth between the R&D and operations teams, which inevitably leads to low efficiency and makes quality hard to guarantee.

To solve the collaboration problem between Dev and Ops, many companies began hiring dedicated DevOps engineers.


Liang Sheng, co-founder and CTO of Shuche Software Seal

Dr. Liang Sheng, co-founder and CTO of Shuche Software Seal, has observed a pattern: when the cloud-native technology represented by Kubernetes first matured, one DevOps engineer typically supported about ten R&D engineers; as cloud-native technology has been applied more deeply, even one DevOps engineer for every three to five R&D engineers is stretched thin.

What is even more alarming is that DevOps labor costs keep rising, even though the role does not directly create economic value.

DevOps, originally intended to improve R&D efficiency, has instead become a drag, leaving companies feeling helpless. This challenge, however, also revealed a new market opportunity to Shuche Software Seal.

In the "Top Ten Strategic Technology Trends for 2023" released by Gartner at the beginning of this year, "Platform Engineering" was prominently listed. Gartner predicts that by 2026, 80% of software engineering organizations will have platform teams, 75% of which will include developer self-service portals.

The core vehicle of platform engineering is the self-service toolchain and workflow in the software development process. Whether for infrastructure provisioning, pipelines, monitoring, or container management, a self-service platform hides all this complexity in a black box, providing developers with the necessary tools out of the box and reducing the burden of managing a complex web of tools and infrastructure across the entire application life cycle.

In fact, platform engineering is itself a DevOps method. Its cleverness lies in building a shared platform for application management. It is like a buffet: the Dev team picks the dishes that suit their needs and eats right away, which improves development efficiency while minimizing duplicated "dishes", that is, reducing the Ops team's workload.

At present, large domestic Internet companies such as Wanwu Xinsheng Group (Aihuishou), Didi Chuxing, bilibili, Xiaomi, and Ant Group are building their own internal developer platforms (IDPs) on platform engineering concepts, enabling their technical teams to meet business needs at lower cost and higher efficiency and to support business operations and growth.

It was precisely this insight into demand that led Shuche Software Seal to take the lead in China in launching Walrus, a new-generation application platform based on platform engineering concepts. Its core capabilities are application management, cost management, environment management, and application deployment management.

On the Walrus platform, platform teams can automate infrastructure management while developers obtain reliable tools and workflows from a unified technology platform, improving development efficiency.

In Dr. Liang Sheng's view, although the exploration of platform engineering is still in its early market stage, enterprise demand has already emerged.

Starting from technology-leading large Internet companies, platform engineering will gradually spread to more traditional enterprises and to small and medium-sized Internet companies. This is Walrus's opportunity as an independent application platform.

"Like the cloud computing platform, the Walrus application platform was born to allow enterprises to focus more on their business innovation instead of wasting it on platform development," said Dr. Liang Sheng.

Walrus: Solving the "last mile" of DevOps implementation

At the "2023 Platform Engineering Technology Conference" held recently, the Walrus platform received great attention.
Insert image description here

According to Jiang Peng, co-founder and COO of Shuche Software Seal, Walrus provides flexible and powerful application and environment deployment management capabilities. It abstracts away the infrastructure, allowing R&D teams to build, deploy, and run applications on their own without knowing the underlying technical details, reducing developers' cognitive load.
Jiang Peng, co-founder and COO of Shuche Software Seal

Meanwhile, the operations/platform team can manage multiple environments such as development, testing, and production at fine granularity through features such as environment dependency graphs and multi-level variable configuration, enhancing the controllability and visibility of the infrastructure.

Walrus's advantages are reflected in six aspects:

One-click access to your team's best practices

Service templates in Walrus are designed according to the DRY (Don't Repeat Yourself) principle. Users can reuse them, gradually accumulating the R&D and operations teams' best practices through actual use.
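The DRY idea behind such templates can be illustrated with a minimal, generic sketch (plain Python dictionaries, not the actual Walrus template format): a template captures the team's defaults once, and each service instance supplies only what differs.

```python
def instantiate(template, overrides):
    """Merge per-service overrides onto a shared template (DRY:
    the defaults are defined once and reused everywhere)."""
    service = dict(template)   # copy the shared defaults
    service.update(overrides)  # only the differences are spelled out
    return service

# One template, accumulated by the platform team...
web_template = {"image": "nginx:1.25", "replicas": 2, "port": 80}

# ...reused by many services with minimal per-service input.
blog = instantiate(web_template, {"replicas": 4})
docs = instantiate(web_template, {"image": "nginx:1.24"})
```

The point of the sketch is the division of labor: the template encodes best practice once, and consumers state only their deltas.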

Avoid "internal friction" repeated configuration

Walrus supports batch cloning of services and environments. Users can easily copy existing service configurations to one or more target environments, with support for redefining the cloned services' parameters, and can quickly create a new environment from an existing environment's configuration and services, including both the application services and the environment's infrastructure resources.
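Conceptually, environment cloning amounts to copying an environment's service configurations into new targets while allowing parameter overrides. A minimal sketch of the idea (a hypothetical data model, not the actual Walrus API):

```python
import copy

def clone_environment(source, target_names, overrides=None):
    """Clone one environment's services into several target
    environments, optionally overriding parameters per clone."""
    clones = []
    for name in target_names:
        env = copy.deepcopy(source)      # copy services and their config
        env["name"] = name
        for svc in env["services"]:
            svc.update(overrides or {})  # e.g. fewer replicas in test
        clones.append(env)
    return clones

dev = {"name": "dev", "services": [{"name": "api", "replicas": 3}]}
# Clone the dev environment into two new environments, one replica each.
test_envs = clone_environment(dev, ["test", "uat"], {"replicas": 1})
```

The deep copy matters: each clone must be independent, so that tweaking one environment never mutates the source or its siblings.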

Support heterogeneous infrastructure

This covers both traditional and cloud-native deployment. Walrus supports any Kubernetes cluster and any public or private cloud infrastructure, enabling multi-cloud and hybrid-cloud application deployment and management under a unified framework.

Rich Day-2 operations capabilities

"Day 2" comes from the cloud-era concept of the software life cycle and generally refers to the period between an application's launch and its end of life. Walrus provides application deployment, upgrade, destruction, debugging, log viewing, remote shell access, and other functions.

Provide flexible integration capabilities

Walrus can connect directly to an enterprise's existing CI/CD pipeline, or be embedded as a functional module in an internal developer platform.

Integration with large AI models

Walrus integrates large language models, combining AI with application management through an AI agent mode. Users can generate service template code directly in natural language, and have the generated code corrected and explained, further simplifying the application deployment experience.

It is worth noting that, in the AIGC era, Walrus was among the first to integrate large-model capabilities, fully reflecting its technological foresight.

In Dr. Liang Sheng's view, large AI models are bringing new opportunities to platform engineering.

On the one hand, AI can reduce the workload of DevOps engineers. Walrus's use of large AI models can already automate much of engineers' troubleshooting, and it is expected to do even better in the next two to three years. This is something AIOps has tried, and failed, to achieve over the past several years.

On the other hand, the popularity of large AI models has led more and more enterprises to deploy such models. However, training and deploying large models requires massive compute and storage resources, and deploying them directly runs into all kinds of challenges in managing large-scale underlying resources.

Therefore, better integrating Kubernetes with large AI models, optimizing the deployment and operating efficiency of AIGC, and making it easier for enterprises to deploy and apply large models: this is the value Walrus can provide.

At the same time, Walrus provides not only tools for engineers but also guardrails for AI. Because large AI models can generate inaccurate or unsafe content, Walrus acts as a guardrail, correcting erroneous, unsafe, and non-compliant AI-generated content and making platform management more efficient and orderly.

Overall, Walrus reduces the complexity of using infrastructure for technical teams, gives R&D and operations teams an easy-to-use and consistent application management experience, and solves the "last mile" problem of DevOps implementation.

Open-sourcing Walrus to improve R&D efficiency in the AIGC era

Encouragingly, Walrus is now officially open source. As one of the earliest open-source application management platform projects in China, Walrus once again puts the essence of open source culture into practice.

As free software activist Richard Stallman put it: open source is a development methodology, while free software is a social movement.

Today, open source means more than "opening the source code"; it represents an advanced mode of collaboration that lets more people freely share and use code. This also accelerates product feedback and innovation, improves software reliability, and promotes the widespread adoption of software.

By open-sourcing Walrus, Seal hopes to help more enterprises and development teams improve their DevOps efficiency. At the same time, through community feedback and co-construction, Walrus will further enhance its product competitiveness, expand its influence, and ultimately serve enterprises and users worldwide.

In fact, Walrus's determination to go open source is closely tied to the company's own genes. The founding team of Shuche Software Seal all come from the core team of Rancher. Rancher Labs, the world-famous container management platform company, was founded by Dr. Liang Sheng in September 2014, with Liang serving as CEO.

Rancher has been open source since its birth and, driven by open source methodology, became one of the world's most widely used Kubernetes management platforms. Rancher's success, on the one hand, showed the founding team of Shuche Software Seal the many challenges of Kubernetes management, which led them to discover the opportunity for Walrus to solve DevOps implementation problems; on the other hand, it gave the whole team deep open source experience, laying the foundation for Walrus's open-source success.

Although familiar with the open source model, the Shuche Software Seal team remains humble.

In Dr. Liang Sheng's view, the product form of platform engineering is still unsettled. Public and private cloud vendors at home and abroad, as well as startups, are all exploring the right path.

"In fact, cloud vendors have repeatedly been making related products in recent years, hoping to allow users to better apply the components on their cloud platforms, but this process is not easy. Including the R&D efficiency platforms launched by some start-up companies, Integrated development, operation and maintenance platforms are all similar products, and they all want to solve this problem," said Dr. Liang Sheng.

Compared with the R&D efficiency platforms on the market, which focus more on the development process and experience, Walrus leans toward the operations side, trying to unify the experience of the R&D and operations teams.

Compared with cloud vendors, Walrus's neutrality better matches users' desire not to be locked into a single cloud vendor. These distinctive advantages will become Walrus's opportunity to stand out.

Now, with the popularity of large AI models, the business path of Walrus's open-source technology has even more room for imagination. Many users have already begun using Walrus to simplify application deployment in large-model scenarios, and some large enterprises have begun participating in trying out and building Walrus.

Reportedly, only about two weeks after open-sourcing, and without much publicity, nearly a hundred users are already using Walrus, exceeding expectations.

Looking ahead, Dr. Liang Sheng said the Walrus project will be developed around the following scenarios:

First, meeting enterprises' needs in complex scenarios, such as automated management of application environment life cycles, enhancement of traditional deployment models, application release workflows and approvals, and configuration drift detection;

Second, strengthening policy management and control, such as automatically intercepting or alerting on risky deployments and configurations;

Third, strengthening AI scenarios, including natural-language-driven daily application management and AI-powered intelligent analysis and fault localization. In the direction of large AI models in particular, Shuche Software Seal will tune proprietary models on top of existing large models so that they better serve DevOps scenarios.

"In the AIGC era, we hope that through Walrus open source, we can help more enterprises and developers solve their efficiency and cost problems in DevOps; we also hope that with the support of users, Walrus can go from China to the world and become the most popular in this field. Popular open source projects," said Dr. Liang Sheng.

An undeniable fact is that, amid today's increasingly tense international landscape, if an open source project like Walrus, born in China, can successfully go global and compete internationally, it will not only bring commercial returns to Seal but also let platform engineering technology from China lead the future.

Conclusion

Today, platform engineering is still in the early stages of development, and whether it will be widely accepted remains to be seen.

However, platform engineering is emerging as an important trend in the IT industry and will be adopted more widely as more enterprises seek to improve the efficiency and effectiveness of their software development and delivery processes.

This process is inseparable from the continuous exploration of pioneers like Seal, who keep refining platform engineering and helping enterprises improve efficiency, reduce costs, and increase agility.

[About Technology Cloud Report]

Technology Cloud Report focuses on original enterprise-level content. Founded in 2015, it is one of the top 10 media outlets in the cutting-edge enterprise IT field, authoritatively recognized by the Ministry of Industry and Information Technology, and an official communication medium for Trusted Cloud and the Global Cloud Computing Conference. It produces in-depth original reporting on cloud computing, big data, artificial intelligence, blockchain, and other fields.


Origin blog.csdn.net/weixin_43634380/article/details/132809391