Container technology represented by Kubernetes has become a new interface for cloud computing

Source | Alibaba Cloud Native Official Account

Author | Zhimin, Zhiqing

On Double 11 in 2020, Alibaba's core systems ran fully cloud native and withstood the largest traffic peak in the event's history, signaling to the industry that cloud native works at large scale. Among the many Alibaba "cloud native firsts" that year, one of the most important is that 80% of the core business was deployed on Alibaba Cloud Container Service (ACK), which can scale out to more than one million containers within one hour.

The container technology represented by Kubernetes is becoming a new interface for cloud computing. Containers provide a standard for application distribution and delivery, decoupling applications from the underlying operating environment. As a standard for resource scheduling and orchestration, Kubernetes shields the differences in the underlying architecture and helps applications run smoothly on different infrastructures. CNCF Kubernetes conformance certification further ensures compatibility among the Kubernetes offerings of different cloud vendors, which makes more companies willing to use container technology to build their application infrastructure in the cloud era.

The rise of a new interface for cloud-native containers

1.png

As the de facto standard for container orchestration, Kubernetes supports different types of computing, storage, and networking capabilities at the IaaS layer. Whether the hardware is CPU, GPU, FPGA, or specialized ASIC chips, Kubernetes can schedule heterogeneous computing resources and use them efficiently. It also supports a wide range of open source frameworks, languages, and application types.

As Kubernetes has become the new operating system of the cloud, cloud-native container technology has become a new interface for cloud computing.
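As a rough illustration of what scheduling heterogeneous resources involves, the sketch below keeps only the nodes whose free resources cover everything a pod requests. It is a simplified assumption about the filtering idea, not the Kubernetes scheduler's actual code. The `Node`, `Pod`, and `feasible_nodes` names and the sample capacities are hypothetical; `nvidia.com/gpu` is the extended resource name commonly exposed by the NVIDIA device plugin.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free: dict  # resource name -> unallocated amount (hypothetical units)


@dataclass
class Pod:
    name: str
    requests: dict  # resource name -> requested amount


def feasible_nodes(pod, nodes):
    """Return the names of nodes whose free resources cover every request."""
    return [
        node.name
        for node in nodes
        if all(node.free.get(res, 0) >= qty for res, qty in pod.requests.items())
    ]


# Hypothetical cluster: one CPU-only node and one node with GPUs.
nodes = [
    Node("cpu-node", {"cpu": 32, "memory": 128}),
    Node("gpu-node", {"cpu": 16, "memory": 64, "nvidia.com/gpu": 4}),
]
training = Pod("train-job", {"cpu": 8, "memory": 32, "nvidia.com/gpu": 2})
web = Pod("web", {"cpu": 2, "memory": 4})

print(feasible_nodes(training, nodes))  # only the GPU node qualifies
print(feasible_nodes(web, nodes))       # both nodes qualify
```

The point of the sketch is that CPUs, memory, and accelerators are all treated as named quantities, so one scheduling path can serve very different workloads.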

1. Cloud native container interface features

The cloud native container interface has the following three typical characteristics:

  • Encapsulate the infrastructure downwards and shield the differences in the underlying architecture.

  • Expand the boundary of cloud computing, with integrated management of cloud, edge, and device.

  • Support multiple workloads and distributed architecture upwards.

1) Encapsulate the infrastructure downwards to shield the underlying differences

  • A unified skill stack reduces labor costs: Kubernetes can be deployed and delivered in different environments such as the IDC, the cloud, and the edge. Adopting the DevOps culture and toolset advocated by cloud native speeds up technology iteration, which lowers overall labor costs.

  • A unified technology stack improves resource utilization: Scheduling a variety of computing workloads uniformly in a Kubernetes cluster can effectively improve resource utilization. Gartner predicts that "70% of AI tasks will run on containers and serverless in the next 3 years." AI model training and big data computing workloads require Kubernetes to provide lower scheduling latency, greater concurrent scheduling throughput, and higher utilization of heterogeneous resources.

  • Accelerate the cloud-native transformation of data services: Because separating compute from storage brings large flexibility and cost advantages, moving data services to cloud native has gradually become a trend. The elasticity of containers and serverless can simplify capacity planning for computing tasks. Combining distributed cache acceleration (such as Alluxio or Alibaba Cloud JindoFS) with scheduling optimization can also greatly improve the efficiency of data computing and AI tasks.

  • Security capabilities are further strengthened: With the development of the digital economy, enterprise data assets have become the new "oil", and large amounts of data need to be exchanged and processed in the cloud. Ensuring the security, privacy, and trustworthiness of data has become the biggest challenge for enterprises moving to the cloud. We need to use technology to establish a foundation of digital trust, protect data, help companies create trustworthy business partnerships, and promote business growth. For example, based on encrypted computing technologies such as Intel SGX, Alibaba Cloud provides a trusted execution environment for cloud customers. However, the threshold for developing and using trusted applications is very high: users have to refactor existing applications and deal with a large number of low-level technical details, which makes the technology very difficult to put into practice.

2) Expand the boundary of cloud computing, with integrated management of cloud, edge, and device

As edge computing scenarios and demands keep growing, "cloud-edge collaboration" and "edge cloud native" are gradually becoming new technical focuses. Kubernetes has powerful container orchestration and resource scheduling capabilities, which can meet the particular needs of edge/IoT scenarios for low power consumption, heterogeneous resource adaptation, and cloud-edge network collaboration. To promote development at the intersection of cloud native and edge computing, Alibaba open-sourced the cloud native edge computing project OpenYurt in May 2020. The project puts the concept of "cloud-edge integration" into practice and supports the requirements of edge computing scenarios by extending native Kubernetes. Its main features are:

  • "Zero"-intrusion edge cloud native solution: It provides complete Kubernetes compatibility and supports all native workloads and extension mechanisms (Operator/CNI/CSI, etc.); a native Kubernetes cluster can be converted into an OpenYurt cluster with one click.

  • Node autonomy: Edge nodes remain autonomous and self-healing when the cloud-edge network is weak or disconnected, which ensures business continuity.

  • Application delivery for massive numbers of edge nodes: It provides efficient, safe, and controllable ways to publish and manage applications.
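The node autonomy idea in the list above can be sketched as an edge-side cache that keeps serving the last state it saw when the cloud cannot be reached. This is a minimal illustration under stated assumptions, not OpenYurt's implementation: `EdgeCache`, `CloudUnreachable`, the `fetch` helper, and the sample data are all hypothetical.

```python
class CloudUnreachable(Exception):
    """Raised by the (hypothetical) cloud client when the link is down."""


class EdgeCache:
    """Serve desired state from the cloud, falling back to the last known copy."""

    def __init__(self, fetch_from_cloud):
        self._fetch = fetch_from_cloud
        self._last_known = {}

    def get(self, key):
        try:
            value = self._fetch(key)
        except CloudUnreachable:
            # Disconnected: keep running with what we saw last, if anything.
            if key in self._last_known:
                return self._last_known[key]
            raise
        self._last_known[key] = value
        return value


# Simulated cloud control plane and a switchable network link.
cloud = {"pod/web": {"image": "web:v2", "replicas": 1}}
link = {"up": True}


def fetch(key):
    if not link["up"]:
        raise CloudUnreachable(key)
    return cloud[key]


cache = EdgeCache(fetch)
before = cache.get("pod/web")  # link is up: fetched from the cloud and cached
link["up"] = False
after = cache.get("pod/web")   # link is down: served from the local copy
print(before == after)
```

A real system also has to decide how stale a cached copy may become and how to reconcile once the link returns; the sketch leaves both out.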

In 2019, Alibaba Cloud released the edge container service ACK@Edge at KubeCon, with OpenYurt as its core framework. In just one year, ACK@Edge has been used in live audio and video streaming, cloud gaming, industrial Internet, transportation and logistics, City Brain, and other scenarios, and has served Hema, Youku, Alibaba Video Cloud, and many Internet and new retail companies. At the same time, OpenYurt, the open source version of ACK@Edge, has become a CNCF sandbox project, encouraging the Kubernetes upstream community to take the needs of edge computing into account. Developers are welcome to join in building it and to usher in the new era of intelligent connectivity together.

3) Support multiple workloads and distributed architecture upwards

In the tide of IT transformation, enterprises increasingly demand digitalization and intelligence. The most prominent need is to dig new business opportunities and business model innovations out of massive business data quickly and accurately, in order to better cope with the challenges of change and uncertainty.

Kubernetes supports many mainstream open source frameworks for building microservices, databases, messaging middleware, big data, AI, blockchain, and other types of applications. From stateless applications, to enterprise core applications, to digital intelligence applications, enterprises and developers can automatically deploy, scale, and manage containerized applications on Kubernetes.

2. How Alibaba understands the cloud native container interface

Alibaba regards cloud native as an important future technology trend. To move faster and coordinate better, it has drawn up a clear cloud native technology roadmap for the Alibaba economy, and the group is promoting cloud native as a whole.

Guided by the cloud-native container interface, Alibaba Group took infrastructure, operations and maintenance (O&M), and the surrounding systems as the entry point and set off a wave of comprehensive cloud-native adoption. It has successively transformed its systems into solutions that fit the cloud-native architecture, pushed to replace the technical frameworks and tools used internally with standard products acceptable on the cloud or with cloud products, and further changed O&M thinking and working methods to fit the new O&M model. For example, DevOps has to move away from the O&M mindset of the traditional virtual machine era; the components of the container runtime have to change to support the new mode under the Kubernetes Pod; and in-container O&M components such as logging and monitoring, as well as the O&M mode itself, have to change accordingly.

In terms of computing, network, and storage, unified management through Kubernetes lets users make full use of Alibaba Cloud's IaaS capabilities. Each business can have its own independent elastic network interface and cloud disk, and businesses with different network and storage performance requirements can be deployed on the same host while remaining isolated from one another. In the traditional non-cloud model, the physical machine type determined which services could be deployed on it; the resulting lack of flexibility has also been resolved. As a result, users have greatly improved business stability while raising resource utilization and reducing costs.

At the node resource level, users can make full use of Kubernetes' underlying extension capabilities to make node management cloud native. At the architecture level, node lifecycle controllers, self-healing controllers, and component upgrade controllers form a complete closed loop over the node life cycle, covering node self-healing, circulation, delivery, and environment component changes. This lets the container layer stay completely unaware of the underlying nodes and thoroughly changes how nodes are operated and managed. Based on this cloud-native node management model, Alibaba has merged the group's previously fragmented node resources into one, turning scattered points into a single resource pool and bringing the kernel, environment components, machine specifications, and so on under a unified standard. Unified resource pools combined with unified scheduling provide enormous elasticity. This is the cloud native node management counterpart of "the same script, the same track gauge, the same measures, the same conduct, the same territory": node resources have gone from a patchwork of separate fiefdoms to a unified cloud native resource pool.

Emerging ecosystems and services such as Service Mesh, Serverless, and FaaS have also landed within the group very quickly on the cloud-native soil provided by ACK (Alibaba Cloud Container Service), and they are thriving.

At the application PaaS layer, the cloud-native application delivery model has moved towards more thorough containerization. It makes full use of the automated scheduling capabilities of Kubernetes, builds a unified PaaS O&M capability within the group on the standard definition of OAM Traits, and uses the GitOps R&D model to make infrastructure and cloud resources codified and programmable.

Alibaba Group's evolution to the cloud-native container interface

To support the huge and complex business of Alibaba Group, many engineers have worked their way through a long and uneven container journey over ten years. So how did the container interface evolve within Alibaba Group?

Over the past ten years, Alibaba Group's container technology has evolved from T4, a self-developed container based on LXC (Linux Containers), to rich containers, and then to Kubernetes cloud-native lightweight containers. Each transformation and upgrade was a technological iteration and innovation driven by the business background of its period.

The first stage: LXC-based container T4 attempt

Constrained by the heavy overhead of KVM virtual machines and the complexity of KVM orchestration and management, Alibaba Group began customizing LXC and the Linux kernel in 2011 and launched the LXC-based T4 container internally. Compared with the later Docker, however, the T4 container had some technical deficiencies, such as the lack of an image abstraction and an application description. For many years after the birth of T4, Alibaba kept trying to build a complex baseline definition on top of T4, but repeatedly ran into problems.

The second stage: AliDocker introduced the container image mechanism to achieve large-scale distribution

In 2015, Alibaba introduced the Docker image mechanism and combined the complementary strengths of Docker and T4: T4 gained Docker's image capabilities, while Docker gained T4's friendliness to the internal O&M system. The internal product AliDocker was formed on this basis.

In the process, Alibaba introduced a P2P image distribution mechanism. As the core e-commerce applications were gradually upgraded to AliDocker, environment isolation and portability shielded the differences in the underlying host environment. This laid the foundation for later infrastructure changes such as cloudification, unified scheduling, colocation, and the separation of storage and compute, and the advantages of the image mechanism became evident. The P2P image distribution system incubated here is Dragonfly, which joined CNCF in October 2018.
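To see why P2P distribution matters for large-scale rollouts, the following idealized simulation counts how many image pieces have to come from the central registry when hosts can also fetch pieces from peers that already hold them. It is a sketch of the general idea only and makes no claim about Dragonfly's actual protocol; the `distribute` function, the piece count, and the host count are hypothetical.

```python
def distribute(piece_count, hosts):
    """Simulate hosts pulling an image that is split into pieces.

    A piece is fetched from the registry only if no peer holds it yet;
    otherwise it is fetched from a peer.
    Returns (registry_downloads, peer_downloads).
    """
    holders = {piece: [] for piece in range(piece_count)}
    registry_downloads = 0
    peer_downloads = 0
    for host in hosts:
        for piece in range(piece_count):
            if holders[piece]:
                peer_downloads += 1
            else:
                registry_downloads += 1
            holders[piece].append(host)
    return registry_downloads, peer_downloads


hosts = [f"host-{i}" for i in range(100)]
from_registry, from_peers = distribute(piece_count=8, hosts=hosts)
direct_pull_total = 8 * len(hosts)  # every host pulls every piece itself

print(from_registry, from_peers, direct_pull_total)
```

In this idealized model the registry serves each piece once regardless of the number of hosts, whereas direct pulls grow linearly with the fleet size. Real systems sit between the two extremes because peers join concurrently and bandwidth is limited.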

The third stage: Pouch, a container with fully independent intellectual property, and full containerization within Alibaba

As container technology was deployed at scale and the advantages of AliDocker became evident, Pouch, for which Alibaba holds fully independent intellectual property, was developed and gradually replaced AliDocker. At the same time, Alibaba Group advanced rapidly towards running 100% on Pouch. Before Double 11 in 2016, the entire network had been containerized.

The name Pouch refers to a marsupial's pouch, which takes attentive care of the applications inside. Because Pouch unified the runtime of the group's online applications, application developers did not need to pay attention to changes in the underlying infrastructure. In the following years, the underlying infrastructure went through various technological evolutions such as cloudification, colocation, network VPC adoption, diskless storage, kernel upgrades, and scheduling system upgrades. The Pouch container runtime made most of these underlying changes invisible to applications and shielded the upper layers from their impact. Pouch itself also switched its runtime from LXC to runC and contributed its core technology back to the open source community. At the same time, the group gradually and seamlessly switched the existing AliDocker instances to the open source Pouch implementation.

The rich container mode played a part in this process. On the one hand, it let users and applications switch to containers seamlessly and smoothly. On the other hand, the systems that applications rely on, such as operations, monitoring, and log collection, could be containerized and migrated smoothly along with them.

But rich containers also have significant disadvantages. Because multiple processes can exist in a rich container, and application developers and O&M personnel are allowed to log in to the container, the mode violates the "single function" principle of containers and hinders the evolution towards immutable infrastructure. For example, in the serverless evolution, the agent processes injected by the scheduling system are in fact independent of the application. Too many functions in one container are bad for the container's health checks and elasticity.

Containerization is the only road to cloud native. By completing containerization quickly, Alibaba Group greatly accelerated its further evolution towards cloud native. After full containerization, the trend towards cloud native became unstoppable. More and more new concepts and application architectures have grown up in the container ecosystem. The advantages of container- and image-based application packaging, distribution, orchestration, and O&M are being seen, accepted, and embraced by more and more people, and the various O&M systems have begun to adapt to the cloud native architecture.

The fourth stage: the evolution of the scheduling system and ACK

As the container technology represented by Kubernetes became a new interface for cloud computing, Alibaba's self-developed Sigma kept exploring how to adopt Kubernetes. With the help of the group's full migration to the cloud, it finally completed a comprehensive migration from Sigma-based control to ACK.

In 2018, the group's scheduling system began its gradual evolution from the internally customized Sigma to ACK, and making containers lightweight became an important goal. Under the cloud-native wave, the O&M ecosystem within the group has also evolved rapidly. The lightweight container solution uses the Kubernetes Pod to split the rich container: independent O&M containers are stripped out, and the many O&M processes that are unrelated to the application are moved into them one by one.
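The splitting described above can be pictured as turning one multi-process rich container into a Pod with one application container plus separate O&M containers. The sketch below builds such a Pod-like manifest as a Python dictionary. The `split_rich_container` helper, the container names, and the `registry.example.com` images are hypothetical, and the manifest is reduced to the fields needed for illustration.

```python
def split_rich_container(app_name, app_image, ops_agents):
    """Build a Pod-like manifest: one app container plus one container per O&M agent.

    ops_agents maps a (hypothetical) agent name to its image.
    """
    containers = [{"name": app_name, "image": app_image}]
    for agent_name, agent_image in ops_agents.items():
        containers.append({"name": agent_name, "image": agent_image})
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": app_name},
        "spec": {"containers": containers},
    }


pod = split_rich_container(
    "shop",
    "registry.example.com/shop:1.0",
    {
        "log-agent": "registry.example.com/log-agent:1.0",
        "monitor-agent": "registry.example.com/monitor-agent:1.0",
    },
)

for container in pod["spec"]["containers"]:
    print(container["name"], container["image"])
```

With this layout the application image contains only the application, and each O&M agent can be upgraded or health-checked without touching it.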

From its birth, Sigma was committed to unifying Alibaba Group's many fragmented online resource pools. On that basis, it continued to explore new resource colocation modes and technologies, including colocation of online and offline workloads, job scheduling, CPUShare, and VPA. By raising the overall resource utilization of Alibaba Group's data centers, it delivered huge cost savings. With a fully managed Sigma Master, a large public resource pool, and application quota services, it provides serverless-style resource delivery and the best user experience. Sigma scheduling also accelerated full containerization from T4 to Pouch. With containers standardized through customized Dockerfiles and the Sigma scheduling engine making infrastructure transparent, business developers no longer need to care about the underlying O&M and can focus on the business itself.

The upgrade from Sigma to ACK was meant to let ACK's leading cloud product capabilities empower Alibaba Group, so that Sigma could benefit more quickly from cloud computing capabilities, including unified management of heterogeneous resources and global security compliance. In practice, however, migrating to ACK was not smooth.

First, around the core control path, several things became burdens and disadvantages in the evolution: Alibaba's existing scale and its capabilities for complex scenarios, the question of how to migrate the huge stock of existing containers to the new platform, and the question of how the container interface stays compatible with, and affects upgrades of, the huge existing ecosystem. Changing engines in mid-flight and migrating existing workloads are difficulties that resonate across the industry.

Second, performance, multi-cluster O&M, security defense, stability, and many other issues were all challenges for the full migration to ACK. On performance, Alibaba made many optimizations on top of native Kubernetes and contributed them back to the community, such as Cache Index and Watch Bookmark. It also built a complete set of facilities for running Kubernetes at scale, including security defense components, OpenKruise, and multi-cluster component release capabilities.

Following the overall idea of "economy scheduling = ACK + economy extensions", what Alibaba Group accumulated during its internal migration to ACK can be distilled into cloud products, enriching product capabilities and helping customers build competitiveness on the cloud. By now, Alibaba Group, Alibaba Cloud, and the open source community have formed very good technical synergy: self-developed, commercial, and open source work form a trinity in which each complements the others.
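As background on one of the optimizations named above: in Kubernetes, a watch bookmark is an event of type `BOOKMARK` whose object carries only a `resourceVersion`, which lets a client advance its resume point even when none of the objects it watches have changed. The sketch below shows how a client might fold such events into a local cache. It is a simplified illustration rather than client library code; the `apply_events` helper and the sample events are hypothetical.

```python
def apply_events(store, events, resource_version="0"):
    """Apply watch events to a local cache and return the version to resume from.

    store maps object name -> object. resourceVersion is treated as an
    opaque string: it is only remembered, never compared.
    """
    for event in events:
        kind = event["type"]
        obj = event["object"]
        version = obj["metadata"]["resourceVersion"]
        if kind == "BOOKMARK":
            # No object change: only move the resume point forward.
            resource_version = version
            continue
        name = obj["metadata"]["name"]
        if kind == "DELETED":
            store.pop(name, None)
        else:  # ADDED or MODIFIED
            store[name] = obj
        resource_version = version
    return resource_version


events = [
    {"type": "ADDED", "object": {"metadata": {"name": "web", "resourceVersion": "101"}}},
    {"type": "MODIFIED", "object": {"metadata": {"name": "web", "resourceVersion": "105"}}},
    {"type": "BOOKMARK", "object": {"metadata": {"resourceVersion": "240"}}},
]
store = {}
resume_from = apply_events(store, events)
print(resume_from, store["web"]["metadata"]["resourceVersion"])
```

Without the bookmark, a client that reconnects would resume from the version of its last object change, which on a busy cluster may already be too old and force a full relist. With it, the client resumes from a recent version at almost no cost.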

Self-developed, commercial, and open source: a complementary trinity

Technology and business complement each other: business provides the scenarios that push technology forward, and technological progress in turn drives better business development. Alibaba's complex and rich scenarios provide naturally fertile soil for its technology, which has advanced continuously. In the past, Alibaba took the lead in applying technologies such as middleware, containers, and scheduling, in which it was well ahead of the industry. It distilled these capabilities into cloud products and then delivered them to customers, helping enterprises accelerate digital transformation and building broad leadership and influence.

In the new cloud-native era, however, maintaining this influence under cloud-native standards brings more challenges. The brief history above of how Alibaba's container interface evolved records how front-line Alibaba engineers dealt with them. At a more abstract level, these results come from Alibaba's strategic decision to treat self-developed, commercial, and open source work as a trinity.

1. Challenges from Alibaba Cloud

Most of the users Alibaba Cloud served in the past were general-purpose users, whereas Alibaba Group's internal scenarios demand solutions to problems such as very large scale and ultra-high performance. Whether Alibaba Cloud products can cover and support both well is a very big challenge. Looking further, if the demands of general users can be abstracted well, Alibaba Group is also a very good "testing ground" for Alibaba Cloud.

2. Challenges within the group

Small boats turn easily, but large ships are less nimble. The huge-scale scenarios within Alibaba Group, once unique in the industry, have now become a burden on the way to cloud native. The root question is how to let Alibaba Group's technology quickly integrate with and contribute to cloud native standards instead of forming a technology island.

3. Challenges and opportunities on the open source side

Alibaba Cloud has continued to invest in contributions to cloud-native open source projects. It has launched OpenKruise and, jointly with Microsoft, open source projects such as OAM and KubeVela. These all stem from Alibaba's accumulated experience in the cloud-native field, and feedback from open source community users has in turn improved the solutions deployed natively on Alibaba Cloud. Take OpenKruise as an example. The project is a Kubernetes-based general extension engine that Alibaba created for large-scale application scenarios. Being open source, it lets every Kubernetes developer and Alibaba Cloud user easily use the unified deployment and release capabilities of Alibaba's internal cloud native applications. When community users or external enterprises find that native Kubernetes workloads do not meet their needs, they do not have to reinvent a similar "wheel" but can use the mature capabilities of OpenKruise. Moreover, more than 95% of the OpenKruise code used within Alibaba Group is identical to the open source community version. We hope to work with every cloud native enthusiast who takes part in building OpenKruise to create a more comprehensive and universal cloud native application workload engine.

The evolution of cloud native operating systems

Today, at the interface layer of cloud native application architecture, Alibaba Group's technology system is fully oriented towards cloud native technology and cloud products.

2.png

Alibaba Cloud provides customers with a cloud-native operating system. First, the infrastructure layer offers powerful IaaS resources: computing resources based on the third-generation X-Dragon architecture scale more elastically and provide higher performance at a more optimized cost; a cloud-native distributed file system is built for persistent container data; and the cloud native network speeds up application delivery and provides application-oriented load balancing and container network infrastructure.

Second, at the container orchestration layer, Alibaba Cloud Container Service has been online since 2015 and has worked with thousands of enterprise customers on a large number of production-grade scenarios across industries. More and more customers are building most or even all of their applications in a cloud-native way. As business deepens, and to meet the strong demand of large and medium-sized enterprises for reliability and security, Alibaba Cloud has launched ACK Pro, an enterprise edition of the container service backed by an SLA with compensation, which also supports many products within Alibaba Group.

The ACK Pro version of Container Service supports the needs of financial, large-scale Internet, government and enterprise customers, and supports larger clusters, higher performance and more comprehensive security protection.

  • First, it builds on the X-Dragon architecture with integrated software and hardware optimization to deliver excellent performance:

    • The lossless Terway container network simplifies the data path and reduces latency by 30% compared with a routed network.
    • It supports the world's first persistent memory instances; compared with NVMe, the TPS of I/O-intensive applications increases by 100%.
  • Second, it provides efficient scheduling optimized for heterogeneous computing power and workloads:

    • Intelligent CPU scheduling optimization increases Web application QPS by 30% while preserving SLA and density.
    • It supports GPU computing power sharing, which cuts the cost of AI model inference by more than 50%.
  • Finally, it provides comprehensive security protection for enterprises:
    • It supports Alibaba Cloud security sandbox containers to meet enterprise customers' security and isolation requirements for applications, with performance 30% higher than open source.
    • It was among the first to pass the advanced security certification for trusted cloud containers in China, and it supports blocking runtime risks within seconds.
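The saving from GPU sharing in the list above comes from packing several small inference jobs onto one card instead of giving each job a whole GPU. The sketch below estimates this with a first-fit packing by GPU memory. It illustrates the mechanism only and is not how ACK schedules GPUs; the `gpus_needed_shared` helper, the job sizes, and the 16 GiB card are made-up assumptions, and the numbers are not meant to reproduce the figure quoted above.

```python
def gpus_needed_shared(job_memory_gib, gpu_memory_gib):
    """First-fit packing: place each job on the first GPU with enough free memory."""
    used = []  # memory already allocated on each GPU, in GiB
    for need in job_memory_gib:
        if need > gpu_memory_gib:
            raise ValueError(f"job needs {need} GiB, more than one GPU provides")
        for index, allocated in enumerate(used):
            if allocated + need <= gpu_memory_gib:
                used[index] += need
                break
        else:
            # No existing GPU has room: open a new one.
            used.append(need)
    return len(used)


jobs = [4, 6, 8, 2, 10, 2]  # hypothetical GPU memory per inference job, in GiB
exclusive = len(jobs)        # one whole GPU per job
shared = gpus_needed_shared(jobs, gpu_memory_gib=16)
saving = 1 - shared / exclusive

print(exclusive, shared, saving)
```

Memory is only one dimension; a production scheduler would also have to isolate compute and account for interference between jobs that share a card.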

At the same time, Alibaba Cloud Service Mesh (ASM), a fully managed service mesh, has been officially commercialized. It is the industry's first fully managed, Istio-compatible service mesh. ASM provides unified management of heterogeneous application services such as virtual machines, containers, elastic container instances, and IDC applications, together with full-link observability and end-to-end security protection. It helps you accelerate the modernization of enterprise applications and easily build a hybrid cloud IT architecture.

3.png

Alibaba Cloud Container Service has been included in the Gartner "Public Cloud Container Service Competitive Landscape" report for two consecutive years. In Forrester's first enterprise-level public cloud container platform report, Alibaba Cloud Container Service was rated a Strong Performer, ranking first in China.

Outlook

The future of cloud computing is cloud native, and the new container interface is a key step in that evolution. Looking downward, the high-density and high-frequency requirements brought by the new container interface will further mature end-to-end optimization in cloud computing. Looking upward, serverless, a new generation of middleware, and a new generation of application PaaS built on the new container interface are all on the rise.

Cloud native technology is becoming the shortest path to release cloud value. In the future, Alibaba will continue to invest in cloud native, and Alibaba's cloud native technology will not only be popularized internally on a large scale, but also serve the whole society through Alibaba Cloud.


Origin blog.51cto.com/13778063/2554891