Huawei Cloud improves the CCE cluster upgrade experience in three major areas to support efficient cluster operations and maintenance

This article is shared from the Huawei Cloud Community post "Huawei Cloud carefully crafts the CCE cluster upgrade experience to support efficient cluster operations and maintenance", author: Cloud Container Future.

Driven by the cloud-native wave, Kubernetes is evolving rapidly. Newer cluster versions bring new features and help users build a more capable cloud-native application environment. However, getting users to upgrade their clusters proactively has long been recognized as a hard problem across the industry.

"We want to use the new capabilities introduced by K8s, and we also want to keep the whole cluster up to date. But so many of our important applications run in containers. How can I make sure my business is not affected in any way during the cluster upgrade? If a problem does occur, can it be repaired quickly?" "My cluster version is relatively old and I want to upgrade to the latest version. The upgrade may take a very long time, and I am worried that it will affect the upper-layer business in ways I cannot control." - These are the questions the CCE cluster upgrade team hears most often when talking with users.

To this end, the CCE cluster upgrade team analyzed the pain points of cluster upgrades in depth and grouped them into three areas:

  • Business impact: traditional replacement upgrades and migration upgrades rebuild business Pods, which affects the services running in them.

  • Upgrade stability and efficiency: a Kubernetes cluster is a complex system, and many factors affect upgrade stability. When the gap between the current and target versions is large, several upgrade operations have to be performed in sequence and the overall upgrade takes a long time. Users feel this most in large-scale clusters.

  • Interactive experience: users lack an overall view of, and control over, the upgrade process. The process has many steps and is costly to understand.

Figure 1  Pain points of cluster upgrade

How to upgrade clusters losslessly, quickly, and smoothly is a problem shared across the industry. Starting from the pain points above, the CCE product team built a new cluster upgrade experience around three themes: "no business impact during the upgrade", "stable and efficient upgrade", and "silky interactive experience".

No business impact during the upgrade

Traditional upgrade methods are mainly node replacement upgrades and cluster migration upgrades. Both rebuild business Pods and therefore affect user services. Huawei Cloud was the first to launch an in-place upgrade capability: only the CCE component versions are updated, without any changes to the nodes. Pods running in the cluster are not affected, so the upgrade is lossless. In-place upgrades are also significantly faster than traditional upgrades.

Figure 2  Comparison between traditional upgrade and in-place upgrade

Users also do not need to track the dependencies between cluster versions and plug-in versions: a one-click upgrade handles the upgrade and version adaptation automatically. In addition, if something unexpected happens during the upgrade, users can recover quickly from the backup, which makes the cluster upgrade easier to keep under control.

Stable and efficient upgrade

To improve upgrade stability, we draw on Huawei Cloud's experience from tens of thousands of upgrades and provide a comprehensive set of pre-upgrade checks. The checks cover clusters, nodes, plug-ins and applications, the status and configuration of key components, resource usage, and more, which helps users avoid most upgrade risks and upgrade stably. Backup is also an important safeguard for business continuity. The etcd backup approach common in the industry cannot back up cluster components and their configuration. By backing up with disk snapshots instead, we provide a complete backup of cluster data and also make backups nearly 10 times faster on average.

In terms of upgrade efficiency, there are two improvements. First, the Kubernetes community only supports upgrading between adjacent minor versions, so a large version gap requires several consecutive upgrades to reach the latest version. We provide cross-version upgrades that span up to 4 minor versions in one operation, for example v1.23 to v1.27, which shortens the upgrade path and lowers upgrade costs. Second, upgrade time grows with cluster size. While keeping the upgrade safe, we support upgrading up to 100 nodes concurrently, so users can finish upgrading cluster nodes in less time.

Figure 3 Simplified cluster upgrade path
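
The shortening of the upgrade path can be illustrated with a little arithmetic on Kubernetes 1.x version numbers. This is a minimal sketch, not a CCE API: the function name `upgrade_path` and the `max_skip` parameter are hypothetical, and it assumes every hop within `max_skip` is allowed, which a real service may restrict to specific source and target versions.

```python
from typing import List


def upgrade_path(current: int, target: int, max_skip: int = 1) -> List[int]:
    """Return the 1.x version numbers visited when upgrading from current to target.

    max_skip is the largest number of versions a single upgrade may span
    (hypothetical parameter: 1 models adjacent-only upgrades).
    """
    if max_skip < 1:
        raise ValueError("max_skip must be at least 1")
    if target < current:
        raise ValueError("downgrades are not modelled")
    path = [current]
    while path[-1] < target:
        # Jump as far as allowed without overshooting the target.
        path.append(min(path[-1] + max_skip, target))
    return path


# v1.23 -> v1.27, one version at a time versus one cross-version upgrade.
adjacent_only = upgrade_path(23, 27, max_skip=1)
cross_version = upgrade_path(23, 27, max_skip=4)
print(len(adjacent_only) - 1)  # 4 upgrade operations
print(len(cross_version) - 1)  # 1 upgrade operation
```

With adjacent-only upgrades the example needs four operations; with a span of up to four versions it needs one.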

Figure 4 Concurrent upgrade of cluster nodes
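
The idea of a bounded concurrent node upgrade can be sketched as follows. This is only an illustration under stated assumptions: `upgrade_node` is a hypothetical placeholder that does no real work, and the thread pool stands in for whatever mechanism CCE actually uses, which the article does not describe. Only the cap of 100 comes from the text above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

MAX_CONCURRENT_NODES = 100  # concurrency cap stated in the article


def upgrade_node(name: str) -> str:
    # Hypothetical placeholder: a real implementation would upgrade
    # the components on the node here.
    return f"{name}: upgraded"


def upgrade_nodes(names: Sequence[str], limit: int = MAX_CONCURRENT_NODES) -> List[str]:
    """Upgrade all nodes, never running more than `limit` upgrades at once."""
    workers = max(1, min(limit, len(names)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order; the pool size enforces the cap.
        return list(pool.map(upgrade_node, names))


results = upgrade_nodes([f"node-{i}" for i in range(250)])
print(len(results))
```

With 250 nodes and a cap of 100, the work finishes in roughly three waves rather than 250 sequential steps, which is why total time stops growing linearly with cluster size.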

Silky interactive experience

For upgrade guidance, the cluster management page shows clear and intuitive prompts for clusters that are due for an upgrade, so users do not miss important upgrade notifications.

Figure 5  Cluster management page cluster upgrade notification

To make the upgrade easier to understand, we designed a short animation that explains the concept and principles of in-place upgrade, so users can grasp the cluster upgrade process and its precautions intuitively.

Figure 6  Cluster upgrade animation

We have also launched an upgrade path recommendation feature that automatically selects the best upgrade path and shows the feature updates and optimizations that the upgrade brings along that path.

Figure 7  Upgrade path

During the upgrade, we present the progress and any exceptions visually and in detail. The process is clear at a glance, so users can keep track of the upgrade progress with less anxiety.

Figure 8  Visualization of upgrade progress

When an upgrade check reports exceptions, we aggregate the check items by resource so users can quickly see which items failed, and we provide repair suggestions to help them resolve the problems quickly.

Figure 9  Upgrade exception diagnosis analysis
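
Aggregating failed check items by resource might look like the sketch below. It is illustrative only: the `CheckResult` structure, the function name, and the sample check items are hypothetical and are not taken from CCE.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class CheckResult:
    resource: str         # resource category, e.g. "node" or "plug-in"
    item: str             # name of the check item
    passed: bool
    suggestion: str = ""  # repair hint shown for failed items


def failures_by_resource(results: Iterable[CheckResult]) -> Dict[str, List[CheckResult]]:
    """Group the failed check items under the resource they belong to."""
    grouped = defaultdict(list)
    for result in results:
        if not result.passed:
            grouped[result.resource].append(result)
    return dict(grouped)


# Illustrative data only; these are not real CCE check items.
sample = [
    CheckResult("node", "disk space", False, "free space on the node's system disk"),
    CheckResult("node", "node status", True),
    CheckResult("plug-in", "version compatibility", False, "upgrade the plug-in first"),
    CheckResult("cluster", "key component status", True),
]

for resource, failed in failures_by_resource(sample).items():
    for check in failed:
        print(f"[{resource}] {check.item}: {check.suggestion}")
```

Grouping by resource keeps passed items out of the way and puts each repair suggestion next to the item it applies to.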

After the upgrade completes, we automatically run post-upgrade verification to confirm that the upgraded cluster is working normally, saving users time and effort.

Figure 10 Automatic health diagnosis

Future vision

You are welcome to try the CCE cluster upgrade function. We will keep improving "no business impact during the upgrade", "stable and efficient upgrade", and "silky interactive experience" to make cluster upgrades simpler, more efficient, and more reliable. We look forward to your feedback.

To try the service, visit

  • https://www.huaweicloud.com/product/cce.html

Related Links

  • https://support.huaweicloud.com/bulletin-cce/cce_bulletin_0067.html

  • https://bbs.huaweicloud.com/blogs/413984


Origin my.oschina.net/u/4526289/blog/10142741