Tencent launches Crane, the first cloud-native cost optimization open source project in China

Author

Wang Xiaowei, FinOps certified practitioner, Tencent Cloud Technology Product Manager, Crane Product Manager.

The current state of cloud resource management

Suppose you are an application developer whose main job is writing business code. How many resources an application needs is usually determined through stress testing and sized for peak load, which leads to heavy resource waste outside peak business hours. Meanwhile, both the community and companies are actively promoting cloud native, claiming that its powerful scheduling and elasticity can solve resource waste. You embrace cloud native with enthusiasm, only to find that resource allocation for cloud-native workloads still relies on the same traditional, manual stress testing.

Or suppose you are a platform operations engineer with a KPI to improve platform resource utilization. Many applications with regularly fluctuating loads run in the cluster. You are pleasantly surprised to find that Kubernetes provides automatic scaling, and you are eager to try it. But when you actually use HPA, several minutes or even tens of minutes can pass between the load crossing the threshold, the autoscaling controller starting to scale out, and the new application instances finishing startup, so the application is overwhelmed before scaling takes effect. You give up on autoscaling and go back to the old practice of reserving excess resources.

Can developers be freed from the burden of resource allocation, and can elasticity be made efficient and practical? You take these questions to the community. Serverless technology, which completely separates application code from infrastructure, looks like one option, but as you dig deeper you find that serverless is a concept rather than a standard, and because the server is abstracted away entirely, you lose control over the underlying layer and the ability to tune performance. Another option is the managed cluster, with Google's Autopilot clusters as the leading example. This type of cluster should meet your needs, but it is tied to a platform and must be paid for.

We decided to change the status quo. We have accumulated extensive experience in cost optimization for Tencent's internal businesses. By combining resource forecasting, intelligent elasticity, and full-scenario co-location, we raised peak cluster utilization to more than 50% without sacrificing stability. The following figure shows the effect of the optimization. We look forward to working with the community on the common problems of application resource allocation and elasticity, so we chose open source to spare you from reinventing the wheel.

Figure 1 The optimization effect of Crane in large-scale scenarios

The Birth of Crane: The First Open Source Tool for Enterprise Cost Optimization

To help cloud-native users reduce costs as far as possible while keeping their businesses stable, Tencent launched Crane (Cloud Resource Analytics and Economics), the first open source cost optimization project in China based on cloud-native technology. Crane follows FinOps standards and aims to provide cloud-native users with a one-stop solution for cloud cost optimization.

The main contributors to the current Crane project include industry experts from well-known companies such as Tencent, Xiaohongshu, Google, eBay, Microsoft, and Tesla. (Crane open source project address: https://github.com/gocrane/crane/)

FinOps Compliant Crane Cost Optimization Tool Capability Model

Crane packages the processes, methods, and tools that Tencent uses internally for cloud resource optimization. The design and roadmap of Crane's core capabilities also fully align with the capability model proposed by the FinOps Foundation.

Figure 2 Crane capability model

Crane Architecture and Features

Figure 3 Crane Architecture

Crane focuses on resource recommendation and intelligent elasticity configuration. Business teams no longer need to worry about how many resources a service needs or how to configure automatic scale-out and scale-in. Crane derives the optimal configuration from how the service's time-series data changes.
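
As a rough illustration of what recommending resources from time-series data can look like, the sketch below derives a CPU request from historical usage samples by taking a high percentile and adding headroom. The function name, the 95th percentile, and the 15% margin are assumptions made for this example; they are not Crane's actual algorithm or defaults.

```python
import math


def recommend_cpu_request(usage_cores, percentile=95.0, margin=0.15):
    """Suggest a CPU request (in cores) from historical usage samples.

    Hypothetical helper: the percentile and safety margin are assumed
    values for illustration, not Crane defaults.
    """
    if not usage_cores:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_cores)
    # Nearest-rank percentile: smallest sample covering `percentile`% of the data.
    rank = max(1, math.ceil(percentile / 100.0 * len(ordered)))
    baseline = ordered[rank - 1]
    # Add headroom so short bursts above the percentile are still covered.
    return round(baseline * (1.0 + margin), 3)


# Example: a mostly idle workload with a brief peak in 10% of the samples.
samples = [0.2] * 90 + [0.8] * 10
print(recommend_cpu_request(samples))
```

A request sized this way tracks observed behavior instead of a one-off stress test, and it can be recomputed whenever the usage history changes.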

One-click deployment

Crane is platform independent. You can install it into any Kubernetes cluster, on or off the cloud, through a Helm package and get one-stop resource optimization capabilities. Crane is minimally intrusive. Its core components are the centralized controller, crane, and the node agent, crane agent. You can install them in any combination and choose which capabilities to enable through featureGate.

Easy-to-use visual console

To lower the barrier to entry, Crane provides a built-in console where users can view cost allocation and cost trends and apply cost optimizations with a few mouse clicks. All capabilities support gradual (grayscale) rollout, a preview mode, and rollback, which addresses business teams' concerns about resource changes.

Out-of-the-box inspection capabilities

Crane can scan the whole cluster for waste, make hidden waste visible, and save operations staff from repetitive tasks such as pulling monitoring data and writing query scripts.

Each optimization plan shows the expected changes in cost and utilization, the possible risks, and even a ranking of the optimization suggestions, because we believe that every business is unique and has its own most suitable plan, and that no single plan fits all.
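
To make the idea of scanning for waste and ranking suggestions concrete, here is a minimal sketch that compares reserved CPU with peak usage per workload and sorts the results by idle capacity. The `Workload` type, its fields, and the sample numbers are hypothetical; a real inspection would read these values from monitoring data.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    requested_cores: float   # CPU reserved via resource requests
    peak_used_cores: float   # highest usage seen in the observation window


def rank_by_idle_cores(workloads):
    """Return (name, idle_cores) pairs, largest waste first."""
    report = []
    for w in workloads:
        # Capacity that was reserved but never used, even at peak.
        idle = max(0.0, w.requested_cores - w.peak_used_cores)
        report.append((w.name, round(idle, 3)))
    return sorted(report, key=lambda item: item[1], reverse=True)


# Made-up sample data for illustration only.
fleet = [
    Workload("checkout", 8.0, 6.5),
    Workload("search", 16.0, 4.0),
    Workload("batch-report", 4.0, 3.9),
]
for name, idle in rank_by_idle_cores(fleet):
    print(f"{name}: {idle} idle cores")
```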

Instant elasticity (EffectivePodAutoscaler, EPA)

Traditional event-based elasticity tools have an inherent flaw: scaling is triggered only after business metrics deviate from normal values. This lag makes cloud users reluctant to rely on elasticity. EPA supports extensible forecasting algorithms and uses their predictions to drive horizontal and vertical scaling, so workloads can scale out ahead of demand rather than being overwhelmed before native elasticity can take effect. Crane also unifies the community's two elasticity mechanisms, HPA and VPA, under the single EPA concept.

Figure 4 EPA ensures that workloads can be scaled ahead of time
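
The difference between reacting to the current load and scaling ahead of a forecast can be sketched in a few lines. The example below is an illustration only: it uses a seasonal-naive predictor (reuse the value from one period earlier) as a stand-in for a real forecasting algorithm, and every function name and number in it is an assumption rather than part of Crane.

```python
import math


def replicas_for(load, target_per_pod, min_replicas=1):
    """Replicas needed so each pod handles at most `target_per_pod` load."""
    return max(min_replicas, math.ceil(load / target_per_pod))


def seasonal_naive_forecast(history, period, steps_ahead):
    """Predict a future value by reusing the value one full period earlier.

    Stand-in predictor chosen for brevity; requires steps_ahead <= period
    and at least `period` samples of history.
    """
    return history[len(history) - period + steps_ahead - 1]


def predictive_replicas(history, period, lead_steps, target_per_pod):
    """Scale for the highest load expected within the startup lead time."""
    current = history[-1]
    upcoming = [seasonal_naive_forecast(history, period, s)
                for s in range(1, lead_steps + 1)]
    # Never go below what the current load already requires.
    return replicas_for(max([current] + upcoming), target_per_pod)


# Requests per second with a repeating cycle of 6 samples; the last sample
# sits just before the next peak.
history = [100, 100, 400, 400, 100, 100, 100, 100]
print("reactive:", replicas_for(history[-1], target_per_pod=50))
print("predictive:", predictive_replicas(history, period=6, lead_steps=2,
                                          target_per_pod=50))
```

The reactive calculation sizes the workload for the load it sees now, while the predictive one sizes it for the peak expected within the time new pods need to start.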

Combining stability with resource optimization

Crane never improves resource utilization at the expense of stability. Crane lets users assign priority levels to services. The node agent periodically checks node resource watermarks and system metrics, detects interference between applications, and protects the service level of sensitive workloads by disabling scheduling, adjusting cgroups, and evicting pods.
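
One way to picture how an agent might choose among these three measures is an escalation ladder keyed on node utilization. The sketch below is hypothetical: the thresholds, action names, and victim-selection rule are invented for illustration and do not describe Crane's actual policy.

```python
def choose_action(node_cpu_utilization, schedule_off_at=0.75,
                  throttle_at=0.85, evict_at=0.95):
    """Map node CPU utilization (0.0-1.0) to the mildest sufficient action.

    All thresholds are made-up values for illustration.
    """
    if node_cpu_utilization >= evict_at:
        return "evict-low-priority-pods"
    if node_cpu_utilization >= throttle_at:
        return "throttle-low-priority-cgroups"
    if node_cpu_utilization >= schedule_off_at:
        return "disable-scheduling"
    return "none"


def pick_eviction_victim(pods):
    """Pick the pod to evict: lowest priority first, then highest CPU usage.

    `pods` is a list of (name, priority, cpu_cores) tuples; a higher
    priority number means a more sensitive service.
    """
    return min(pods, key=lambda p: (p[1], -p[2]))[0]


pods = [("api", 10, 2.0), ("batch-a", 1, 1.0), ("batch-b", 1, 3.0)]
print(choose_action(0.97), "->", pick_eviction_victim(pods))
```

Ordering the actions from least to most disruptive means sensitive services are protected first by keeping new load away, and pods are removed only as a last resort.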

Crane Present and Future

Crane has currently released version 0.2.0, which provides core capabilities such as resource recommendation, elasticity recommendation, intelligent elasticity, and stability enhancement. For further development plans, see the project milestones.

Further reading

FinOps (Financial Operations) defines a set of cloud financial management rules and best practices. It helps organizations get the most value from the cloud by enabling engineering, finance, technical, and business teams to collaborate on data-driven cost decisions.

Guided by its core values of putting users first and using technology for good, Tencent Cloud shares its internal experience, methods, and tools for cloud resource optimization with the community as open source, and considers helping cloud users optimize cloud costs its mission and responsibility. In December 2021, Tencent became a top-tier member of the FinOps Foundation, dedicated to promoting cloud resource optimization concepts and contributing related technology.

Join us

As the Crane project goes open source, you are welcome to follow, bookmark, and star https://github.com/gocrane/crane/ to show your support.

We are recruiting a limited first group of Crane open source enthusiasts. Anyone interested in Crane and related technologies is welcome to join. How to participate: add Teng Xiaoyun on WeChat (TKEplatform) and reply "Crane", and Xiaoyun will add you to the group.